IBAnalyst: How to get statistics from InterBase/Firebird database in right way
Dmitry Kuzmenko, last update 31-03-2014
Abstract
This document is devoted to the tips and tricks of gathering and analyzing statistics from InterBase/Firebird databases, with or without IBAnalyst.
Right time, right place
It may sound strange, but simply taking statistics via gstat or the Services API is not enough. Statistics must be taken at the right moment to show how applications affect data and transactions in the database. The worst times to take statistics are:
Right after a restore
After a backup (gbak -b db.gdb) made without the -g switch (see the example after this list)
After a manual sweep (gfix -sweep)
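To illustrate the backup case (the backup file names below are only placeholders): a plain backup run as
gbak -b db.gdb db.gbk
collects garbage record versions while it reads the tables, so statistics taken right after it look unrealistically clean, while
gbak -b -g db.gdb db.gbk
skips garbage collection and leaves the database state, and therefore the statistics, unchanged.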
It is also true that during normal work there can be moments when the database is in a near-perfect state, for example, when applications put less load on the database than usual (at the start of the day, at lunch time, or at other times specific to your business processes).
How to catch the moment when something goes wrong in the database?
Yes, your applications can be designed so perfectly that they always work with transactions and data correctly, never creating sweep gaps, lots of active transactions, long-running snapshots and so on. Usually this does not happen, if only because some developers test their applications with just 2-3 simultaneous users, not more. So, when they deploy those applications for 15 or more simultaneous users, the database can behave unpredictably. Of course, multiuser mode may work fine, because most multiuser conflicts can be tested with 2-3 concurrently running applications. But when more applications run concurrently, garbage collection problems (at least) can appear. And this can be caught if you take statistics at the right moments.
If you do not experience periodic performance problems
This can happen when your applications are designed correctly, the database load is low, or your hardware is modern and powerful enough to handle the current number of users and volume of data.
The most valuable information here is the transaction load and record version accumulation. This can be seen only if you set up regular statistics gathering.
InterBase does not have an internal task scheduler, so you are free to use any external one, such as the standard Task Scheduler (Windows) or cron (Unix).
The best setup is to gather hourly transaction statistics. This can be done by running
gstat -h db.gdb >db_stat_
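A minimal scheduling sketch, assuming a Unix server (the gstat path, database path and file name pattern are assumptions, adjust them to your environment): an hourly cron entry such as
0 * * * * /opt/firebird/bin/gstat -h /data/db.gdb > /var/log/db_stat_`date +\%Y\%m\%d_\%H`.txt
saves the header statistics to a separate timestamped file every hour. On Windows, the same gstat command can be put into a .cmd file and run hourly by Task Scheduler.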
If you experience periodic performance problems
These problems are usually caused by automatic sweep runs. First, you need to determine the time period between such performance hits. Next, divide this interval by at least 4 (or 8, 16 and so on). Modern information systems have lots of concurrent users, and most performance problems with an untuned server and database happen 2 or 3 times per day. For example, if performance hits happen every 3 hours, you need to take
gstat -h db.gdb
statistics every 30-45 minutes, and
gstat -a -r db.gdb -user SYSDBA -pass masterkey
every 1-1.5 hours.
It is best if you take gstat -a -r statistics right before the forthcoming performance hit. It will show where the real garbage is and how many obsolete record versions have accumulated.
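To show what to look at in the header statistics (the numbers below are purely illustrative), gstat -h prints transaction counters similar to these:
    Oldest transaction    21043
    Oldest active         21044
    Oldest snapshot       21044
    Next transaction      84029
    Sweep interval        20000
A large and growing gap between Oldest active (or Oldest snapshot) and Next transaction means that some transaction has stayed active for a long time and garbage versions cannot be collected. And when the difference between Oldest snapshot and Oldest transaction exceeds the sweep interval, the automatic sweep starts, which is usually exactly the moment users feel the performance hit.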
What to do with these statistics
If your application uses transactions explicitly and uses them well, i.e. you know what read_committed is and when to use it, your snapshot transactions last no longer than needed, and transactions stay active for the minimum necessary time, then you can tune the sweep interval or turn automatic sweep off, and afterwards only watch how many updates your application(s) make and which tables need to be updated less often or given special care.
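For example (the database name and credentials below are placeholders), the sweep interval is changed with gfix:
gfix -h 20000 -user SYSDBA -password masterkey db.gdb
sets it to 20000 transactions, and
gfix -h 0 -user SYSDBA -password masterkey db.gdb
turns automatic sweep off completely; in the latter case garbage must be collected by running gfix -sweep manually at a convenient time.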
What does this mean, you may ask? Here is an example from a system where performance problems happened every morning for 20-30 minutes. That was more than enough to hurt the "morning" applications and could not be tolerated any longer.
The database administrator was asked the right questions, and here is the picture:
The daily work was divided into phases: analysts work in the morning, then data is inserted and edited by regular operators, and at the end of the day special procedures start gathering the data that the analysts will use the next day.
So the last work on the database at the end of the day produced a lot of updates, mostly of the very tables the analysts used in the morning. As a result, a lot of garbage versions accumulated, and their collection was triggered by the applications running in the morning.
The answer to that problem turned out to be simple: run gfix -sweep at the end of the day.
Sweep reads all tables in the database and tries to collect all garbage versions left by committed and rolled back transactions. After sweeping, the database becomes nearly as clean as it is right after a restore.
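A minimal sketch of such an end-of-day job, assuming a Unix server (paths, schedule and credentials are placeholders): the cron entry
30 23 * * * /opt/firebird/bin/gfix -sweep -user SYSDBA -password masterkey /data/db.gdb
runs a manual sweep every day at 23:30; on Windows the same gfix command can be scheduled with Task Scheduler.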
And, "morning problem" has gone.
So, you need to consider statistics together with a lot of other factors:
1. how many concurrent users (on average) work during the day
2. how long the working day is (8, 12, 16, 24 hours)
3. what kinds of applications run at different times of the day, and how they affect the data used by other applications running at the same time or later. In other words, you must understand the business processes happening during the whole day and the whole week.
When the DBA can do nothing
Sad to say, such situations happen. Again, an example:
A system was installed for ~15 users. Periodically performance became so bad that the DBA had to restart the server. After a server restart everything worked fine for some time, then performance degraded again. Statistics showed that the average daily transaction count was about 75,000, and that there were active transactions running from the start of the day to the moment performance went down.
Unfortunately, the applications were written with BDE and with no explicit transaction handling at all; i.e. all transaction handling was automatic, performed by BDE itself. This caused some transactions to stay active for a long time, and garbage (record versions) accumulated until the DBA restarted the server. After the restart an automatic sweep ran, and the garbage was collected (eliminated).
All this was caused by the applications: they had been tested with only 2-3 concurrent users, and when there were ~15 of them, the applications started to put a very high load on the database.
It should be noted that in that configuration 70% of the users were only reading data, and the other 30% were inserting and updating just some (!) data.
In this situation the only thing that can improve performance is to redesign the applications completely.