This material is sponsored and created with the sponsorship and support of IBSurgeon www.ib-aid.com, vendor of HQbird (advanced distribution of Firebird) and supplier of performance optimization, migration and technical support services for Firebird.
The material is licensed under Public Documentation License https://www.firebirdsql.org/file/documentation/html/en/licenses/pdl/public-documentation-license.html
- Detailed New Features of Firebird 5. Part 1: Improvements in Optimizer
- Detailed New Features of Firebird 5. Part 2: SQLs
- Detailed New Features of Firebird 5, Part 3: SKIP LOCK
- Detailed New Features of Firebird 5, Part 4: Parallel Features: backup/restore/sweep, creation of indices
- Detailed New Features of Firebird 5, Part 5: Better compression, cache of compiled statements, and more
- Detailed New Features of Firebird 5, Part 6, SQL Profiling - coming soon
- Parallel Reading Of Data in Firebird
1. Parallel execution of maintenance tasks
Since version 5.0 Firebird can perform main maintenance tasks using multiple threads in parallel. Some of these tasks uses parallelism at the Firebird kernel level, others are implemented directly in the Firebird tool. Currently, only the parallel execution of sweep and index creation tasks is implemented at the kernel level. Parallel execution is supported for both automatic and manual sweeps.
In future versions it is planned to add parallelism when executing SQL queries.
1.1. Parallel execution of tasks in the Firebird kernel
To handle a task with multiple threads, the Firebird engine launches additional worker threads and creates internal worker connections. By default, parallel execution is disabled. There are two ways to enable parallel features for the connection:
Set the number of parallel workers in DPB using the
Set the default number of parallel worker processes using the
Some utilities (
gbak) supplied with Firebird have a command line option
-parallel to set the number of parallel worker processes. Often this switch will simply pass the number of workers through the
isc_dpb_parallel_workers tag when connecting to the database. The new
ParallelWorkers parameter in
firebird.conf sets the default number of parallel worker processes that can be used by any user connection running a parallelizable task. The default value is 1 and means no additional parallel worker processes are used. The value in DPB takes precedence over the value in
To control the number of additional workers the engine can create, there are two new settings in
Sets the default number of parallel worker processes used by user connections. Can be overridden for a connection by using the
isc_dpb_parallel_workerstag in the DPB.
Limits the maximum number of concurrent worker processes for a given database and Firebird process.
Internal workers are created and managed by the Firebird engine itself. The engine maintains worker connection pools for each database. The number of threads in each pool is limited by the value of the
MaxParallelWorkers parameter. Pools are created independently by each Firebird process. In the SuperServer architecture, worker connections are implemented as lightweight system connections, while in Classic and SuperClassic they look like regular user connections. All workers connections are built into the server creation process. Thus, in Сlassic architectures there are no additional server processes. Worker connections are present in the monitoring tables. Dead working connections are destroyed after 60 seconds of inactivity. Additionally, in classic architectures, worker connections are destroyed immediately after the last user connection is disconnected from the database.
1.1.1. Practical recommendations for parameters
What are practical recommendations? Since this feature came from HQbrid (vv2.5-4) where it was used during several years on hundreds of servers, we can use its experience for Firebird 5.0. For SuperServer it is recommended to set parameter
MaxParallelWorkers to 64, and
ParallelWorkers to 2. For maintenance tasks (backup/restore/sweep) it is better to set individually with switch
-parallel, with the value equal to the half of the number of physical cores of your processor (or all processors).
For Classic architecture,
MaxParallelWorkers should be set to a number smaller than the number of physical cores of your processor it is important since the
MaxParallelWorkers limit is set on each process.
It all depends on whether it is done under load or not. If I restore the database, then at that moment no one else is working with it. I can ask for the maximum. But if you are doing a backup/sweep/creating an index under load, then you need to moderate your appetites.
Let’s look at how parallelism affects the execution time of building or rebuilding an index. The effect of parallelism on automatic sweep will not be shown, since it starts automatically without our participation. Impact of concurrency on manual sweep will be demonstrated when examining how Firebird tools perform maintenance tasks.
1.1.2. Multi-threaded index creation or rebuild
Let’s compare the speed of creating an index for the
WORD_DICTIONARY table for different ParallelWorkers values, containing 4079052 records.
For the purity of the experiment, before the new test, we restart the Firebird service. In addition, in order for the table to be in the page cache, we run the following query:
SELECT COUNT(*) FROM WORD_DICTIONARY;
The query to create an index looks like this:
CREATE INDEX IDX_WORD_DICTIONARY_NAME ON WORD_DICTIONARY (NAME);
The execution statistics for this query with
ParallelWorkers = 1 are as follows:
Current memory = 2577810256 Delta memory = 310720 Max memory = 3930465024 Elapsed time = 6.798 sec Buffers = 153600 Reads = 11 Writes = 2273 Fetches = 4347093
Now let’s delete this index, set
ParallelWorkers = 4 and
MaxParallelWorkers = 4 in the config and restart the server. Statistics for running the same query look like this:
Current memory = 2580355968 Delta memory = 2856432 Max memory = 4157427072 Elapsed time = 3.175 sec Buffers = 153600 Reads = 11 Writes = 2277 Fetches = 4142838
As you can see, the index creation time has decreased by a little over 2 times.
The same thing happens when rebuilding the index with the query:
ALTER INDEX IDX_WORD_DICTIONARY_NAME ACTIVE;
1.2. Parallel execution of maintenance tasks by Firebird tools
Main utilities (
gbak) supplied with Firebird also support parallel task execution. They use the number of parallel worker processes set in the
ParallelWorkers parameter in
firebird.conf. The number of parallel worker processes can be overridden using the
-parallel command line switch.
It is recommended to always set the number of parallel processes explicitly using the
The following tasks can use parallel execution:
Creating a backup using the
Restoring from a backup using the
Manual sweep using the
Updating icu using the
1.2.1. Parallelism when performing backups using the
Let’s see how parallel execution will affect backup’s speed of
We will use the fastest backup option - through the service manager and with garbage collection disabled. In order to be able to track the time of each operation during the backup, we will add the
-stat td switch.
First, let’s run the backup without parallelism:
gbak -b -g -par 1 "c:\fbdata\db.fdb" "d:\fbdata\db.fbk" -se localhost/3055:service_mgr -user SYSDBA -pas masterkey -stat td -v -Y "d:\fbdata\5.0\backup.log"
The backup completed in 35.810 seconds.
Now let’s try to run a backup using 4 threads (on the computer which has 8 cores).
gbak -b -g -par 4 "c:\fbdata\db.fdb" "d:\fbdata\db.fbk" -se localhost/3055:service_mgr -user SYSDBA -pas masterkey -stat td -v -Y "d:\fbdata\5.0\backup-4.log"
The backup completed in 18.267 seconds!
As you can see, as the number of parallel processors increases, the backup speed increases, although not linearly.
In fact, the effect of parallel threads on backup speed depends on your hardware. The optimal number of parallel threads should be selected experimentally.
Any additional switches can also change the picture. For example, the
1.2.2. Parallelism when performing restore using the
Now let’s look at how parallelism affects the speed of restoring from a backup. Restoring from a backup consists of the following steps:
creating a database with the corresponding ODS;
restoring metadata from a backup copy;
inserting data to user tables;
Parallelism will only be involved in the last two stages.
In order to be able to track the time of each operation during restoration from a backup, we will add the
-stat td switch.
First, let’s start restoring from a backup without parallelism:
gbak -c -par 1 "d:\fbdata\db.fbk" "c:\fbdata\db.fdb" -se localhost/3055:service_mgr -user SYSDBA -pas masterkey -v -stat td -Y "d:\fbdata\restore.log"
Restore from backup completed in 201.590 seconds. Of these, 70.73 seconds were spent on restoring table data and 121.142 seconds on building indexes.
Now let’s try to start restoring from a backup using 4 threads.
gbak -c -par 4 "d:\fbdata\db.fbk" "c:\fbdata\db.fdb" -se localhost/3055:service_mgr -user SYSDBA -pas masterkey -v -stat td -Y "d:\fbdata\restore-4.log"
Restore from backup completed in 116.718 seconds. Of these, 26.748 seconds were spent on restoring table data and 86.075 seconds on building indexes.
With the help of 4 parallel workers, we were able to almost double the recovery speed. At the same time, the speed of data recovery has increased by almost 3 times, and the construction of indexes has accelerated by 1.5 times.
Why? The explanation is simple: parallelism is used only when engine builds large indexes. Many tables in the database taken as an example are small, the indexes on them are small too, and the number of such tables is large. Therefore, the numbers in your database may be different.
Note that the MaxParallelWorkers parameter limits the use of parallel threads to the Firebird kernel only. When restoring a database using the gbak utility, you can observe the following picture: data in tables is restored quickly (parallelism is noticeable), and building indexes is slower. The point is that indexes are always built by the Firebird kernel. And if MaxParallelWorkers has a value less than that specified in -parallel, then only MaxParallelWorkers of threads will be used to build indexes. However, gbak inserts data to the tables, using -parallel worker threads.
1.2.3. Parallel manual sweep using the
Sweep (cleaning) is the important maintenance process: Firebird scans specified database, and if there are “garbage” records versions, remove them from data pages and from indices. By default, Firebird database is created with autosweep setting, but for the medium and large databases (30+Gb) with high number of transactions per second it could be necessary to disable automatic sweep use manual (usually, scheduled) sweep instead.
Before Firebird 3.0 sweep always scanned all data pages. However, starting with Firebird 3.0 (ODS 12.0), data pages (DP) and pointer pages to data pages (PP) have a special swept flag that is set to 1 if sweep has already scanned the data page and cleared garbage from it. When records in this table are modified for the first time, the flag is reset to 0 again. Starting with Firebird 3.0, automatic and manual sweep skips pages that have the swept flag set to 1. Therefore, a repeated sweep will go much faster, unless, of course, since the previous sweep you have not managed to change records on all pages of the database data. New data pages are always created with swept flag = 0. When restoring the database and backup, all DP and PP pages will be with swept flag = 0.
How to test correctly? An idle sweep after restoring from the database did not show any difference in single-threaded and multi-threaded mode. Therefore, I first ran a sweep on the restored database so that the next sweep would not check uncluttered pages, and then I made a request like this:
update bigtable set field=field;
The purpose of this request was to create 'garbage' in the database. Now you can run the sweep to test its execution speed.
First, let’s run sweep without parallelism:
gfix -user SYSDBA -password masterkey -sweep -par 1 inet://localhost:3055/mydb
DESKTOP-E3INAFT Sun Oct 22 16:24:21 2023 Sweep is started by SYSDBA Database "mydb" OIT 694, OAT 695, OST 695, Next 696 DESKTOP-E3INAFT Sun Oct 22 16:24:42 2023 Sweep is finished Database "mydb" 1 workers, time 20.642 sec OIT 696, OAT 695, OST 695, Next 697
Now we will update the large table and rollback again, and run a sweep with 4 parallel workers.
gfix -user SYSDBA -password masterkey -sweep -par 4 inet://localhost:3055/mydb
DESKTOP-E3INAFT Sun Oct 22 16:26:56 2023 Sweep is started by SYSDBA Database "mydb" OIT 697, OAT 698, OST 698, Next 699 DESKTOP-E3INAFT Sun Oct 22 16:27:06 2023 Sweep is finished Database "mydb" 4 workers, time 9.406 sec OIT 699, OAT 702, OST 701, Next 703
As you can see, the speed of sweep execution has increased more than 2 times.
1.2.4. Parallel icu update using the
-icu switch allows you to rebuild the indexes in the database using the new ICU.
The ICU library is used by Firebird to support COLLATION for multibyte encodings like UTF8. On Windows, ICU is always bundled with Firebird. In Linux, ICU is usually a system library and depends on Linux version. When moving a database file from one Linux distribution to another, the ICU installed on the system may have a different version. This may result in a database on an OS running a different version of ICU being binary incompatible for indexes character data types.
Since rebuilding indexes can be done using parallelism, this is also supported for
In this part we have considered parallel features of Firebird 5 tools. If you would like to know more how gbak implements the parallel reading of data, or even implement the similar mechanism in your own application, read the in-depth article "Parallel Data Reading in Firebird".
There are 2 more parts, stay tuned!