This material is created with the sponsorship and support of IBSurgeon (www.ib-aid.com), vendor of HQbird (an advanced distribution of Firebird) and supplier of performance optimization, migration, and technical support services for Firebird.
The material is licensed under the Public Documentation License: https://www.firebirdsql.org/file/documentation/html/en/licenses/pdl/public-documentation-license.html
All parts:
- Detailed New Features of Firebird 5. Part 1: Improvements in Optimizer
- Detailed New Features of Firebird 5. Part 2: SQLs
- Detailed New Features of Firebird 5. Part 3: SKIP LOCKED
- Detailed New Features of Firebird 5. Part 4: Parallel Features: backup/restore/sweep, creation of indices
- Detailed New Features of Firebird 5. Part 5: Better compression, cache of compiled statements, and more
- Detailed New Features of Firebird 5. Part 6: SQL Profiling
- Parallel Reading Of Data in Firebird
In this part, we will look at data compression, the cache of prepared statements, and other equally important features and improvements in Firebird 5.0.
1. New ODS and upgrade without backup-restore
The traditional way of updating ODS (On-Disk Structure) is to perform backup on the old version of Firebird and restore on the new one. This is a rather lengthy process, especially on large databases.
However, when only the minor version of ODS changes (the number after the dot), backup/restore is redundant: it is only necessary to add the missing system tables and fields, as well as some packages. An example of such an upgrade is from ODS 13.0 (Firebird 4.0) to ODS 13.1 (Firebird 5.0), since the major ODS version 13 remains the same.
Starting from Firebird 5.0, it became possible to upgrade the minor version of ODS without the lengthy backup and restore operations. For this, the gfix utility is used with the -upgrade switch.
Key points:
- The upgrade must be performed manually using the command gfix -upgrade.
- Exclusive access to the database is required, otherwise an error is issued.
- The system privilege USE_GFIX_UTILITY is required.
- The upgrade is transactional: all changes are rolled back in case of an error.
- After the upgrade, Firebird 4.0 can no longer open the database.
Usage:

gfix -upgrade <database> -user <username> -pass <password>
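A concrete invocation might look like this (the database path and credentials here are placeholders, not values from the examples below):

gfix -upgrade /data/horses.fdb -user SYSDBA -pass masterkey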
Note: since Firebird 4.0 can no longer open an upgraded database, it is prudent to make a copy of the database file before running gfix -upgrade.
2. Improved data compression algorithm
As you know, in Firebird, table records are stored on data pages (DP) in compressed form. This is done so that as many records as possible can fit on one page, which in turn saves disk input-output. Until Firebird 5.0, the classic Run Length Encoding (RLE) algorithm was used to compress records.
The classic RLE algorithm works as follows. A sequence of repeated characters is reduced to a control byte, which determines the number of repetitions, followed by the actual repeated byte. If the data cannot be compressed, the control byte indicates that "the next n bytes should be output unchanged".
The control byte is used as follows (a worked example follows the list):
- n > 0 [1 .. 127]: the next n bytes are stored as-is;
- n < 0 [-3 .. -128]: the next byte is repeated |n| times, but stored only once;
- n = 0: end of data.
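As a worked example (byte values in hexadecimal; this illustrates the scheme above and is not a dump of an actual Firebird page), take the 12-byte sequence 'ABC' followed by nine zero bytes:

03 41 42 43   control byte 3: the next 3 bytes ('A', 'B', 'C') are stored as-is
F7 00         control byte -9 (0xF7): the byte 0x00 is repeated nine times

Twelve bytes shrink to six; the longer the run of zeros, the better the ratio, up to the 128-repetition limit of a single control byte.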
Mainly, RLE is effective for compressing trailing zeros in fields of type VARCHAR(N) that are not fully filled or are NULL. It is fast enough and does not load the processor much, unlike dictionary-based algorithms such as LZH, ZIP, GZ.
But the classic RLE algorithm has drawbacks:
- the maximum compression ratio is 64x: a control byte can encode 128 repeating bytes, turning them into 2 bytes. Thus, 32000 identical bytes will take up 500 bytes. This problem has worsened lately with the advent of the UTF8 encoding, where 4 bytes are allocated for each character;
- in some cases, the compressed byte sequence may become longer than the uncompressed one, if the data is not compressible;
- frequent alternation of short compressible and non-compressible sequences additionally loads the processor, offsetting the benefit of saving disk input-output.
Therefore, in Firebird 5.0, an improved RLE compression algorithm (with a variable-length counter) was developed. This algorithm is available only in databases with ODS 13.1 and higher.
Note: updating the ODS with gfix -upgrade, as described above, makes the improved compression available without a backup/restore; existing records, however, stay compressed with the old algorithm until they are next updated.
The improved RLE algorithm works as follows. Two previously unused lengths, -1 and -2, are used as special markers for longer compressible sequences (a worked example follows the list):
- {-1, two-byte counter, byte value} - repeating sequences of length from 128 bytes to 64 KB;
- {-2, four-byte counter, byte value} - repeating sequences of length more than 64 KB.
Compressing sequences of 3 bytes makes no sense if they sit between two non-compressible runs. Compressible sequences of 4 to 8 bytes are a borderline case: they compress little, but increase the total number of runs, which negatively affects the unpacking speed. Therefore, starting from Firebird 5.0, fragments shorter than 8 bytes are not compressed.
In addition, Firebird 5.0 (ODS 13.1) brings another improvement: if applying the RLE compression algorithm makes the byte sequence longer than the original (non-compressible data), the record is written to the page as-is and marked with a special flag as uncompressed.
Now I will show by examples how the new RLE algorithm increases the performance of queries.
First of all, let’s note that compressing records is not a free operation in terms of resources (CPU and memory). This can be easily verified by executing two queries:
SELECT COUNT(*) FROM BIG_TABLE;
SELECT COUNT(SOME_FIELD) FROM BIG_TABLE;
The first query does not unpack records, because we are not interested in their contents: it is enough to simply count them. The second query has to unpack each record to check that the field SOME_FIELD is not NULL. First, let's see how this works in Firebird 4.0.
SELECT COUNT(*)
FROM WORD_DICTIONARY;

                COUNT
=====================
              4079052

Current memory = 2610594912
Delta memory = 0
Max memory = 2610680272
Elapsed time = 0.966 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 4318077

SELECT COUNT(CODE_DICTIONARY)
FROM WORD_DICTIONARY;

                COUNT
=====================
              4079052

Current memory = 2610596096
Delta memory = 1184
Max memory = 2610685616
Elapsed time = 1.770 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 4318083
1.770 - 0.966 = 0.804 seconds - most of this difference is the cost of unpacking records.
Now let’s look at the same thing on Firebird 5.0.
SELECT COUNT(*)
FROM WORD_DICTIONARY;

                COUNT
=====================
              4079052

Current memory = 2577478608
Delta memory = 176
Max memory = 2577562528
Elapsed time = 0.877 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 4342385

SELECT COUNT(CODE_DICTIONARY)
FROM WORD_DICTIONARY;

                COUNT
=====================
              4079052

Current memory = 2577491280
Delta memory = 12672
Max memory = 2577577520
Elapsed time = 1.267 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 4342393
1.267 - 0.877 = 0.390 seconds - less than half of the overhead in Firebird 4.0. Let's take a look at the statistics of this table in Firebird 4.0 and Firebird 5.0.
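The table statistics below were produced with the gstat tool; a typical invocation looks like this (the database file name here is a placeholder):

gstat -r -t WORD_DICTIONARY horses.fdb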
Firebird 4.0:

WORD_DICTIONARY (265)
    Primary pointer page: 855, Index root page: 856
    Total formats: 1, used formats: 1
    Average record length: 191.83, total records: 4079052
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 670.00, compression ratio: 3.49
    Pointer pages: 19, data page slots: 59752
    Data pages: 59752, average fill: 87%
    Primary pages: 59752, secondary pages: 0, swept pages: 0
    Empty pages: 1, full pages: 59750
    Fill distribution:
         0 - 19% = 1
        20 - 39% = 0
        40 - 59% = 0
        60 - 79% = 1
        80 - 99% = 59750
Firebird 5.0:

WORD_DICTIONARY (265)
    Primary pointer page: 849, Index root page: 850
    Total formats: 1, used formats: 1
    Average record length: 215.83, total records: 4079052
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 670.00, compression ratio: 3.10
    Pointer pages: 21, data page slots: 65832
    Data pages: 65832, average fill: 88%
    Primary pages: 65832, secondary pages: 0, swept pages: 0
    Empty pages: 4, full pages: 65824
    Fill distribution:
         0 - 19% = 5
        20 - 39% = 2
        40 - 59% = 0
        60 - 79% = 1
        80 - 99% = 65824
From the statistics, it can be seen that the compression ratio is even lower than in Firebird 4.0. So what accounts for such a colossal gain in performance? To understand this, we need to look at the structure of this table:
CREATE TABLE WORD_DICTIONARY (
    CODE_DICTIONARY     BIGINT NOT NULL,
    CODE_PART_OF_SPEECH INTEGER NOT NULL,
    CODE_WORD_GENDER    INTEGER NOT NULL,
    CODE_WORD_AFFIXE    INTEGER NOT NULL,
    CODE_WORD_TENSE     INTEGER DEFAULT -1 NOT NULL,
    NAME                VARCHAR(50) NOT NULL COLLATE UNICODE_CI,
    PARAMS              VARCHAR(80),
    ANIMATE             D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN ('Да', 'Нет')) */,
    PLURAL              D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN ('Да', 'Нет')) */,
    INVARIABLE          D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN ('Да', 'Нет')) */,
    TRANSITIVE          D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN ('Да', 'Нет')) */,
    IMPERATIVE          D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN ('Да', 'Нет')) */,
    PERFECT             D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN ('Да', 'Нет')) */,
    CONJUGATION         SMALLINT,
    REFLEXIVE           D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN ('Да', 'Нет')) */,
    PROHIBITION         D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN ('Да', 'Нет')) */
);
In this table, only the fields NAME and PARAMS can be compressed well. Since the fields of type INTEGER have the NOT NULL modifier and occupy exactly 4 bytes, such fields are not compressed in Firebird 5.0. Fields of the D_BOOL domain (VARCHAR(3) in UTF8, i.e. a 12-byte buffer) are compressed for the value 'Да' ('Yes': 12 - 4 = 8 trailing bytes) and are not compressed for the value 'Нет' ('No': 12 - 6 = 6 trailing bytes, below the 8-byte threshold).
Since this table has many short sequences that were compressed in Firebird 4.0 but are not compressed in Firebird 5.0, unpacking processes fewer runs in Firebird 5.0, which gives us the performance gain.
Now I will show an example where the new RLE algorithm greatly wins in compression. For this, we will execute the following script:
CREATE TABLE GOOD_ZIP (
    ID          BIGINT NOT NULL,
    NAME        VARCHAR(100),
    DESCRIPTION VARCHAR(1000),
    CONSTRAINT PK_GOOD_ZIP PRIMARY KEY (ID)
);

SET TERM ^;

EXECUTE BLOCK
AS
  DECLARE I BIGINT = 0;
BEGIN
  WHILE (I < 100000) DO
  BEGIN
    I = I + 1;
    INSERT INTO GOOD_ZIP (
      ID,
      NAME,
      DESCRIPTION
    )
    VALUES (
      :I,
      'OBJECT_' || :I,
      'OBJECT_' || :I
    );
  END
END^

SET TERM ;^

COMMIT;
And now let’s look at the statistics of the table GOOD_ZIP in Firebird 4.0 and Firebird 5.0.
Firebird 4.0:

GOOD_ZIP (128)
    Primary pointer page: 222, Index root page: 223
    Total formats: 1, used formats: 1
    Average record length: 111.09, total records: 100000
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 4420.00, compression ratio: 39.79
    Pointer pages: 2, data page slots: 1936
    Data pages: 1936, average fill: 81%
    Primary pages: 1936, secondary pages: 0, swept pages: 0
    Empty pages: 0, full pages: 1935
    Fill distribution:
         0 - 19% = 0
        20 - 39% = 0
        40 - 59% = 1
        60 - 79% = 5
        80 - 99% = 1930
Firebird 5.0:

GOOD_ZIP (128)
    Primary pointer page: 225, Index root page: 226
    Total formats: 1, used formats: 1
    Average record length: 53.76, total records: 100000
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 4420.00, compression ratio: 82.22
    Pointer pages: 1, data page slots: 1232
    Data pages: 1232, average fill: 70%
    Primary pages: 1232, secondary pages: 0, swept pages: 0
    Empty pages: 2, full pages: 1229
    Fill distribution:
         0 - 19% = 3
        20 - 39% = 0
        40 - 59% = 0
        60 - 79% = 1229
        80 - 99% = 0
As you can see, in this case the compression ratio in Firebird 5.0 is twice as high!
And finally, let’s look at an example with non-compressible data. For this, we will execute the script:
CREATE TABLE NON_ZIP (
    UID       BINARY(16) NOT NULL,
    REF_UID_1 BINARY(16) NOT NULL,
    REF_UID_2 BINARY(16) NOT NULL
);

SET TERM ^;

EXECUTE BLOCK
AS
  DECLARE I BIGINT = 0;
BEGIN
  WHILE (I < 100000) DO
  BEGIN
    I = I + 1;
    INSERT INTO NON_ZIP (
      UID,
      REF_UID_1,
      REF_UID_2
    )
    VALUES (
      GEN_UUID(),
      GEN_UUID(),
      GEN_UUID()
    );
  END
END^

SET TERM ;^

COMMIT;
Let’s look at the statistics of the table NON_ZIP in Firebird 4.0 and Firebird 5.0:
Firebird 4.0:

NON_ZIP (129)
    Primary pointer page: 2231, Index root page: 2312
    Total formats: 1, used formats: 1
    Average record length: 53.00, total records: 100000
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 52.00, compression ratio: 0.98
    Pointer pages: 1, data page slots: 1240
    Data pages: 1240, average fill: 69%
    Primary pages: 1240, secondary pages: 0, swept pages: 0
    Empty pages: 5, full pages: 1234
    Fill distribution:
         0 - 19% = 5
        20 - 39% = 1
        40 - 59% = 0
        60 - 79% = 1234
        80 - 99% = 0
Firebird 5.0:

NON_ZIP (129)
    Primary pointer page: 1587, Index root page: 1588
    Total formats: 1, used formats: 1
    Average record length: 52.00, total records: 100000
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 52.00, compression ratio: 1.00
    Pointer pages: 1, data page slots: 1240
    Data pages: 1240, average fill: 68%
    Primary pages: 1240, secondary pages: 0, swept pages: 0
    Empty pages: 5, full pages: 1234
    Fill distribution:
         0 - 19% = 5
        20 - 39% = 1
        40 - 59% = 0
        60 - 79% = 1234
        80 - 99% = 0
In Firebird 4.0, compression actually increased the record length; Firebird 5.0 detected that the records became longer after compression and saved them as-is.
3. Cache of prepared (compiled) statements
The prepared statements cache is one of the most impressive new features of version 5.0. A single configuration parameter can speed up an application several times in some scenarios (frequently repeated queries).
3.1. A little theory
Any SQL query goes through two mandatory stages: preparation (compilation) and execution.
During the preparation of the query, its syntactic analysis, allocation of buffers for input and output messages, construction of the query plan and its execution tree are performed.
If the application needs to execute the same query multiple times with different sets of input parameters, then prepare is usually called once, the handle of the prepared query is kept in the application, and execute is then called for this handle. This reduces the cost of re-preparing the same query for each execution.
Starting from Firebird 5.0, a per-connection cache of compiled (prepared) queries is supported. This reduces the cost of repeatedly preparing the same queries if your application does not explicitly cache prepared statement handles (doing so at a global level in the application is not always easy).
By default, caching is enabled; the caching threshold is determined by the parameter MaxStatementCacheSize in firebird.conf. It can be disabled by setting MaxStatementCacheSize to zero. The cache is maintained automatically: cached statements are invalidated when necessary (usually when any DDL statement is executed).
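For example, to raise the per-connection cache threshold, you might set (the 16M value is purely illustrative; check the firebird.conf shipped with your build for the actual default):

# firebird.conf
MaxStatementCacheSize = 16M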
Note: a query is considered the same only if it matches character for character. Semantically identical queries that differ, for example, only by a comment are different queries for the prepared statements cache.
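A minimal illustration, assuming some table HORSE exists (it is used in the examples later in this article): the following two statements are semantically identical but occupy two separate cache entries:

SELECT COUNT(*) FROM HORSE;
SELECT COUNT(*) FROM HORSE; -- the comment makes this a different cache entry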
In addition to top-level queries, stored procedures, functions and triggers also get into the cache of prepared queries. The contents of the compiled statements cache can be viewed using the new monitoring table MON$COMPILED_STATEMENTS.
Column name | Datatype | Description |
---|---|---|
MON$COMPILED_STATEMENT_ID | INTEGER | Identifier of the compiled statement. |
MON$SQL_TEXT | BLOB SUB_TYPE TEXT | The text of the statement in SQL language. Inside PSQL objects, the text of SQL statements is not displayed. |
MON$EXPLAINED_PLAN | BLOB SUB_TYPE TEXT | The statement plan in 'explained' format. |
MON$OBJECT_NAME | CHAR(63) | Name of the PSQL object (trigger, stored function or stored procedure) where this SQL statement was compiled. |
MON$OBJECT_TYPE | SMALLINT | Object type. |
MON$PACKAGE_NAME | CHAR(63) | Name of the PSQL package. |
MON$STAT_ID | INTEGER | Statistics identifier. |
A new column MON$COMPILED_STATEMENT_ID has appeared in the tables MON$STATEMENTS and MON$CALL_STACK, referring to the corresponding compiled statement in MON$COMPILED_STATEMENTS.
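As a sketch of how this link can be used (column names as documented above; MON$STATEMENT_ID and MON$ATTACHMENT_ID are standard columns of MON$STATEMENTS), the following query lists active statements together with the SQL text of their compiled form:

SELECT S.MON$ATTACHMENT_ID,
       S.MON$STATEMENT_ID,
       CS.MON$SQL_TEXT
FROM MON$STATEMENTS S
JOIN MON$COMPILED_STATEMENTS CS
  ON CS.MON$COMPILED_STATEMENT_ID = S.MON$COMPILED_STATEMENT_ID;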
The monitoring table MON$COMPILED_STATEMENTS allows you to easily get the plans of internal queries of a stored procedure, for example like this:
SELECT CS.MON$EXPLAINED_PLAN
FROM MON$COMPILED_STATEMENTS CS
WHERE CS.MON$OBJECT_NAME = 'SP_PEDIGREE'
  AND CS.MON$OBJECT_TYPE = 5
ORDER BY CS.MON$COMPILED_STATEMENT_ID DESC
FETCH FIRST ROW ONLY
Note that the same stored procedure can appear in MON$COMPILED_STATEMENTS several times, which is why the query above orders by MON$COMPILED_STATEMENT_ID and takes only the most recent entry.
4. Support for bidirectional cursors in the network protocol
A cursor in SQL is an object that allows you to move through the records of any result set. It can be used to process a single database record returned by a query. There are unidirectional and bidirectional (scrollable) cursors.
A unidirectional cursor does not support scrolling, that is, retrieving records from such a cursor is possible only sequentially, from the beginning to the end of the cursor. This type of cursor is available in Firebird from the earliest versions, both in PSQL (explicitly declared and implicit cursors) and through the API.
A scrollable or bidirectional cursor allows you to move through the cursor in any direction, jump around and even move to a given position. Support for bidirectional (scrollable) cursors first appeared in Firebird 3.0. They are also available in PSQL and through the API interface.
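For reference, a minimal PSQL sketch of a scrollable cursor (the table WORD_DICTIONARY is borrowed from the compression examples above; any table would do):

SET TERM ^;

EXECUTE BLOCK
RETURNS (NAME VARCHAR(50))
AS
  DECLARE C SCROLL CURSOR FOR (SELECT NAME FROM WORD_DICTIONARY ORDER BY NAME);
BEGIN
  OPEN C;
  FETCH LAST FROM C INTO :NAME;        -- jump straight to the last record
  SUSPEND;
  FETCH PRIOR FROM C INTO :NAME;       -- step one record backwards
  SUSPEND;
  FETCH ABSOLUTE 10 FROM C INTO :NAME; -- move to the 10th record
  SUSPEND;
  CLOSE C;
END^

SET TERM ;^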
However, until Firebird 5.0, scrollable cursors were not supported at the network protocol level. This meant that you could use the bidirectional cursor API in your application only if you connected in embedded mode. Starting from Firebird 5.0, you can use the scrollable cursor API even when connecting to the database over the network protocol, provided that the client library fbclient is at least version 5.0.
If your application does not use fbclient, for example it is written in Java or .NET, then the corresponding driver must support the Firebird 5.0 network protocol. For example, Jaybird 5 supports bidirectional cursors in the network protocol.
5. Tracing the COMPILE event
In Firebird 5.0, it became possible to trace a new event: the parsing (compilation) of stored modules. It allows you to track when stored modules are parsed, the time spent on it, and, most importantly, the plans of the queries inside these PSQL modules. A plan can also be obtained if the PSQL module was loaded before the tracing session started; in this case, the plan is reported at the first execution observed by the tracing session.
The following parameters have appeared in the tracing configuration to track the module parsing events (a sample trace configuration follows the list):
- log_procedure_compile - enables tracing of procedure parsing events;
- log_function_compile - enables tracing of function parsing events;
- log_trigger_compile - enables tracing of trigger parsing events.
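A minimal sketch of a trace configuration with these events enabled (the database file mask is a hypothetical example; the layout follows the standard Firebird trace configuration syntax):

database = %[\\/]horses.fdb
{
    enabled = true
    log_procedure_compile = true
    log_function_compile = true
    log_trigger_compile = true
}

A session started with such a configuration (for example, via fbtracemgr) will log compile events like the COMPILE_PROCEDURE entry shown below.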
Suppose we have the following query:
SELECT * FROM SP_PEDIGREE(7435, 8, 1);
To track the plan of a stored procedure in a tracing session, you need to set the parameter log_procedure_compile = true. In this case, when this query is prepared or executed, a procedure parsing event appears in the tracing log, which looks like this:
2023-10-18T20:40:51.7620 (3920:00000000073A17C0) COMPILE_PROCEDURE
        horses (ATT_30, SYSDBA:NONE, UTF8, TCPv6:::1/54464)
        C:\Firebird\5.0\isql.exe:10960

Procedure SP_PEDIGREE:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Cursor "V" (scrollable) (line 19, column 3)
    -> Record Buffer (record length: 132)
        -> Nested Loop Join (inner)
            -> Window
                -> Window Partition
                    -> Record Buffer (record length: 82)
                        -> Sort (record length: 84, key length: 12)
                            -> Window Partition
                                -> Window Buffer
                                    -> Record Buffer (record length: 41)
                                        -> Procedure "SP_HORSE_INBRIDS" as "V H_INB SP_HORSE_INBRIDS" Scan
            -> Filter
                -> Table "HUE" as "V HUE" Access By ID
                    -> Bitmap
                        -> Index "HUE_IDX_ORDER" Range Scan (full match)
Select Expression (line 44, column 3)
    -> Recursion
        -> Filter
            -> Table "HORSE" as "PEDIGREE HORSE" Access By ID
                -> Bitmap
                    -> Index "PK_HORSE" Unique Scan
        -> Union
            -> Filter (preliminary)
                -> Filter
                    -> Table "HORSE" as "PEDIGREE HORSE" Access By ID
                        -> Bitmap
                            -> Index "PK_HORSE" Unique Scan
            -> Filter (preliminary)
                -> Filter
                    -> Table "HORSE" as "PEDIGREE HORSE" Access By ID
                        -> Bitmap
                            -> Index "PK_HORSE" Unique Scan
      28 ms
6. Per-table statistics in isql
Per-table statistics show how many records for each table were read by a full scan, how many using an index, how many inserted, updated or deleted and other counters. The values of these counters have been available for a long time through the API function isc_database_info, which was used by many graphical tools, but not by the console tool isql. The values of these same counters can be obtained by using the monitoring tables MON$RECORD_STATS and MON$TABLE_STATS, or in tracing. Starting from Firebird 5.0, this useful feature appeared in isql.
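For comparison, the same counters can be pulled from the monitoring tables; here is a sketch for attachment-level statistics (table and column names per the Firebird documentation; MON$STAT_GROUP = 1 denotes the attachment level):

SELECT TS.MON$TABLE_NAME,
       RS.MON$RECORD_SEQ_READS,
       RS.MON$RECORD_IDX_READS,
       RS.MON$RECORD_INSERTS,
       RS.MON$RECORD_UPDATES,
       RS.MON$RECORD_DELETES
FROM MON$TABLE_STATS TS
JOIN MON$RECORD_STATS RS ON RS.MON$STAT_ID = TS.MON$RECORD_STAT_ID
WHERE TS.MON$STAT_GROUP = 1;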
By default, per-table statistics output is disabled.
To enable it, you need to type the command:
SET PER_TAB ON;
To disable:
SET PER_TAB OFF;
The command SET PER_TAB without the words ON or OFF toggles the state of statistics output.
The full syntax of this command can be obtained using the command HELP SET.
Example of per-table statistics output:
SQL> SET PER_TAB ON;
SQL> SELECT COUNT(*)
CON> FROM HORSE
CON> JOIN COLOR ON COLOR.CODE_COLOR = HORSE.CODE_COLOR
CON> JOIN BREED ON BREED.CODE_BREED = HORSE.CODE_BREED;

                COUNT
=====================
               519623

Per table statistics:
--------------+---------+---------+---------+---------+---------+---------+---------+---------+
 Table name   | Natural |  Index  | Insert  | Update  | Delete  | Backout |  Purge  | Expunge |
--------------+---------+---------+---------+---------+---------+---------+---------+---------+
BREED         |      282|         |         |         |         |         |         |         |
COLOR         |      239|         |         |         |         |         |         |         |
HORSE         |         |   519623|         |         |         |         |         |         |
--------------+---------+---------+---------+---------+---------+---------+---------+---------+
7. Summary
We’ve just explored some of the cool new things that Firebird 5 can do for you, like making your code faster, easier, and more secure. But wait, there’s more! We still have one more exciting topic to cover: the SQL profiler in 5.0. Don’t go anywhere!