Example 3. Using partial indices with non-selective predicates
Suppose we need to get all dead horses for which the date of death is known. A horse is definitely dead if it has a date of death, but it often happens that it is not listed or is simply unknown. Moreover, the number of unknown dates of death is much greater than the known ones. To do this, we will write the following query:
SELECT COUNT(*)
FROM HORSE
WHERE DEATHDATE IS NOT NULL;
We want to get this list as quickly as possible, so we’ll try to create an index on the DEATHDATE
field.
CREATE INDEX IDX_HORSE_DEATHDATE
ON HORSE(DEATHDATE);
Now let’s try to run the query above and look at its plan and statistics:
Select Expression
-> Aggregate
-> Filter
-> Table "HORSE" Full Scan
COUNT
=====================
16234
Current memory = 2579550800
Delta memory = 176
Max memory = 2596993840
Elapsed time = 0.196 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 555810
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+---------+---------+---------+
Table name | Natural | Index | Insert | Update | Delete | Backout | Purge | Expunge |
--------------------------------+---------+---------+---------+---------+---------+---------+---------+---------+
HORSE | 519623| | | | | | | |
--------------------------------+---------+---------+---------+---------+---------+---------+---------+---------+
As you can see, it was not possible to use the index.
The reason is that the predicates IS NOT NULL
, <>
, IS DISTINCT FROM
are low-selective.
Currently, Firebird does not have histograms with the distribution of index key values, and therefore the distribution is assumed to be uniform. With a uniform distribution, there is no point in using an index for such predicates, which is what is done.
Now let’s try to delete the previously created index and create a partial index instead:
DROP INDEX IDX_HORSE_DEATHDATE;
CREATE INDEX IDX_HORSE_DEATHDATE
ON HORSE(DEATHDATE) WHERE DEATHDATE IS NOT NULL;
And let’s try to repeat the request above:
Select Expression
-> Aggregate
-> Filter
-> Table "HORSE" Access By ID
-> Bitmap
-> Index "IDX_HORSE_DEATHDATE" Full Scan
COUNT
=====================
16234
Current memory = 2579766848
Delta memory = 176
Max memory = 2596993840
Elapsed time = 0.017 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 21525
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+---------+---------+---------+
Table name | Natural | Index | Insert | Update | Delete | Backout | Purge | Expunge |
--------------------------------+---------+---------+---------+---------+---------+---------+---------+---------+
HORSE | | 16234| | | | | | |
--------------------------------+---------+---------+---------+---------+---------+---------+---------+---------+
As you can see, the optimizer managed to use our index. But the most interesting thing is that our index will continue to work with other date comparison predicates (but it will not work for IS NULL
).
SELECT COUNT(*)
FROM HORSE
WHERE DEATHDATE = DATE'01.01.2005';
Select Expression
-> Aggregate
-> Filter
-> Table "HORSE" Access By ID
-> Bitmap
-> Index "IDX_HORSE_DEATHDATE" Range Scan (full match)
COUNT
=====================
190
Current memory = 2579872992
Delta memory = 192
Max memory = 2596993840
Elapsed time = 0.004 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 376
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+---------+---------+---------+
Table name | Natural | Index | Insert | Update | Delete | Backout | Purge | Expunge |
--------------------------------+---------+---------+---------+---------+---------+---------+---------+---------+
HORSE | | 190| | | | | | |
--------------------------------+---------+---------+---------+---------+---------+---------+---------+---------+
The optimizer in this case realized that the IS NOT NULL filter condition in the partial index covers any other predicates that do not compare to NULL.