Articles

The differences between VARCHAR and INTEGER fields in keys

I’m using integer IDs for primary keys in my tables. What if I would use varchar fields instead? Will I lose performance in that case, especially for big tables? Will joins still work as fast as they do for the integer column.
 
Your performance should be about the same with varchars or integers.  Firebird always compares index keys bytewise and only the significant part of the value is stored.   
 
A single field key is first converted to one of the three canonical types:  string with collation, double precision, and (sadly) 64 bit integer.  Dates become double precision floating point numbers.   
 
Strings that have a collation other than their byte value are converted to their collation format.  That's something of a black art and expands the size of the string, but the result is that the string finds its correct place when sorted with other strings of the same collation.  'A', 'a', 'â', 'á', 'Ă', 'ã', 'ä', 'å', 'ă', 'ą', 'Ā' all appear in their appointed places.  (Sorry for what that did to your email client ....  in mine, that's eleven variants on 'A'.)  Trailing blanks are not included in the key.
 
The double precision number is mangled so it too sorts bytewise - roughly invert the sign, then exponent, then mantissa, truncating trailing zeros.   
 
Depending on the endianness of 64bit integers on the computer, they too are mangled so they compare bytewise.  That may seem like a disoptimization, but index keys are not stored on natural boundaries and they undergo prefix compression so there's no way to use a larger comparison than byte by byte.   
 
Compound keys are much the same.  Each part is converted to its index key type and padded to a multiple of 4 bytes.  After every four bytes, Firebird adds a byte with the position of the current field of the key.  Thus an index on LastName, FirstName, ZodiacSign would come out as 1Harr1ison2Ann 3Gemi3ni.  This avoids the embarassment of confusing Damnation with Dam nation.  
 
Why did I say "(sadly)" above?  Because having a single format for numbers allows Firebird to change the size of numbers without recreating indexes on them.  But when Borland added 64 bit integers back - InterBase had 64 bit integers from the beginning on Vaxes - some bright spark realized that double precision has 56 bits of precision and 64 bit integers have 64 bits.  On the other hand, Firebird indexes are designed to handle some imprecision ... or the remaining 8 bits could be tacked on the end... whatever.  So you have to rebuild indexes when going from Numeric/Decimal 9 to Numeric/Decimal 12.  Sad.

"Prefix compression?"  When storing a key other than the first on a page or the first after a jump on page Firebird looks at the preceding key and truncates that part of the beginning of the next key that duplicates is predecessor and tacks the length of the truncated part at the beginning.  Thus the strings "AAAA", "AAAB", "AAAC", "AABC" become 
"AAAA", "3B", "3C", and "2BC".    There is a problem with some formats of GUID which put the volatile part of the number first, followed by the fixed part.  That defeats prefix compression and inflates the size of indexes.
 
"Jump?" - Prefix compression reduces the size of indexes by a lot, reducing I/O, but requires reading across the whole page to decipher the key.  Fine with 1K pages, but with larger page sizes the computation was unacceptable.  So each index page now has an index of its own pointing to the offsets of uncompressed entries. That index is called a jump vector.
 
Ann Harrision