Library

Data types

NOTICE: This document is the chapter from the book "The InterBase World" which was written by Alexey Kovyazin and Serg Vostrikov.

Despite the fact that data types are described in detail in documentation (see [1, chapter 4]), it’s necessary to consider series of notions that will be frequently used in the following chapters of this book. Apart from general information several examples of using data types in InterBase databases and recommendations about their using and conversion will also be reviewed. We will consider in detail data types distinctions existing in 1st and 3rd InterBase database dialect.

About data

Data types are basic elements of any programming language or any DBMS server. InterBase is not the exception. When we say that database stores some information, we must realize that this information cannot be stored in one place. It’s vice versa; data must be sorted out and put to its “shelf”. Data types define what can be put to a certain “shelf” and what not. When we are speaking about the “shelf”, first of all we mean fields of database tables (see chapter "Tables. Primary keys and generators" (part 1)) , as well as variables within the triggers and stored procedures etc. Every data type has a set of operations that can be executed over the values of this type. Therefore it’s important to choose the right data type when designing a database. It will help to avoid many problems when developing clients’ programs. In InterBase there are 12 data types that can satisfy the developer’s requirements in storing the data. These types are conditionally subdivided into 6 following groups:

for storing integers – INTEGER and SMALLINT;
for storing real numbers – FLOAT and DOUBLE PRECISION;
for storing numbers with fixed precision – NUMERIC and DECIMAL;
for storing date, time and timestamp – DATE, TIME and TIMESTAMP;
for storing symbols – CHARACTER (in abbreviated form – CHAR) and VARYING CHARACTER (VARCHAR);
for storing dynamic extending data – BLOB

In addition, it is possible to define the arrays of values of primitive types, i.e. all enumerated types except BLOB.
Most of InterBase data types correspond to the types defined in SQL92 standard, however, apart from this there are their own peculiarities – arrays of primitive data types and BLOB. Arrays in InterBase may contain a set of data of the same type in one field. For example, we can define the array of values of INTEGER type. Arrays may have several dimensions. BLOB data type is a dynamically extending data type, which name is often deciphered as Binary Large OBject. It should be said that BLOB is the invention of InterBase developers that later spread and settled down in all contemporary SQL-servers.

Syntax of defining data types

Data types are used for describing fields in tables, variables in triggers and stored procedures. Common syntax of defining all possible data types in InterBase is given below.

< datatype> =
{SMALLINT | INTEGER | FLOAT | DOUBLE PRECISION}[ ]
| {DATE | TIME | TIMESTAMP} [ ]
| {DECIMAL | NUMERIC} [( precision [, scale])] [ ]
| {CHAR | CHARACTER | CHARACTER VARYING | VARCHAR} [( int)]
[ ] [CHARACTER SET charname]
| {NCHAR | NATIONAL CHARACTER | NATIONAL CHAR}
[VARYING] [( int)] [ ]
| BLOB [SUB_TYPE { int | subtype_name}] [SEGMENT SIZE int]
[CHARACTER SET charname]
| BLOB [(seglen [, subtype])]

Data types’ characteristics such as size, precision and a range of possible values are described in detail in table 4.1 в [1], therefore we will not repeat ourselves here. Now let’s review briefly the main peculiarities of data types and focus on their possible application.

Integer types

SMALLINT and INTEGER refer to integer types. I should say that SMALLINT represents a crippled version of INTEGER. Its length is 2 bytes unlike 4 meant for storing INTEGER. Usually you should not save on disk space. On this ground, it is better to use INTEGER type for storing integer values.
The area of applying integer types is obvious: they are necessary for fields containing only integers – for storing counters, number etc. Usually INTEGER type has also the fields containing primary keys.

Real data types

FLOAT and DOUBLE PRECISION types relate to real ones (they are also called number types with a floating point). At first, I want to warn the reader against using FLOAT type – its precision is not enough for storing most of fractional values. Especially it is not recommended to store money values in it – in FLOAT type variables rounding errors appear very quickly and it may surprise an accountant when making calculations. The best way to store numbers with a floating point (for example, in accounting systems and systems for scientific calculations) in a database is to store them in DOUBLE PRECISION type.

You should take into account that in InterBase 3rd dialect there is a mechanism of storing types with a fixed-length point of 64 bytes. This mechanism is used for storing money values. Usage of these types provides the best precision.

Data types with a fixed point

NUMERIC and DECIMAL refer to these data types. The question about the difference between NUMERIC and DECIMAL sounds very often. These both types have an identical digit capacity - from 1 up to 18 signs, identical precision - from zero up to a digit capacity.

Let's remind, that: the digit capacity is a total number of digits among, and precision - a number of signs after a comma.

The funniest thing is that these types differ in the maximal digit capacity according to documentation, but actually, they are realized practically equally and there is no difference between them.

You can easily check it up, having started the utility isql and having made below-mentioned sequence of actions. We create the table of the following view:

SQL> CREATE TABLE test (
CON> Num_field NUMERIC(15,2),
CON> Dec_field DECIMAL(15,2));

Then we give a command to show the structure of the table:

SQL> show tables test;

And we see the following

NUM_FIELD NUMERIC(15, 2) Nullable
DEC_FIELD NUMERIC(15, 2) Nullable

As you can see, InterBase informs that both given columns have NUMERIC type. The reasons of such behavior lie in realization of data types with the fixed point. The matter is that InterBase has only three mechanisms of storing any integer expression, and all types, whatever they are named, are realized as to these variants.

Here is the table from [1], which illustrates how various integer types (table 1.1) are stored. As you can see, storing of the data in 3-rd dialect is different for numbers with the big digit capacity:

Table 1.1. Storing of numbers with the fixed point

Digit capacity	Dialect 1	Dialect 3
from 1 to 4	SMALLINT for NUMERIC INTEGER for DECIMAL	SMALLINT
from 5 to 9	INTEGER	INTEGER
from 10 to 18	DOUBLE PRECISION	INT64

Now we can tell for sure what the main differences between NUMERIC and DECIMAL types are: in case of defining a field (of variable) with a small digit capacity (up to four) the first is stored as 2 byte integer SMALLINT, and the second - as 4 byte INTEGER. Thus, if a digit capacity is more than four, DECIMAL and NUMERIC types will be equivalent.
Pay attention to the difference of realization of types with the big digit capacity in first and third dialects. In first dialect, the number with the fixed point turned from the integer to real, where the mechanisms of a rounding off were applied. In third dialect this peculiarity has been liquidated - the big integers are stored really as the integers – using INT64 mechanism which can store 64-bit numbers in a range +/-2^32. Therefore, it would be better to store the data on monetary funds in the databases created with the use of 3-rd dialect. Only using INT64 mechanism will guarantee safety of the small monetary rests.

Types for storing data and time

Types for storing date and time have changed in version InterBase 6.x and his clones in comparison with 4.х and 5.х. In order not be confused in historical cobwebs with these types, we shall consider a situation in InterBase 6th version. Then based on it we shall briefly mention what was earlier. This is done for those users who still work using early versions of InterBase. So, there are 3 types in InterBase 6.x for storing date and time - DATE, TIME and TIMESTAMP.

Type DATE stores dates to within a day. A range of possible values - from January, 1 100 A.D. till February, 29 32768.
Type TIME stores the data on time to within a ten-thousand share of second. A range of possible values - from 00:00 AM up to 23:59.9999 PM.
Type TIMESTAMP represents a combination of DATE and TIME types.

How to work with dates? If the question is about the work at the level of server in stored procedures or triggers, everything is quite simple - we can always declare a variable of the required type and set it from the tables and vice versa. However, it is necessary to transfer the data from a database to the appendix and back. In this case there are two approaches: to use libraries which apply an original format of InterBase dates for the access to objects of these types and convert this format to ordinary intralanguage date / time types (FIBPlus is the example of such library), or to use the mechanism of transforming dates to lines, embedded in InterBase.
What will you do if you need to cut out only year or month from the full date? You will have to use a group of functions EXTRACT for this purpose (available in all clones of InterBase 6.х) allowing you to extract only the required part from the date. These functions are used in the following way:
EXTRACT (MONTH FROM DATE_FIELD)
EXTRACT (YEAR FROM DATE_FIELD)
The full list of parameters in function EXTRACT is the following: YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, WEEKDAY, YEARDAY. Their function follows from their name, therefore we will not interpret them here.

Data types for storing text

There are two types in InterBase meant for storing text information – CHAR и VARCHAR. Their full names – CHARACTER and CHARACTER VARYING, however there is no reason to use long names – even command Show tables in utility isql gives brief names of types.
In order to define a field or a variable of symbolical type, it is necessary to specify in brackets after a name of type a number of symbols which will be used in a defined object, or miss a number of symbols - thus the field with a length of 1 symbol will be created.

CREATE TABLE testCHARLen(
Field1 CHAR(255),
Field2 CHAR);

When this table is created, Field1 will have length of 255 symbols, and Field2 - 1 symbol.
Types CHAR and VARCHAR in many respects are similar - both can contain up to 32768 symbols, however there are some differences. Although these both types are stored in a database in the same ways, but InterBase works with them in different ways. The following example demonstrates this:

SQL> create table testCHAR ( c1 char(10), c2 varchar(10));
SQL> insert into testCHAR(c1,c2) values('Test','Test');
SQL> SELECT '('||c1||')', '('||c2||')' from testCHAR;

In the result, we will get the following:

(Test ) (Test)

As you can see, after value 'Test', chosen from a field c1, there were blanks. It means that when selecting the data from a field of CHAR type the returned value is supplemented with blanks up to the full length of a field. It is difficult to assume why such behavior that leads to substantial growth of the network traffic (loading of a network) is necessary. In any case, VARCHAR is a symbolical type recommended to use.
One of the major characteristics of symbolical type is its character set - CHARACTER SET. The character set is defined for the whole database and used by default for all symbolical fields if it is not redefined explicitly when creating a field. In order to create a symbolical field with the explicit indication of a character set, it is necessary to add the description of a character set in the description of a column (in sentences CREATE TABLE or ALTER TABLE). Character set WIN1251 is usually used for the support of Russian (if you want to know in detail about the usage of Russian in InterBase see the chapter "InterBase Russification"(part1)). Here is an example of the table containing a symbolical field with the explicitly described character set WIN1251:

CREATE TABLE TestCHARSET(
Field1 VARCHAR(255),
Field2 VARCHAR(255) CHARACTER SET win1251);

Here Field1 is the field without the explicit indication of a character set, therefore the character set specified when creating a database will be used for it. It is obviously determined for field Field2 that it will store symbols in WIN1251 coding.
Besides indicating a character set for symbolical fields, you can also specify a collation order that defines how the symbols of this data set will be sorted. For Russian there are two variants of collation - WIN1251 and PXW_CYRL. For more details about the use of COLLATION ORDER, see the chapter "InterBase Russification ".
You can find a full list of character sets and COLLATION ORDER applied for them in the documentation [1, chapter 13].
Attention! According to the documentation on InterBase 6 there are 4 symbolical types: apart from the data types mentioned above 2 more exist - NCHAR and NCHAR VARYING, however the same documentation below explains that the last two types are the same types as CHAR and VARCHAR, only where the character set ISO8859_1 is used by default. It means that actually using pseudo-type NCHAR is equivalent to applying CHAR DEFAULT CHARACTER SET ISO8859_1. Similarly for NCHAR VARYING, only there instead of CHAR VARCHAR is used. It is obvious that the application of these pseudo-types is meant for users in the Western Europe and the USA, where a character set ISO8859_1 is created for the support of languages.

Blob data type

BLOB data type is intended for storing a great number of data of a variable size. BLOB type allows to store data that cannot be placed in fields of other types, - for example, a picture, musical files, video fragments, etc. The requirements for defining the simplest field of BLOB type in the table are the same as those for defining a field of any elementary type:

CREATE TABLE testBLOB(
myBlobField BLOB);

As a result, the field myBlobField in which it is possible to store the sizable data will be created. Despite the fact that BLOB fields do not differ from others in the way of definition, their realization within a database differs very much. Не-BLOB-fields are located on data page (see the chapter " InterBase database structure " (part 4)) close to each other, and in case only BLOB identifier is stored on data page, BLOB itself is allocated on special page. Such data structure allows to store the data of a unfixed size.
BLOB type has a possibility to define a set of several subtypes and special procedures called filters (BLOB filters) for work with these subtypes. There are some predetermined BLOB subtypes built in in InterBase. All these subtypes have non-negative numbers, for example subtype 0 – data of undefined type, subtype 1 - the text, subtype 2 - BLR (Binary Language Representation, see the glossary and the chapter "InterBase database structure "), etc. A user can also define BLOB subtypes that can have negative values. The filter can be applied to every type. It will convert a field of one subtype into another.
It should be noted that using of BLOB-fields is usually an alternative to storing of external in relation to a database files. As to BLOB filters, they are used quite rarely owing to the orientation to a narrow category of tasks.

Arrays

DBMS InterBase was one of the first in which arrays appeared. Support of arrays in a database is the extension of traditional relational model. The presence of arrays allows to simplify work with sets of one data type.
The array is a collection of values of one type, having a common name and permitting to address to any array element according to its number. Arrays in InterBase can be one-dimensional and multi-dimensional.
In order to create a field of array of numbers INTEGER in the table, it should be written something like the following

CREATE TABLE test(
myOneDimArray INTEGER[12],
myTwoDimArray INTEGER[5,4],
myThreeDimArray INTEGER[2,10,8]);

Thus, three fields of array type will be created: myOneDimArray - a field containing one-dimensional array of 12 numbers length, myTwoDimArray, containing two-dimensional array (matrix) – 5х4 Integers, and myThreeDimArray - a field containing three-dimensional array 2х10х8. It should be marked, that under such definition array elements are numbered beginning from “one”, i.e. the first element has number 1, the second - number 2, etc. If someone wants to set the boundaries of the array on his own, for example from 0 up to 5, he should specify a field definition in the following way:

myArray INTEGER[0:5]

Arrays are realized on the basis of fields of BLOB type, therefore you should not be afraid that the multidimensional array "will pollute" your table with a large number of data: InterBase will neatly allocate the data of the array on separate pages to optimize input-output operations in these fields. How to use arrays? They provide with a convenient mechanism for storing objects of the same type. However in 80 % of cases instead of arrays developers prefer to store multiple data in subordinate (detail) tables, therefore arrays are not so frequently used in client applications of InterBase DBMS. This happens because the access libraries supplied with Delphi and C ++ Builder such as BDE and IBX are unable to work with arrays. According to documentation on InterBase it is possible to work with arrays using a preprocessor gpre, however this is not the most convenient way for Delphi/C ++ Builder developer. Fortunately, in FIBPlus library there is a support of fields-arrays in InterBase. You can read about it in detail in the chapter "FIBPlus special abilities". The client library IBProvider that allows to create client applications for InterBase using development tools of Microsoft company also supports the work with arrays (see the chapter " Development of client applications of InterBase DBMS using the technology of Microsoft OLE DB " (part 3)).

Conclusion

It should be marked, that it is impossible to tell about data types, not running forward, because they penetrate into all key areas connected with the development of database applications. Therefore, when reading this book it is better to use the given chapter as the quick reference you can refer to every time you need to refresh the basis of InterBase.