7.2 Numeric Types

"Numeric types" is a collective term for integral (whole) and fractional decimal numbers. The word decimal, as used all around the world, derives from the Latin decimus, meaning "one tenth". The decimal system stems from the Hindu-Arabic numeral system: base-10 numbers, with or without a fractional part.

The SQL standard defines DECIMAL (plus its shorthand DEC) and NUMERIC to create integral and (precise, fixed-point) fractional numbers. The difference between the two is subtle in the SQL-92 standard: NUMERIC(p,s) must be exactly as precise as declared, while DECIMAL(p,s) must be at least as precise as declared. On PostgreSQL, MySQL, Oracle, and SQL Server, DECIMAL and NUMERIC behave identically.

DECIMAL and NUMERIC expect two parameters, precision and scale: DECIMAL(p,s) and NUMERIC(p,s). The precision is the total number of decimal digits a number has. The scale is the number of fractional digits (to the right of the decimal point). For example, the decimal number 543.21 has a precision of five and a scale of two. Note that a minus sign does not count as a digit; it is usually stored separately in an extra byte.
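
To make this concrete, here's a minimal sketch (table and column names are made up for illustration):

CREATE TABLE Products
(
  -- precision 5, scale 2: values from -999.99 up to 999.99
  price DECIMAL(5,2)
)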

The PostgreSQL manual writes the following about numeric types:

If the scale of a value to be stored is greater than the declared scale of the column, the ... [value will be rounded] to the specified number of fractional digits. Then, if the number of digits to the left of the decimal point exceeds the declared precision minus the declared scale, an error is raised.

Numeric values are physically stored without any extra leading or trailing zeroes. Thus, the declared precision and scale of a column are maximums, not fixed allocations. ...
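
As a hypothetical illustration of the quoted behavior (table and column names are made up; the exact error message is DBMS-specific):

CREATE TABLE Samples (n NUMERIC(5,2));

INSERT INTO Samples VALUES (543.215); -- stored as 543.22, rounded to scale 2
INSERT INTO Samples VALUES (1234.5);  -- error: 4 digits left of the point, only 3 (5 - 2) allowed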

Numeric types can be systematized as follows:

[Figure: Systematization of Numeric Types]

The classic way to store decimals of unlimited length is binary-coded decimal (BCD). Each decimal digit is stored in a nibble, i.e. a half-byte (four bits). Even though four bits can hold 16 distinct values, only ten are used before the next nibble is taken: Its main virtue is that it allows easy conversion to decimal digits for printing or display, and allows faster decimal calculations. Its drawbacks are a small increase in the complexity of circuits needed to implement mathematical operations. Uncompressed BCD is also a relatively inefficient encoding - it occupies more space than a purely binary representation.
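
For illustration, here are the digits of 543.21 packed as uncompressed BCD, one nibble per digit (sign and decimal point handling omitted):

543.21  ->  digits:  5     4     3     2     1
            nibbles: 0101  0100  0011  0010  0001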

The exact storage requirements for numeric types are DBMS-dependent; you have to check your DBMS's manual. As an example, the MySQL manual states the following: As of MySQL 5.0.3, values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires four bytes, and the "leftover" digits require some fraction of four bytes.
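
A worked example based on the quoted scheme (the per-leftover-digit byte costs are taken from the MySQL manual): a DECIMAL(14,4) has a ten-digit integer part and a four-digit fractional part.

DECIMAL(14,4)
  integer part:    9 digits -> 4 bytes, 1 leftover digit -> 1 byte  = 5 bytes
  fractional part: 4 leftover digits -> 2 bytes                     = 2 bytes
  total                                                             = 7 bytes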

Oracle uses the non-standard, general-purpose NUMBER type. Oracle has only basic SQL compliance (SQL-92) and implements all standard SQL numeric data types as aliases for the proprietary NUMBER. It translates all integral and fractional numbers to NUMBER, including the approximate fractional types.

On Oracle, a standard SQL SMALLINT unexpectedly becomes a (non-portable) NUMBER(38). Even FLOAT becomes a NUMBER internally, though the metadata still shows FLOAT. You might have a hard time porting an Oracle database, because this sometimes destroys the original type information. Because aliasing standard SQL keywords is a widespread practice, you generally shouldn't rely on what you expect without checking - on any DBMS.

Oracle is a notable exception with regard to storing numbers: it implements a special variable-length binary-coded decimal (NUMBER), which determines storage requirements per value, not per column type - unique among the DBMSs discussed here. The number of bytes Oracle requires for a number is always variable and depends on the specific value being inserted. You can see how much space is actually being consumed using the VSIZE function.

For example, Oracle occupies one byte for zero, two bytes for values ranging from one to 100, etc. It's actually quite complicated, but pretty nifty if you ask me. You can read more about NUMBER internals in Oracle's documentation. While this is the "crown king" of storage optimization, calculating storage requirements up front can only be done by loading representative data and measuring the result: ... Since the size depends on the actual values, not the length of those values, predicting storage space is rather difficult. It's generally easiest to load some representative data and measure than to try to predict the space required ... .
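
As a hedged sketch, you might inspect this yourself like so (the exact byte counts depend on the values and your Oracle version):

SELECT VSIZE(0)      AS zero_bytes,   -- typically 1
       VSIZE(5)      AS small_bytes,  -- typically 2
       VSIZE(123.45) AS frac_bytes    -- value-dependent
  FROM DUAL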

However, you should never overemphasize storage optimization during the design phase; it's not as much of an issue today as it once was. If storage optimization is a design requirement nonetheless, you must find out how your DBMS behaves, because the exact storage requirements of DECIMAL and NUMERIC aren't always obvious. If you need integral values ranging from -128 to 127, you would have to use a DECIMAL(3). However, DECIMAL(3) allows values ranging from -999 up to 999, which regular BCD implementations can only guarantee by using two bytes of storage. If you need to further limit the range of values, use check constraints.
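
For example, to actually enforce the -128 to 127 range on a DECIMAL(3) column (table and column names are made up):

CREATE TABLE Measurements
(
  delta DECIMAL(3) CHECK (delta BETWEEN -128 AND 127)
)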

Some developers consider the absence of natural (non-negative) integers a flaw of standard SQL. The SQL standard defines neither unsigned types nor the UNSIGNED keyword; should your DBMS support them, they are just DBMS-specific extensions. The only portable way to create unsigned types is to use a check constraint of the form ... CHECK (VALUE >= 0). Constrain columns to unsigned only where appropriate (weights, heights, ages, quantities, etc.). On MySQL, the UNSIGNED keyword is allowed with any numeric type (including fractional numbers). Note that a constrained signed type merely cuts the signed range in half, while a true unsigned integer doubles the number of representable positive values.
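
On a DBMS with domain support, such as PostgreSQL, the portable approach could look like the following sketch (the domain name is made up); on DBMSs without domains, the CHECK goes directly on each column:

CREATE DOMAIN unsigned_integer AS INTEGER
  CHECK (VALUE >= 0)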

There are some more "modifier" keywords that can be used in conjunction with numeric types, like ZEROFILL, AUTO_INCREMENT (both MySQL-only), or SERIAL (PostgreSQL and MySQL). Try to avoid them; they're all DBMS-specific. For more information, also refer to the 9. Default Values section.

The SQL standard includes several predefined integral and fractional types. There is usually no need to explicitly define integral numbers via DECIMAL or NUMERIC. For most designs, the built-in types are sufficient. Please refer to the following sections for the storage requirements of integral and fractional number types.

7.2.1 Integral Numbers

The SQL standard includes several keywords to specify integral types. You can either use DECIMAL or NUMERIC by passing a scale of zero to them or by omitting the scale argument entirely (the scale should default to zero according to the SQL standard). If DECIMAL or NUMERIC are used without arguments, the precision will usually default to the maximum the DBMS allows.
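
For instance, the following two hypothetical columns are equivalent integral declarations:

CREATE TABLE Counters
(
  hits  DECIMAL(10, 0),
  views NUMERIC(10)   -- scale omitted, defaults to zero
)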

Alternatively, the SQL standard includes the keywords INTEGER, its abbreviation INT, SMALLINT (all SQL-92), and BIGINT (since SQL:2003). INT isn't a reserved keyword on every DBMS, e.g. on Oracle it's non-reserved. Thus, using INTEGER is slightly better than INT. Furthermore, INTEGER aligns better with the other type names in terms of character length, e.g. when used in a CREATE TABLE statement. SMALLINT is available on every DBMS I examined, while SQL:2003's BIGINT can't be found in Oracle. Using SMALLINT, INTEGER, and BIGINT seems to be fine. Just try to stay away from DBMS-specific types like MEDIUMINT, INT1, INT2, INT3, etc.

When I started out learning about DBMSs and SQL data types, I was astonished that the SQL standard doesn't include a TINYINT data type. While the SQL standard is supposed to specify only conceptual types, it does include SMALLINT and BIGINT but omits TINYINT. That's contradictory.

However, this has to do with history: back in the 1980s, when the early SQL standards were written, 16-bit machines were common, so that was the typical size of an integer as well. Since the SQL standard has never defined the ranges of SMALLINT, INTEGER, etc., and merely requires that a SMALLINT be no larger than an INTEGER, a SMALLINT could legitimately have been 8 bits in size. When the move was made to 32-bit machines, SMALLINT grew to 16 bits on most DBMSs. In my opinion, the SQL standard should either define the sizes of all integer types or abolish the different keywords.

These relics remain problems to this day, as we're already in the 64-plus-bit age. As with every integral type, don't assume that they will have defined, constant sizes. Remember, though, that storage requirements don't matter today as much as they did in the past, as storage space is available in larger amounts and has become very cheap.

Getting back to TINYINT: many people have never really felt the need for a one-byte integer at all. I myself had hoped for a standard SQL one-byte integer for completeness, given that most DBMSs today use two bytes for SMALLINT, four bytes for INTEGER, and eight bytes for BIGINT. A one-byte TINYINT would fit into the overall picture of the standard SQL types and would complement today's programming languages (and object-relational mapping tools).

Because PostgreSQL closely follows the SQL standard, it doesn't have a one-byte integral type. Oracle has the BYTE keyword, but this isn't a data type, just a string length modifier. You have two ways to work around the "one-byte defect" on DBMSs that lack such a type: if portability is more important than storage optimization, you might want to fall back to the (usually two-byte) SMALLINT. If storage optimization is absolutely paramount, you might fall back to a CHAR(1) "hack" (PostgreSQL, Oracle), provided you and your DBMS can make sure a CHAR occupies exactly one byte.

On PostgreSQL, it's possible to declare a domain for a "one-byte integer" by encapsulating a CHAR(1) along with a one-byte character set, like:

CREATE DOMAIN tinyint AS CHAR(1)
  -- the stored character's single-byte code point, shifted by 128, carries
  -- the value; this requires a one-byte server encoding such as LATIN1
  CHECK (ascii(VALUE) - 128 BETWEEN -128 AND 127)

Note that the above is unconfirmed. Oracle doesn't support domains or an equivalent, proprietary syntax for domain-like types. So, to create a sure-fire one-byte integer on Oracle, the only way I see is to repeat a CHAR(1 BYTE) type declaration wherever needed:

CREATE TABLE Stats
(
  ...
  -- CHAR(1 BYTE) holds a single character, so only the digits 0 to 9 fit;
  -- Oracle casts them to numbers implicitly when evaluating the checks
  fouls  CHAR(1 BYTE) CHECK (fouls BETWEEN 0 AND 9),
  points CHAR(1 BYTE) CHECK (points BETWEEN 0 AND 9),
  ...
)

As far as I can see, the DBMS's automatic type-casting facilities should allow calculations on such a type to succeed. (It's unconfirmed on my part, but I have read about people doing this.) After all, optimizing bytes might not be much of an issue on Oracle, because it stores small numbers very efficiently.

As a last issue, some DBMSs allow you to specify lengths for integers, e.g. INTEGER(11). Avoid them. This is a MySQL-specific extension, where the number is only a display width used to pad and right-align output, e.g. in a command-line client. A second use could be to read that length from the database's metadata, so that user interfaces can restrict text field input accordingly. Anyway, for the sake of portability, specify integers without a length. Note that MySQL will store a display width in its metadata even if your code never specified one. In PostgreSQL, Oracle, and SQL Server, specifying an integer with a length results in a syntax error.
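
A MySQL-only sketch of what the length actually does (don't use this in portable code):

CREATE TABLE Demo (n INT(4) ZEROFILL);
INSERT INTO Demo VALUES (42);
SELECT n FROM Demo; -- displays 0042; the stored value is still a regular four-byte integer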

References for Integral Numbers:

7.2.2 Fractional Numbers

There are two kinds of fractional numbers in SQL: precise and approximate. Precise fractional numbers are specified via the general-purpose numeric types DECIMAL and NUMERIC. Approximate fractional numbers are usually equivalent to the (IEEE 754) floating-point numbers of regular programming languages. In this section, I focus on the two-argument (non-zero scale) DECIMAL(p,s) and NUMERIC(p,s) and any of their aliases.

7.2.2.1 Precise Fractional Numbers

For precise, fixed-point fractional numbers, the general-purpose DECIMAL and NUMERIC keywords must be used. You define precise fractional numbers by passing a scale larger than zero to them. PostgreSQL, MySQL, Oracle, and SQL Server all implement both DECIMAL and NUMERIC. Again, on Oracle, these types translate to proprietary, non-standard NUMBER internally. The storage requirements for DECIMAL and NUMERIC were described in the general section about numeric types.

Some DBMSs offer special types, like MONEY (PostgreSQL, SQL Server) or CURRENCY. They are usually just aliases for DECIMAL with a scale of two. Stay away from these, even though they come with some nice extras, like currency-formatted output in a command-line client.
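
A portable stand-in, assuming two fractional digits suffice for the currency at hand (names and precision are illustrative):

CREATE TABLE Invoices
(
  invoice_id INTEGER,
  total      DECIMAL(19,2) -- instead of the DBMS-specific MONEY type
)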

7.2.2.2 Approximate Fractional Numbers

Even though programmers work with approximate, floating-point numbers regularly, how do they differ from precise ones and why does the decimal point "float"? As a computer science refresher, take this from Wikipedia:

The term floating point refers to the fact that the radix point (decimal point, or, more commonly in computers, binary point) can "float"; that is, it can be placed anywhere relative to the significant digits of the number. This position is indicated separately in the internal representation, and floating-point representation can thus be thought of as a computer realization of scientific notation. Over the years, several different floating-point representations have been used in computers; however, for the last ten years the most commonly encountered representation is that defined by the IEEE 754 Standard.

Due to its encoding scheme, floating point can represent a much wider range of values than fixed point. This comes at the price of precision: most decimal fractions have no exact binary representation and must be rounded. For inexact calculations this is usually okay, as rounding only affects the least significant bits. You can read more about floating-point logic in the references listed at the end of this section.
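
A quick demonstration of such rounding (the exact output depends on the DBMS and its display settings; Oracle would additionally need FROM DUAL and its BINARY_DOUBLE type):

-- 0.1 and 0.2 have no exact binary representation
SELECT CAST(0.1 AS DOUBLE PRECISION) + CAST(0.2 AS DOUBLE PRECISION)
-- typically yields 0.30000000000000004 rather than 0.3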

In standard SQL, approximate fractional numbers are represented by the types REAL and DOUBLE PRECISION (and, implicitly, FLOAT). Both types have no length parameter, as precision and scale vary with the decimal point floating "between the bits", depending on the actual value (or rather, on the most significant bit that is set). REAL is the single-precision floating-point type, such as float in C, C++, or Java. DOUBLE PRECISION is the double-precision floating-point type, with double being the C, C++, or Java equivalent. I have no idea why the double-precision type wasn't just called DOUBLE; many DBMSs do allow declaring a column as just DOUBLE, though. REAL and DOUBLE PRECISION are usually implemented according to the IEEE 754 standard, but the SQL standard doesn't require this.

It's recommended to use approximate types only if approximate precision is sufficient, or if one-to-one compatibility with programming languages' (IEEE 754) floating-point types is required. Using approximate types is usually alright for non-critical, inexact data. However, never make the ultimate mistake of choosing an approximate, floating-point type for monetary attributes. As Joe Celko put it: The moron that declared columns to be FLOAT in a commercial system is probably a C programmer who does not understand rounding errors.

FLOAT is an alternative way of specifying floating-point types. FLOAT(p) is a selector for either REAL or DOUBLE PRECISION: it takes a precision parameter measured in binary (not decimal) digits, according to which the resulting type will be either single or double precision. In IEEE 754-based implementations, FLOAT(1) to FLOAT(24) result in a single-precision type, and FLOAT(25) to FLOAT(53) in a double-precision type; precision values beyond the allowed range result in an error. The exact limits are implementation-defined, and DBMSs deviate in practice:

PostgreSQL also supports the SQL-standard notations float and float(p) for specifying inexact numeric types. Here, p specifies the minimum acceptable precision in binary digits. PostgreSQL accepts float(1) to float(24) as selecting the real type, while float(25) to float(53) select double precision. Values of p outside the allowed range draw an error. float with no precision specified is taken to mean double precision.

Note: Prior to PostgreSQL 7.4, the precision in float(p) was taken to mean so many decimal digits. This has been corrected to match the SQL standard, which specifies that the precision is measured in binary digits. The assumption that real and double precision have exactly 24 and 53 bits in the mantissa respectively is correct for IEEE-standard floating point implementations. On non-IEEE platforms it may be off a little, but for simplicity the same ranges of p are used on all platforms.

Because PostgreSQL stays very close to the SQL standard, citing its documentation seems an appropriate substitute here. Other DBMSs have their own limits; you have to check your DBMS's manual. The given FLOAT ranges should work on most DBMSs, though.
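
A sketch of the selection rule in DDL, using the PostgreSQL limits quoted above:

CREATE TABLE FloatDemo
(
  a FLOAT(24), -- resolves to REAL (single precision)
  b FLOAT(25)  -- resolves to DOUBLE PRECISION
)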

Some DBMSs, notably MySQL, implement the three floating-point keywords differently. On MySQL, FLOAT is the single-precision type, DOUBLE [PRECISION] is the double-precision type, and REAL is just a synonym for a double-precision floating point. However, FLOAT(p) (with a parameter) still works as described above, selecting single or double precision.

References for Fractional Numbers:

Last updated: 2010-11-10