7.5 Binary Types

Binary types were created to store binary data directly inside the database. The traditional standard SQL types for binary data are BINARY LARGE OBJECT or short BLOB. BINARY was added in SQL:2003 and VARBINARY in SQL:2008. They are the binary equivalents of strings. BINARY is the equivalent of CHARACTER, and not CHAR. Strictly speaking, the SQL standard is missing its abbreviated cousins BIN and VARBIN. While this might not disturb you, it demonstrates inconsistent naming. I doubt it will ever be repaired.

The following lists all the standard SQL keyword combinations for character and binary data types:

Character and Binary Data Type Naming Mismatches
Character DataBinary Data

Note, the (binary) large object types are always variable-length and and expect a length parameter in the form BLOB(n[K|M|G]). Please refer to 7.1 String Types section for an overview of the optional multipliers kilo, mega, etc.

As far as I know, all DBMSs I have examined implement BLOB, BINARY, and VARBINARY (in all possible ways). Functionally, VARBINARY should be equal to BLOB, but BINARY should not be. The variable binary types are always at the length they were inserted. Shorter BINARY types must be padded, like CHAR. Usually, DBMSs pad BINARY with the zero byte, but that is not for sure. You have to check your DBMSs manual for the actual behavior.

Avoid types like TINYBLOB, MEDIUMBLOB, or LONGBLOB, they are all DBMS-specific. Standard SQL used to define the BIT type, which was retired with SQL:2003, probably due to confusion whether it is a number of flags, a boolean, or a general-purpose binary type.

One of the most frequent modeling questions about BLOBs is, whether it is better to store the binary data (e.g. images) in the database itself or to store a VARCHAR path to the binary file only. There is no general answer to this. If the answer was to use file paths at all times, BLOBs probably would not exist. After all, both approaches work.

However, unlike any other data type, handling binary data is not trivial. In standard SQL (and many other DBMSs), the syntax to specify bit data is B'100101' or X'03FB'. According to the standard, bit data is not necessarily stored in numeric or even binary format. Some DBMSs may allow you to specify a length, like BIT(n). You have to check your DBMSs manual for the length parameter and its default value, if available (SQL Server defaults to 1).

Even though you can specify binary data in SQL code, it is unusual to do so. A statement like INSERT INTO TableName (..., blob_col, ...) VALUES (..., X'CAFEBABE', ...) will not be of much use for large amounts of data. You have to use some DBMS-specific functionality or write some application code to open and read each file and put the bytes into the INSERT INTO statement. The data then resides inside the database as a copy and is accessible only via SQL. You might not want or need that copy if you need the binary files on the server anyway, e.g. images. If you need the data in other non-SQL ways, too, do not use binary types. There are probably more issues that I can only guess about.

Other downsides are the database size and backups. If the BLOBs contain non-critical data, your database contains a lot of "binary garbage" which are also duplicated with each backup. If you decide for the "file path approach", you are going to have much less overhead. Note though, when storing file paths, your database server will technically be a file server, too. Keep in mind, deleting and/or moving files might result in wrong file paths and/or dead links. As long as you can handle that, I would go with the file path solution for about 90 percent of the time.

References for Binary Types:

Last updated: 2010-08-04