7.3 Booleans

Table of Contents

Booleans are a sad story in databases. Booleans are one of the most elementary types in computer programming and they are part of every WHERE clause, but they aren't widely supported by today's DBMSs. To quote Pascal: A fundamental data type that should be universally supported is the truth-valued type, whose values are 'true' and 'false'. Even processors use the notion of a boolean type in the form of an internal "zero flag".

The absence of a boolean type in SQL-92 and the majority of today's DBMSs is more than astonishing. The SQL standard includes BOOLEAN as a reserved keyword starting from SQL:1999, BOOL has never been in the standard and probably never will. To be precise, I have no idea why INTEGER's shorthand INT is in the SQL standard, but BOOL isn't (PostgreSQL and MySQL have corrected that on their own). After all, the abbreviations CHAR, VARCHAR, and FLOAT are in the standard, too.

7.3.1 Three-Valued Logic

A standard SQL BOOLEAN can take the values FALSE or TRUE (besides the general NULL). To my surprise, the SQL standard also defines an UNKNOWN keyword, which (how intriguing) represents the "unknown" boolean state. Here is what the SQL standard has to say about BOOLEAN:

The data type boolean comprises the distinct truth values True and False. Unless prohibited by a NOT NULL constraint, the boolean data type also supports the truth value Unknown as the null value. This specification does not make a distinction between the null value of the boolean data type and the truth value Unknown ...; they may be used interchangeably to mean exactly the same thing. ...

All boolean values and SQL truth values are comparable and all are assignable to a site of type boolean. The value True is greater than the value False, and any comparison involving the null value or an Unknown truth value will return an Unknown result. The values True and False may be assigned to any site having a boolean data type; assignment of Unknown, or the null value, is subject to the nullability characteristic of the target.

So what does this mean exactly? I clearly interpret this as UNKNOWN must be supported, but it's exactly equivalent to NULL, making UNKNOWN redundant. Both values can only be set for columns that aren't declared as NOT NULL.

However, another source interprets that differently: The DBMS may interpret NULL as equivalent to UNKNOWN. It think this interpretation isn't correct. The latter source further states: It is unclear from the specification if the DBMS must support UNKNOWN, NULL or both as boolean literals. The problematic part of the SQL standard is, that any comparison involving the null value or an Unknown truth value will return an Unknown result. This implicitly means, that any implementation must provide the UNKNOWN keyword (besides, a result of UNKNOWN is only given for certain and not any comparison, as shown in the truth tables coming up).

But why did the standard SQL people define a new keyword for the boolean unknown state when it's treated like NULL anyway? People holding grudges against UNKNOWN have an argument. I don't really like it, too. There is no other type in SQL that has its own keyword for the "unknown" state: no string type, no numeric type, no temporal type, no binary type, nothing. For all these, NULL is used. That, in a way, makes booleans somewhat special in SQL.

In any case, the notion of an "unknown" boolean value effectively results in a three-valued boolean. If UNKNOWN didn't equal NULL, BOOLEAN would even be four-valued (for boolean columns that allow NULL). However, I don't see a problem omitting UNKNOWN entirely, because they are equal anyway. Even if UNKNOWN didn't exist, the three-valued logic would still apply and the truth tables for all logical operators AND, OR, NOT, and IS would still be the same (see below).

I copied the logical operators' truth tables directly from an SQL:2008 draft. They are not any different from the final SQL:1999 standard:

Truth Table for AND
AND TRUE FALSEUNKNOWN
TRUE TRUE FALSEUNKNOWN
FALSE FALSE FALSEFALSE
UNKNOWNUNKNOWNFALSEUNKNOWN

Truth Table for OR
OR TRUEFALSE UNKNOWN
TRUE TRUETRUE TRUE
FALSE TRUEFALSE UNKNOWN
UNKNOWNTRUEUNKNOWNUNKNOWN

Truth Table for NOT
NOT
TRUE FALSE
FALSE TRUE
UNKNOWN UNKNOWN

Note, that NOT UNKNOWN is UNKNOWN. (The opposite of UNKNOWN is either FALSE or TRUE, which is unknown.)

Truth Table for IS
IS TRUE FALSE UNKNOWN
TRUE TRUE FALSE FALSE
FALSE FALSE TRUE FALSE
UNKNOWN FALSE FALSE TRUE

Note, the Wikipedia article on 3VL is wrong (as of mid 2009), because it displays the equality operator's instead of the IS operator's truth table. If at all, you have to use the latter to check specific boolean values. The IS operator has a special property: it never returns UNKNOWN (and/or NULL), in contrast to AND, OR, and NOT.

7.3.2 Null Propagation

There is a general issue with SQL operators commonly referred to as "null propagation": in SQL, every expression evaluates to NULL whenever one operand is NULL. There are rather academic discussions around, which seem to be lead by Joe Celko, arguing, that the SQL BOOLEAN type is inconsistent and should be avoided, because UNKNOWN and NULL are not equal. To quote Celko:

The NULL cannot be treated as an UNKNOWN because one of the basic rules of NULLs is that they propagate. The resulting four-valued logic is inconsistent:

UNKNOWN AND TRUE = UNKNOWN
UNKNOWN AND FALSE = FALSE
NULL AND TRUE = NULL
NULL AND FALSE = NULL

Note, that I corrected the original order by switching the latter two.

While the first, second, and (now) third examples are correct, the fourth regularly gives me headaches. NULL AND FALSE always evaluates to FALSE, because no matter what the left operand would become, the overall result will always be FALSE. This is what you call a neutral element and has to do with predicate logic. At least, this I what I used to learn at my university. I am not saying Mr Celko didn't do his homework, but I simply don't see why UNKNOWN and NULL are supposed to be logically different.

Celko's argument, that any occurrence of NULL automatically produces NULL as a result in SQL hangs: if you use the SUM() function on a numeric column, it will count all NULLs as "zero" and won't produce NULL, if present.

I decided to pursue the issue a little further. Celko either isn't very elaborate on this issue or my mind is failing me. He leaves the reader puzzled about his reasons, because he actually gives no other explanation but to call this a "basic rule". Further reading resulted in another of Celko's books Joe Celko's SQL for Smarties stating:

SQL has three-valued logic: TRUE, FALSE and UNKNOWN. The UNKNOWN value results from using NULLs in comparisons and other predicates, but UNKNOWN is a logical value and not the same as a NULL, which is a data value marker. That is why you have to say (x IS [NOT] NULL) in SQL and not use (x = NULL) instead.

Celko's arguments beat me. You have to use the AND, OR, NOT, and IS operators with UNKNOWN and NULL in any case, so they are no different from FALSE and TRUE. From a logical perspective, I don't care whether a value isn't applicable, not known, not knowable, or whatever interpretation you can come up with. The resulting logic is, that you do not have this (neutral) piece of information and that predicate logic will decide for the outcome of an expression.

Many DBMS implementors seem to agree: UNKNOWN is often omitted entirely in favor of NULL or it is just a synonym for it. Pascal states:

But even if users adhere to this 3VL discipline, a[n] SQL DBMS will still generate NULLs for inapplicable values in derived tables, such as views and query results. An obvious case is the outer join operation and if input tables to it are views and contain NULLs for missing (unknown) values, the outer join will generate NULLs for both missing and inapplicable values.

In my opinion UNKNOWN is completely redundant: again, no other type has an "unknown" equivalent and they work, too. I still fail to understand why the UNKNOWN keyword exists at all and why NULL should propagate. Maybe it was to satisfy the academics who kept complaining.

7.3.3 My Own Interpretation

To continue with something more productive than just ranting, I'll try to give my own explanation: I firmly believe, that the "null propagation issue" can't be applied to the logical operators AND, OR, NOT, and IS (which is why they were written out as keywords to indicate a different group of operators). Null propagation only applies to all SQL mathematical (+, -, *, /, ...), comparison (=, <>, <, <=, >, >=, ...), and other operators (||, ...), including SQL constructs, like CASE, IN, ... which obviously use the equality operator.

The SQL standard has one misleading passage: any comparison involving the null value or an Unknown truth value will return an Unknown result should have been directed towards the logical operators only. This wouldn't make any sense otherwise, because it would undermine all boolean logic, as depicted in the truth tables of the SQL standard (and my professor's DB script).

For TRUE OR UNKNOWN (TRUE OR NULL), the fact that the value on the right is unknowable is irrelevant, because the outcome of the test would be TRUE in any case. For FALSE AND UNKNOWN (FALSE AND NULL), the result would always be FALSE. Those are two examples of the neutral element. Again, I have no idea why the standard SQL people decided to introduce the UNKNOWN keyword instead of just using NULL. Even if you replace UNKNOWN with NULL, the boolean logic remains the same.

Aside from the specification issues, isn't UNKNOWN what NULL is supposed to provide? Analogously, programming languages like Java also have Boolean classes. They don't define an "unknown" value, too, can become null, and they work nontheless.

I don't think the BOOLEAN type should be avoided. However, Fabian Pascal, who once had a personal conversation with C.J. Date about SQL boolean a while ago, stated: SQL/99 adds support for the truth-valued domain, but "gets it horribly wrong". I had no luck finding out what he actually meant.

The SQL standard has several inaccuracies and the UNKNOWN keyword hasn't been the best idea. But, does that make BOOLEAN entirely redundant, just because some experts say, the SQL standard messed it up? As I struggle to find any real reasons, I finally put the boolean topic into my "unsolved mysteries" drawer.

For me, the whole discussion about 3VL booleans are just too academic. Because the SQL standard has decided for 3VL booleans, there is nothing SQL developers can do about it. You can only decide to use it or not. After all, SQL also doesn't differentiate between inapplicable or unknown values for integers and only uses NULL to represent both, which seems to work, too. In any case, using pseudo-booleans instead of real ones, like strings with 'N' and 'Y' are no better.

I'm (still) a supporter of the BOOLEAN type, even if I consider the UNKNOWN keyword a misconception. As long as we can ignore the UNKNOWN keyword or consider it a synonym for NULL, don't we have that boolean we have ever wanted? I do. This guy does, too: You should forget about the UNKNOWN literal in order to simplify the situation and let the normal SQL three-way logic apply.

7.3.4 Using Booleans

MySQL and Oracle don't know the UNKNOWN keyword at all. In PostgreSQL and SQL Server, UNKNOWN is a non-reserved keyword being a synonym for NULL. Because of that, UNKNOWN should be rejected from being inserted into boolean columns that are declared as NOT NULL (unconfirmed).

Using BOOLEAN pretty much works like booleans from programming languages. In Java, for example, you would write an if-statement in the form:

if ( booleanVariable ) ...

It's not necessary to check the boolean value explicitly, like:

if ( booleanVariable == false ) ...
if ( booleanVariable == true ) ...

The SQL standard allows the equivalent logic in the WHERE clause:

WHERE boolean_column ...

Analogously, there is no need to write:

WHERE boolean_column IS FALSE ...
WHERE boolean_column IS TRUE ...

The only difference between the Java and SQL tests is, because the SQL boolean is three-valued, the former test will return only the rows that are TRUE. If you need the FALSE entries, use WHERE NOT boolean_column .... The question that arises is, what will happen with the WHERE boolean_column ... test if the value is NULL (or UNKNOWN). The answer is: it will never return any rows, unless the IS operator is used. While this is obvious for many, I found that many developers lack these kind of basics.

Everyone could just be using the BOOLEAN keyword, however, most DBMSs still haven't implemented it. Some do (PostgreSQL), some know BOOLEAN, but it's just an alias for a proprietary integer (MySQL), others don't know them at all (Oracle, SQL Server). Oracle allows BOOLEAN in PL/SQL only and will generate an error when used in regular SQL. SQL Server doesn't implement BOOLEAN, but you can use BIT instead, as discussed later.

In MySQL BOOLEAN (and BOOL) are translated to a TINYINT(1), effectively turning a portable, standard SQL type to an unportable, non-standard type. However, it's not that much of a problem. According to this source, TINYINT(1) is exclusively reserved as a real boolean in software implementations:

Never use TINYINT(1) as the column type if you use JDBC with MySQL and expect to get non-boolean values from it. ... If you use JDBC with MySQL, then BOOLEAN is the preferred type for booleans: MySQL's JDBC-driver implicitly converts between Java's boolean and MySQL's pseudo-BOOLEAN type.

(Ironically in MySQL, even though BOOLEAN and BOOL are not in the list of reserved keywords, TRUE and FALSE are.)

So, what do you do with booleans now? Because BOOL isn't even standard SQL, forget that I have ever mentioned it. BOOL is only understood by PostgreSQL and MySQL. The easiest way is to stick with BOOLEAN, if available. Even on MySQL, using it is fine. Just be aware that your DBMS may convert it to some other representation internally.

7.3.5 Simulating Booleans

For all DBMSs that don't implement the BOOLEAN keyword, you have to simulate booleans. You have several possibilities to do so. What you use depends on what your DBMS offers, and on how much portability you need. The classic way to simulate booleans is by using standard SQL domains, which let you specify custom, reusable data types. More on that later.

First, you have to decide for an appropriate data type. This can be a real pain, if you don't know the details. There is no uniform opinion in the database community and most people simply decide by their personal preference, unaware of the consequences their choice will have. You can basically use the following types:

  • An integer type (SMALLINT, INTEGER, TINYINT, ...)
  • A BIT
  • A string type (CHAR, VARCHAR, ...)
  • A two-fold ENUM('N', 'Y')

Whatever the choice for your boolean replacement type will be, there is a small problem: the columns can take any value that fit into their type. Thus, it is generally a good idea to use check constraints alongside. This helps to keep your data clean and consistent.

7.3.5.1 Integer Booleans

The natural way computers store boolean values has always been some kind of integer with zero meaning false and one (or non-zero) meaning true. This will probably never change. Thus, integers provide a straightforward way to store booleans. There is an analogy between SQL and low-level programming languages like C or C++ concerning integer booleans: C and C++ do not have a real boolean type, so you have to simulate booleans via integers. You can apply the same logic to SQL, if no real booleans are available. In C/C++ you would have to perform a logical check like this:

if ( intvar ) ...

Again, it's not necessary to check the boolean values explicitly, by writing

if ( intvar == 0 ) ...
if ( intvar == 1 ) ...

or

if ( intvar == FALSE ) ...
if ( intvar == TRUE ) ...

because, intvar automatically evaluates to false (if zero) or true (if non-zero). Hard to believe, but you often find such bad code in books, magazines, and especially on the Internet.

There is no logical difference in SQL. Depending on your DBMS, certain syntaxes are allowed, some aren't. MySQL, which implements the BOOLEAN keyword as an alias for a TINYINT(1), allows using integers like booleans:

WHERE integer_column

PostgreSQL, Oracle, and SQL Server all disallow the above syntax. Instead, you have to write

WHERE integer_column = 0 ...
WHERE integer_column = 1 ...

or whichever DBMS allows to mix integers with boolean literals, like SQL Server:

WHERE integer_column = FALSE ...
WHERE integer_column = TRUE ...

Note, that most DBMSs also allow != as inequality operator. Be aware, that SQL integer columns can generally become NULL. Should any integer column be NULL, any of the above checks will return no rows. You have to check NULLs by using the IS operator.

So what's the point of all this? While this doesn't have much to with database design directly, it reflects the consequences the wrong choice for booleans will have. Because of its naturalness, using an integral types as a boolean replacement is just perfect. The only downside I see is, that zeros and ones aren't very readable. However, integer booleans should be value-compatible with real booleans without translation. However, using an integer isn't the only way to simulate booleans.

7.3.5.2 Bit Booleans

My first thought was, that using the BIT type as a general boolean replacement should be alright, because the data (zeros and ones) should be compatible with real booleans. However, BIT (and BIT VARYING) were removed from the SQL standard in SQL:2003. Thus, using these types is no longer the best idea. I suspect the bit types were removed because they caused too much confusion about how they are supposed to be used. BIT was originally intended for bit fields, but many DBMSs (ab)used it for other purposes, like booleans.

Transact-SQL compatible DBMSs (Sybase and SQL Server) seem to be the unlucky parties here, as they have promoted using BIT as a boolean ever since (note that Sybase and Microsoft once worked together on Transact-SQL, thus the similarities between these DBMSs). Because SQL Server doesn't know the BOOLEAN type, it's recommended to use BIT as booleans. SQL Server allows mixing the BIT type with the keywords FALSE and TRUE: The string values TRUE and FALSE can be converted to bit values: TRUE is converted to 1 and FALSE is converted to 0.

Using BIT as booleans is only a good idea on DBMS that actually recommend to do so, like SQL Server. BIT is SQL Server's (now non-standard) replacement for a standard SQL BOOLEAN. SQL Server allocates one additional byte for every multiple of eight bits, so one byte is used per boolean:

An integer data type that can take a value of 1, 0, or NULL. ... The SQL Server Database Engine optimizes storage of bit columns. If there are 8 or less bit columns in a table, the columns are stored as 1 byte. If there are from 9 up to 16 bit columns, the columns are stored as 2 bytes, and so on.

You should generally prefer BIT over intergral types on SQL Server because the BIT type is stored in the database's meta data and denotes boolean columns. Note though, that using the BIT type can cause other unforeseeable problems. Certain operations that you would expect to work cause syntax errors, like using aggregate functions (e.g. SUM) or GROUP BY on that BIT column. If you can live with(out) this, use BIT, otherwise you should fallback to an integer instead.

Since SQL Server doesn't support domains, but has its own syntax to provide domain-like types, I recommend to use the latter to specify a user-defined boolean type. There is an example in section 7.3.6 Boolean Domains.

7.3.5.3 String Booleans

The main driver of string booleans seem to be the Oracle folks. The most prominent comment is from Thomas Kyte, the former maintainer of the Ask Tom Oracle site, a long-term Oracle database administrator and author of many renowned books. He once answered a question, why Oracle does not have a boolean type with:

since

...,
flag char(1) check (flag in ( 'Y', 'N' )),
...,

serves the same purpose, requires the same amount of space and does the same thing - I guess we feel this is a feature we can let them have that we really don't need.

This is the original statement from his website - from 2002. To my relief, in 2008 he admitted that his original opinion on this issue was wrong. He now also recommends to use an integer instead. However, many old-school (Oracle) developers haven't adopted this practice yet. I found the following statement in another Oracle article from 2004:

The most commonly seen design is to imitate the many Boolean-like flags that Oracle's data dictionary views use, selecting 'Y' for true and 'N' for false.

Aha! Now that's the reason why the Oracle people used string booleans: the Oracle DBMS developers decided to leave booleans out, because they don't want to rewrite major parts of their database products. I wonder, with this attitude, how Oracle wants to be prepared for the future, if they can't even get basic things like booleans done. In any case, it doesn't shed good light on Oracle, given that I haven't heard any real reasons as to why not to support booleans. The referred article further states:

... However, to interact correctly with host environments, such as JDBC, OCCI, and other programming environments, it's better to select 0 for false and 1 for true so it can work correctly with the getBoolean and setBoolean functions.

You could define a Boolean as NUMBER(1); however, in Oracle's internal number format, 0 takes 1 byte and 1 takes 2 bytes after the length byte (so it's more efficient to store it as CHAR). Even though the character is defined as CHAR, SQL can convert and verify against actual numbers.

In contrast to that, the Oracle Database JDBC Developer's Guide states this:

Because there is no BOOLEAN database type, when you use getBoolean a data type conversion always occurs. The getBoolean method is supported only for numeric columns. When applied to these columns, getBoolean interprets any zero value as false and any other value as true. When applied to any other sort of column, getBoolean raises the exception java.lang.NumberFormatException.

Another Oracle source, the Oracle Database SQL Developer Supplementary Information for Microsoft Access Migrations writes: When SQL Developer retrieves Boolean data, the Microsoft Access ODBC driver returns one of two values, 0 as FALSE or 1 as TRUE, therefore, FALSE and TRUE values are also represented in Oracle as 0 and 1. String booleans on Oracle are a pretty two-edged sword: some people recommend them, some don't. It's symptomatic. Now let me get back to the technical stuff.

When implementing booleans as strings, your boolean data will be something like 'N' and 'Y' instead of the literals FALSE and TRUE (or zero and non-zero respectively). Longer strings could be 'Off' and 'On' or 'tested' and 'untested'. You can either choose between fixed-length or variable-length string types. Note, that you can't make sure that any string column can only take two values without a check constraint. Alternatively, you could use an (unportable) ENUM, described later.

The real problem I have with string booleans is, that any string column which is supposed to take only two possible values is not really a boolean, but rather a two-fold check-constrained type using the IN operator. You usually use these types to model nominally scaled attributes with a fixed number of predefined values (like genders, colors, etc.). As in the example above, 'N' and 'Y' represent two states, here no and yes, but not necessarily false and true. You would never say that a gender 'M' would mean true and the other gender 'F' would mean false (or vice versa). Conceptually, string booleans just aren't the best approach.

Still, many data architects use them, but only because they don't encode any semantics into their column names. Instead of naming a column "status" with the possible values being 'selected' and 'unselected', they could simply call the column unselected or is_selected and then encode false and true. As long as you don't plan for more than two values, the approach to use integers as booleans is superior.

Another negative issue about string booleans is case-sensitivity. Is it better to declare a column to contain 'N' and 'Y' or rather 'n' and 'y'? What about 'off' and 'on' versus 'Off' and 'On'? Case usually isn't that much of a problem, because the majority of running databases are using case-insensitive character sets.

String booleans don't lend themselves well to consistent SQL programming. One developer will type 'no' and 'yes' into their code, another will use 'No' and 'Yes', a third might use 'NO' and 'YES'. That raises many questions for any following developers, who will have to consult the maintainers, trying to find out the policy that was (not) agreed on.

Another downside is internationalization. With string types, one developer might always use English ('no' and 'yes'), another might always use German ('nein' and 'ja'). This not only causes confusion all over, but it can cost a lot. Hmmm, where is this example with the French, Maltese, and Swedish students working on the same project again...? In multilingual environments, you gain nothing by putting language-specific strings (mostly English 'N' and 'Y') into your databases. An integer boolean is always fully localized and free of language garbage. You don't have to deal with all these value, case-sensitivity, and language issues at all when using numeric data.

Many developers also implement string booleans, because you can put the values onto a graphical user interface (GUI) without any processing. This is a poor argument. User interface concerns should never affect database design. The GUI changes much more often than the database, so with the next GUI, the argument might be gone anyway. I have seen many databases having come a long way whose string booleans often ended up mixing values, case, and even language so that they needed data cleansing all over the SQL statements which used these columns.

The list of practical problems is still growing: sometimes you have to export data to text files, like comma-separated value (CSV) files. As stated before, boolean data is naturally stored as 0 and 1. You would always find the strings in the exported files instead. What if you want to change the two possible values someday? You would have to adjust every SELECT statement that uses these columns. If only one of the two changes, the data import will fail. Furthermore, if you ever want to switch to a real BOOLEAN someday, you always have to write some translation code to map one string to FALSE (or zero) and the other to TRUE (or one). You avoid all of these troubles with real, integer and bit booleans.

To complete the discussion about string booleans, I have to mention a boolean "trick" I have been reading about (see references). The idea here is to use a CHAR(0) column which can only take the empty string '' or NULL, with NULL meaning false and the empty string meaning true. On MySQL, a CHAR(0) has the side effect of occupying only one bit. This approach should be avoided though. It's a hard-to-understand hack and everything but transparent. Your DBMS might not allow CHAR(0) at all. DBMSs like Oracle make no difference between NULL and the empty string. Even if your DBMS allows you to create CHAR(0) columns, they probably won't give you that one bit storage side effect.

To finally address using string booleans, a comparison in the WHERE clause must always include the specific value or the exact opposite of false to mimic an analogous non-zero test:

WHERE stringn_column = 'No' ...
WHERE stringn_column = 'Yes' ...
WHERE string1_column = 'N' ...
WHERE string1_column = 'Y' ...
WHERE string0_column IS NULL ...
WHERE string0_column = '' ...

As with integers, any of these tests exclude NULLs. Note the use of IS NULL with the CHAR(0) trick to test for false. Using the equality operator for testing NULL is wrong. I think anyone still deciding for string booleans, hasn't really understood computer science basics.

If you really want to use string-based boolean, please declare them along with a check constraint and in 'N'-'Y' order and not the other way around (so that the false value is smaller than true). For string booleans in the form CHECK (... IN ('N', 'Y')) this doesn't really matter, because the IN operator uses set logic.

7.3.5.4 Enumerated Booleans

The last possible way to implement booleans are ENUMs. An enumerated type is very similar to a string type in functionality, it just adds an automatic check constraint and order. The same arguments against string booleans hold true for ENUMs as well. Additionally, ENUM isn't standard SQL. An ENUM certainly is the weakest and least desirable boolean of all. Don't use them, even if they are tempting as "self-explanatory" booleans for non-technicians or in graphical user interfaces.

7.3.6 Boolean Domains

There is one problem with the simulated booleans I've just described: they aren't reusable. This is what domains are for. Domains basically just wrap a data type, a default value, a collation, and a check constraint. You can read more about checks and domains in the 10. Constraints and 8. User-Defined Types sections.

The following is a portable example of a standard SQL boolean domain (note the use of SMALLINT as the smallest standard SQL integer):

CREATE DOMAIN boolean AS SMALLINT
  CHECK (VALUE IN (FALSE, TRUE))

Oracle and SQL Server don't support standard SQL domains. At least, SQL Server has its proprietary equivalent, so you can declare:

CREATE TYPE boolean FROM BIT -- or TINYINT
  CHECK (VALUE IN (FALSE, TRUE))

The missing BOOLEAN on Oracle is probably its most prominent flaw, even though many developers claim to have never missed one. Oracle, in contrast to SQL Server, doesn't even offer a value-compatible type as an alternative. Because Oracle neither has domains nor does it have a proprietary domain-like syntax, you have no other choice but to repeat the pseudo-boolean declarations for each column:

CREATE TABLE Teams
(
  ...
  was_withdrawn SMALLINT CHECK (was_withdrawn IN (0, 1))
  ...
)

CREATE TABLE Assignments
(
  ...
  was_absent SMALLINT CHECK (was_absent IN (0, 1))
  ...
)

Note, that you can't use FALSE/TRUE and the VALUE keyword in Oracle SQL. Because of Oracle's special BCDs, each integer boolean of this kind occupies approximately 1.5 bytes on (naive) average. If you really need a sure-fire one-byte boolean on Oracle you can use a CHAR(1 BYTE). The check constraint works perfectly. You cannot use a RAW(n) type instead, because the check constraint will fail on table creation.

Using boolean domains or an equivalent proprietary syntax instead of repetitive column declarations has several advantages:

  1. The use of a domain or a domain-like type makes it immediately obvious, that you mean a boolean and not a numeric type.
  2. If you ever feel like updating the boolean definition, you only have to do it once.
  3. The move to a BOOLEAN-aware DBMS can simply be accomplished by deleting the type. In the opposite direction, all you have to do is to create a domain or domain-like type named "boolean". (Note, that, in both cases, you might want to adjust the capitalization in your code to indicate a switch between an inbuilt and a user-defined data type.)

I hate to say it, but the missing BOOLEAN type plus the inability to specify a domain-like types is enough reason for me not to use Oracle in production. Oracle provides developers with a PL-only BOOLEAN, but it can cause confusion when using stored procedures that return a BOOLEAN with Oracle SQL.

7.3.7 Object-Relational Booleans

As an experiment, I tried to find other alternatives to domains. I looked at the object-relational types, which have somewhat officially been named to succeed domains. With the object-relational type syntax, you can create structured types which could encapsulate a single pseudo-boolean attribute. The standard SQL syntax would look something like:

CREATE VALUE TYPE boolean AS
(
  boolean_col SMALLINT CHECK (VALUE IN (0, 1)) -- or ... IN (FALSE, TRUE)
)

Because this is just a one-attribute structured type, you would have to use it like an structured type. When inserting data, you would have to use the type with a constructor, like

INSERT INTO Teams (..., was_withdrawn, ...) VALUES (..., boolean(0), ...)

instead of just

INSERT INTO Teams (..., was_withdrawn, ...) VALUES (... , 0, ...)

When querying that structured boolean via a SELECT statement, you would have to use the dot operator to dereference the boolean attribute:

SELECT ... FROM Teams WHERE was_withdrawn.boolean_col <> 0

While this works, it's anything but elegant. I don't recommend these kind of booleans, also because, strictly speaking, the use of structured types destroys the first normal form. I would rather fallback to the (not really) deprecated domains or repeat the boolean declarations per column.

7.3.8 Summary

To sum up booleans in databases: BOOLEAN is a standard SQL:1999 reserved keyword. If it's implemented, use it, even if it's just an alias for an unconstrained integer (like TINYINT(1) in MySQL). I don't buy the argument, that booleans have no right to be used, just because of the UNKNOWN keyword and 3VL. Just consider the UNKNOWN keyword as a synonym for NULL. After all, replacing a boolean with a simulated boolean type will give you the very same 3VL, because any type can basically become NULL.

SQL Server has no BOOLEAN keyword, but the former standard SQL BIT instead. I would recommend to use SQL Server proprietary domain-like syntax to create a custom boolean. Use that with either SMALLINT (portable), TINYINT (unportable), or BIT (no longer portable), ideally along with a check constraint.

Oracle is the "boolean black sheep" on the market. Because of its deficiencies you can't avoid repeating the declaration for each pseudo-boolean column. Oracle has its per-value storage scheme, so you should use some integer or CHAR(1 BYTE). I recommend to make it a SMALLINT if you don't need a one-byte boolean and use that along with a check constraint.

In any case, integer approaches are generally better than strings for simulated booleans. Avoid using non-standard ENUM as booleans. PostgreSQL, MySQL, and SQL Server all allow you to use the keywords FALSE and TRUE. Use them. On Oracle, these are only available for PL/SQL, but not regular SQL.

To conclude a rather lengthy discussion about booleans, here is an overview of the resulting physical types that are stored in the database's meta data:

Effective Boolean Meta Data Types per DBMS
DBMS(Best) Boolean Meta Type
PostgreSQLBOOLEAN
MySQLTINYINT(1)
OracleNUMBER
SQL ServerBIT[(1)]

Booleans are much more complicated in databases than they had to be. A lot of fuss could be avoided if all DBMS vendors implemented the standard SQL BOOLEAN type (which was added to the standard way too late). Astonishingly, there are still a number of people who don't feel the need for a real boolean type. The question is: how natural does a concept have to be so that an SQL boolean will finally be acknowledged by the database community?

References for Booleans:

Last updated: 2010-11-10