SQL Database Design » 10. Constraints » 10.4 Check Constraints

10.4 Check Constraints

Table of Contents

Check constraints are a powerful tool to ensure data consistency. They constrain values for one or a set of columns to be within their allowed ranges. Check constraints can be used to make sure that values meet certain conditions, e.g. "greater than ten" or within a defined range of predefined values. The general syntax for check constraints is:

CHECK (<condition>)

Note, that <condition> must be a boolean SQL construct, which by the way is another good reason for all DBMSs to support BOOLEANs.

As with every constraint, checks can be specified inline for single-column and out-of-line for single-column and multi-column checks. Of course, there can be several check constraints per table and per column:

CREATE TABLE OrderLines
(
  ...
  order_quantity  INTEGER NOT NULL CHECK (order_quantity > 0), -- single-column, inline check
  supply_quantity INTEGER NOT NULL,
  ...
  CHECK (supply_quantity <= order_quantity) -- multi-column, out-of-line check
  ...
)

The most popular check constraint probably is to constrain a column to a number of predefined strings:

CREATE TABLE Languages
(
  iso_639_code CHAR(2) NOT NULL PRIMARY KEY CHECK (iso_639_code IN ('en', 'de', 'fr')),
  ...
)

When specifying check constraints, the condition must contain the column name/s, even though the standard SQL VALUE keyword could theoretically be used for inline constraints. Currently VALUE's only use is for domains with check constraints or any equivalent, proprietary domain-like declarations. This makes sense, because when declaring a domain the column name is not known yet.

Forcing the checks into a separate (lookup) table makes them reusable. The DBMS ensures the data integrity via foreign key checks. The same holds true for enumerated types which are discussed in section 8.3 Enumerated Types. Please also refer to section 12.1 Lookup Tables and 12.1 Sequence Tables for reusable, table-based alternatives.

10.4.1 Foreign Key Type Propagation

While referencing and referenced columns involved in a foreign key should have the same type, a check constraint does not require a referencing foreign key column to repeat the constraint (because it is not part of the type). These declarations are perfectly valid:

CREATE TABLE Languages
(
  iso_639_code CHAR(2) NOT NULL PRIMARY KEY CHECK (iso_639_code IN ('en', 'de', 'fr')),
  ...
)

CREATE TABLE Pages
(
  ...
  language_code CHAR(2) NOT NULL REFERENCES Languages (iso_639_code), -- no IN check needed
  ...
  PRIMARY KEY (name, language_code)
  ...
)

The check constraint is not needed on the referencing column because the foreign key constraint will fail if the check fails on the referenced table.

10.4.2 MySQL ENUM Propagation Workaround

MySQL does not implement check constraints. The parser successfully reads, but ignores all CHECK declarations. Your only option to restrict a string type to a number of predefined values is to us an ENUM:

CREATE TABLE Languages
(
  iso_639_code ENUM('en', 'de', 'fr') NOT NULL PRIMARY KEY,
  ...
)

While this might be trivial for many, a MySQL ENUM has a serious and not so obvious downside: because the predefined values are part of the type, the enum declaration must be repeated for every foreign key column that references it. This is often the case, especially when using natural (multi-column) keys. This inherent property of MySQL ENUMs has the consequence of duplicating the values list across all subsequent tables:

CREATE TABLE Languages
(
  iso_639_code ENUM('en', 'de', 'fr') NOT NULL PRIMARY KEY,
  ...
)

CREATE TABLE Pages
(
  name          VARCHAR(20)            NOT NULL,
  language_code ENUM('en', 'de', 'fr') NOT NULL REFERENCES Languages (iso_639_code),
  ...
  PRIMARY KEY (name, language_code)
)

CREATE TABLE Sections
(
  name          VARCHAR(20)            NOT NULL,
  language_code ENUM('en', 'de', 'fr') NOT NULL,
  ordinal_nbr   INTEGER                NOT NULL,
  ...
  FOREIGN KEY (name, language_code) REFERENCES Pages (name, language_code),
  PRIMARY KEY (name, language_code, ordinal_nbr)
)

The problem here is maintainability. If you change the ENUM in one place, you must manually change all other occurrences as well. This requires an expensive ALTER TABLE statement (which recreates the whole table as described in section 8.3 Enumerated Types.

However, there is a little-known workaround to the ENUM type propagation problem: MySQL allows you to use a string type on the foreign key column that references the ENUM column:

CREATE TABLE Languages
(
  iso_639_code ENUM('en', 'de', 'fr') NOT NULL PRIMARY KEY,
  ...
)

CREATE TABLE Pages
(
  name          VARCHAR(20) NOT NULL,
  language_code CHAR(2)     NOT NULL REFERENCES Languages (iso_639_code), -- changed to CHAR(2)
  ...
  PRIMARY KEY (name, language_code)
)

CREATE TABLE Sections
(
  name          VARCHAR(20) NOT NULL,
  language_code CHAR(2)     NOT NULL, -- changed to CHAR(2)
  ordinal_nbr   INTEGER     NOT NULL,
  ...
  FOREIGN KEY (name, language_code) REFERENCES Pages (name, language_code),
  PRIMARY KEY (name, language_code, ordinal_nbr)
)

Note, the above CMS page example serves demonstration purposes only.

10.4.3 Maximum List Size

I am not sure whether there is a practical upper limit for the number of elements for IN checks (or ENUMs), but I hear people say it is around ten (or even less). The question is: what is the upper limit for the number of values in the set/list if it exists, and why is there no such thing for separate tables, which are basically unrestricted?

Overall, it does not make much sense to me to discuss a maximum for IN checks (and ENUMs) while not applying the same argument to tables. The code might get a little bloated but issuing a massive amount of inserts appears to be no better (other than shifting the responsibility from DDL to DML).

What does make some sense is the maintainability and performance of large IN checks (and ENUMs). Large lists might get bulky at some point, but from what I read, it appears to be a non-factor. Let me cite an answer I got from an email conversatoin with Joe Celko on the question whether IN check constraints have a reasonable maximum size:

My criteria has been how volatile the data is. For example, the 54 two-letter state codes are not going to change in my lifetime unless Texas leaves the Union or Newfoundland joins us. I would freeze that data in an IN() list. On the other hand, the branch bank codes for Woodforest Bank are growing at 50-100 per year (they are the bank inside Wal-Marts). I would put that into a small table.

The best answer is that it depends on your SQL. High end SQLs can handle lists with up to 32K elements, while smaller engines choke a[t] 1K [elements] or smaller. I believe it is DB2 that also optimizes the list when it gets to a certain size. When you are one element short of the "magic number", you get a linear search, but go over that number and it is compiled into a binary tree or something for searching. ...

There is no reason to fear a long list in most SQLs. This is a data language and it is good at large volumes. ...

I think this is more than a reasonable rule of thumb. Use it. Overall, performance does not seem to be much of an issue here.

References for Check Constraints:

Last updated: 2010-08-04