SQL Database Design » 12. Auxiliary Tables » 12.1 Lookup Tables

12.1 Lookup Tables

Table of Contents

Lookup tables serve the purpose to look up (duh) a predefined number of values (one row per value or entity). Lookup tables are for nominally-scaled attributes. For ordinally-scaled attributes, refer to sequence tables in section 12.2 Sequence Tables.

The term nominal scale stems from calling something by name. Thus, a lookup table's primary key will most likely be some kind of string type. The general structure of a lookup table is:

CREATE TABLE Languages
(
  name VARCHAR(20) PRIMARY KEY [ CHECK (name IN('en', 'de', 'fr')) ],
  [...]
)

The name column represents the nominally-scaled lookup value. The bracketed check constraint addresses the constraint repetition problem, described later. The bracketed ellipsis represents optional, additional columns. There are two basics variants for lookup tables:

  1. Constraint lookup tables
  2. Real lookup tables

I created the term constraint lookup tables on my own. It is the best I could come up with. Such a table practically replaces a check constraint (by having only a single column). Real lookup tables map at least one value to their keys, making them real lookup tables.

Note, that I have created a special section on whether or not to put IN checks or ENUMs onto a constraint table's primary key in 12.1.3 Constraint Repetition. This discussion applies to all lookup tables. You should read through next two sections though.

12.1.1 Constraint Lookup Tables

In a single-column lookup table scenario, the table effectively becomes a reusable check constraint using the IN operator (or an unsorted MySQL ENUM replacement). Note the reusability emphasis. While a single-column lookup table is not really a lookup table (because there is nothing to look up), it does have its uses.

Every time reusability of predefined values is needed, your only real option is to use a separate table. With a lookup table the predefined values are moved into a centralized place. Any tables referencing the primary key column (the predefined lookup values) will delegate the integrity check to the set foreign keys.

12.1.2 Real Lookup Tables

A real lookup tables is basically an extension to the constraint lookup table. However, real lookup tables require additional columns to just the primary key column/s. This makes sense, as you need them to have a lookup at all. There is a lot to write about real lookup tables, but the main point is, that the same implications apply for real lookup tables and constraint lookup tables.

I took the common lookup table variants from the book Joe Celko's SQL for Smarties: Advanced SQL Programming, p. 487ff.:

  1. Simple Translation Auxiliary Tables
  2. Multiple Translation Auxiliary Tables
  3. Multiple Parameter Auxiliary Tables
  4. Range Auxiliary Tables
  5. Hierarchical Auxiliary Tables
  6. One True Lookup Table

In the current version of this paper, this section is a to-do. Please refer to Joe Celko's SQL for Smarties: Advanced SQL Programming for more information.

12.1.3 Constraint Repetition

Does it make any sense to repeat the IN check constraint (or ENUM) for the primary key of a lookup table?

The table's rows already replace the constraint values, putting an IN check constraint or ENUM on the primary key. While this is duplicate information and thus redundant, it can have uses. Technically, you just put a constraint onto a column which already represents the constraint. Lookup tables are basically unrestricted. While they allow you to enter any values as primary keys, so do IN and ENUM lists.

You cannot generally answer yes or no. In can make sense in specific situations, e.g. ISO country codes. A column type of CHAR(2) represents all possible country code permutations. An IN check constraint would represent all valid country codes (here non-white cells). The actual rows in a lookup table then represent what the database and/or application currently supports. As you can see, using a constraint this way can definitely make sense. Even adding another column to represent the different states (green, cyan, yellow, gray, red, magenta) can be handled easily.

However, there are arguments for repeating the constraint in a lookup table. Having the constraint on the primary key column serves documentary purposes. Other developers will immediately understand that your table is intended to represent a set/list of predefined values – no matter how many additional columns the table has.

And what about performance? Does the added check decrease performance? I have not done any performance testing, but an additional constraint will at least not make the table faster. However, the constraint on the primary key should only be checked by the DBMS when inserting new rows or when updating them, so this will not occurr very often. Please also read section 10.4.3 Maximum List Size.

Even though it is technically not necessary, I usually add constraints to my lookup tables. If you are still undecided, try adding the constraint and if it does not work out for you, revert to "plain" lookup tables. I have to admit, that I have not come to a final decision on this issue. If you think it is worth a discussion, feel free to contact me.

Last updated: 2010-08-04