SQL Database Design » 1. Introduction

1. Introduction

The purpose of this paper is to aid IT professionals in designing vendor-independent databases, while answering practical questions concerning data types, keying topics, and the translation of higher-level concepts to SQL. This work is an attempt to close the gap between the practical and academic perspectives on relational databases.

Throughout this document, I frequently use the term database management system (DBMS). Because I only cover relational database management systems (RDBMS), I will use DBMS as a synonym for RDBMS. Since this work is a lot about SQL, I also refer to ISO/ANSI SQL as "standard SQL" or "the SQL standard", with specific versions being designated as "SQL-92", "SQL:1999", "SQL:2003", and "SQL:2008".

My examinations are based on the SQL standard and four of the most widely used DBMSs around, with two being open-source and the other two being commercial: PostgreSQL and MySQL, plus Oracle and SQL Server. Other DBMSs are rarely mentioned. Considering only four DBMSs isn't a problem, because most principles apply to other DBMSs as well. If I should ever extend the range of DBMSs, the next ones to be considered would be IBM DB2, Sybase ASE, HyperSQL (HSQLDB), and SQLite. More distant are Firebird, Teradata, and Mimer.

SQL is fundamentally flawed in many expert's opinion, but it's such a widely-used programming language that most developers can't get around it in practice. After all, SQL is not a religion, but a language that puts food on the table, so I'm just trying to get the best out of SQL and database design for our everyday work.

The relational model is the mathematical foundation of relational databases. However, the SQL standard committee put features into the standard that are inherently non-relational. My main goal is to present standard and non-standard SQL techniques so you can quickly make a decision to get your work done. There have been great debates on certain features, such as nulls, booleans, and others where the final decision whether to use a them or not is ultimately up to you.

I try to outbalance the pros and cons against each other rather than rant about SQL, in contrast to some of the experts. Whereever I can, I focus on picking the best approach considering the given context (DBMS, requirements, ...), but even then it's sometimes hard to make a clear recommendation. Note, this text uses object-oriented techniques in a relational environment, which old-schoolers strictly frown away from. Also not, that this text is constantly evolving and it's by no means finished - as with my own knowledge.

I begin this work by defining some of the most often (mis)used relational database terms and continue with a section explaining the three levels of database design. It's followed by a fairly quick discussion of modeling notations and paradigms used today. The fifth section concludes the introductory part by giving an overview of higher-level modeling concepts concerning relationships.

The main part starts by laying out the logical database modeling process. After that, I dig into the main SQL part, in which I intentionally cope with rather low-level topics, which experts usually wouldn't deal with in that level of details: standard data types, defaults, constraints, user-defined types, and indices.

At the heart of this work you'll find section 13, a section about practical design heuristics in which I apply general and notation-specific modeling concepts to SQL to form a complete, logical modeling process. I conclude this work with more advanced topics like normalization, advanced data structures, and multilingualism in databases.

Last updated: 2010-10-12