Database Naming Conventions

Time for another naming convention. This time it’s something people care more about than Java packages, we’re talking about databases. Here are the rules, and the reasons behind them.

Use lowercase for everything

We’ve got 4 choices here:

  • Mixed case

    • Some servers are case sensitive, some are not. MySQL for example, is case-insensitive for column names, case-insensitive on Windows for table names, but case-sensitive on Linux for table names.
    • Error prone
  • No convention

    • Same reasons as mixed case
  • Upper case

    • SQL is easier to scan when the reserved words are uppercase. This is valuable when scanning log files looking for things like WHERE statements and JOINs.
    • MySQL will always dump table names on Windows in lowercase.
  • Lowercase

    • Works everywhere. Some servers, like Oracle, will appear to convert everything to uppercase, but it’s just case-insensitive and you can use lowercase.

Only use letters and underscores and numbers (sparingly)

  • Most servers support other characters, but there are no other characters which all the major servers support.
  • Numbers should be used as little as possible. Frequent use is typically a symptom of poor normalization.

    • address1, address2 is OK
  • Whitespace isn’t allowed on most servers, and when it is you have to quote or bracket everything, which gets messy.

Table and column should be short, but not abbreviated.

  • You’ve seen the same thing abbreviated every way possible, like cust_add, cus_addr, cs_ad, cust_addrs, why not just customer_address? We’re not writing code on 80 character terminals any more, and most people aren’t even writing SQL, so let’s keep it clear, OK?
  • 30 characters is considered the safe limit for portability, but give some serious thought before you go past 20.

Table names should be singular.

Yes, singular! Oh yes, I went there. I used to use plural names, because it’s more semantically accurate. After all if each record is a person, then a group of them would be people, right? Right, but who cares. SELECT * FROM person isn’t any less clear than people, especially if you’ve got a solid convention. You don’t use plurals when you’re declaring class names for a vector of generics do you? Also:

  • English plurals are crazy, and avoiding them is good.

    • user -> users
    • reply -> replies
    • address -> addresses
    • data -> data (unless its geographic, then it’s datum -> data)
  • Singular names means that your primary key can always be tablename_id, which reduces errors and time.

Double Underscores for Associative Tables.

You’ve got your person table, and your address table, and there’s a many-to-many between them. This table should be called address__person. Why? Well what if you have a legacy_customer table that also ties to address. Now you’ve got address__legacy_customer. A new developer can easily pick up this convention and will be able to break down the names accordingly. Remember, no matter what the Perl/Lisp/Ruby/etc guys say, clarity of code is judged by how someone reads it, not how they write it.

Component Names of Associative Tables in Alphabetical Order.

This rule is somewhat arbitrary, but still beneficial. There’s no good way to determine which goes first. Table size, “importance”, age, who knows what else, and those assessments may change over time. Or, you might find that your manager assigned the same task to two people, and now you’ve got an address__person and a person__address table co-existing peacefully, when you only need one. Everyone putting them in the same order makes reading and writing queries easier.

That’s all I’ve got for now, but I encourage you to offer your own, or even refute some of the ones above (with some reasoning, of course).