Monday, November 9, 2009

Intelligent Keys

I'm engaged in an internal debate about the use of intelligent versus surrogate keys. Typically when this issue arises, the debate centers around whether we want to use a key that is already present in the data (such as SSN in an employee table - this is an intelligent key, also known as a natural key), or if it's better to generate a new meaningless key (such as an auto-incrementing integer - this is a surrogate key).

Now the internal debate isn't over that issue per se - I fall on the side that favors the surrogate key creation. The real debate I'm in is whether it's okay to create an intelligent surrogate key. The most typical surrogate as mentioned previously is an auto-incrementing integer identity - every time a row is inserted into the table, a new key is created by adding one to the max value. These keys have zero business meaning - that's their advantage, that they are decoupled from the business data. However, there are situations where it makes sense to create this value intelligently. One example is creating a time dimension in a data warehouse, whereby the primary key consists of an integer in the form "YYYYMMDD". Microsoft favors this method (as I discovered in their training kit for exam 70-448). A big advantage to this approach is that if you create a clustered index on that intelligent surrogate key, all of your data will be sorted in date order (of course, if you insert into that table by earliest date first, it will also be in that order - unless you add an earlier time period at a later date).

No comments:

Post a Comment