Showing posts with label DW. Show all posts
Showing posts with label DW. Show all posts

Thursday, December 3, 2009

Kimball Design Tip #119 Updating the Date Dimension

I've always thought of the Time or Date dimension to be a static one, rather than slowly-changing, until I read this tip from Kimball U: "...there are some attributes you may add to the basic date dimension that will change over time. These include indicators such as IsCurrentDay, IsCurrentMonth, IsPriorDay, IsPriorMonth, and so on. IsCurrentDay obviously must be updated each day."

Creating and automatically updating these attributes of the Date dimension will save time (sorry, no pun intended), for report writers and data mart developers, and also standardize such calculations across the enterprise. For example, let's say you are responsible for the data warehouse of a catalog company, and you want to calculate lag time between ordering and shipping, a common metric of customer satisfaction. Furthermore, management decides to give everyone an extra holiday as a bonus for company profitability from last year, and so lets everyone take off Groundhog Day. To account for this in your metric, you are going to want to have an attribute like "BusinessDayOfYear" as an incrementing integer, so that the first business day in January gets assigned BusinessDayOfYear=1, the second day is 2, and so forth, until you have the highest number as the last business day in December (which is most likely Dec 30th). If the company now has Groundhog Day as a holiday, then the IsHoliday for Feb 2nd is changed from 0 to 1, and then the BusinessDayOfYear attribute is recalculated.

Calculating how many business days between ordering and shipping is then trivial. Of course, this does not account for orders placed in December that ship in January, so you might want to have an attribute BusinessDayLifetime, which starts at 1 the first day the company was in business, and increments freely forever.

Monday, November 23, 2009

Data Vault Institute

While reading the comments on an SSC article about fault-tolerant ETL loading, I came across a link to the Data Vault Institute, and what sparked my attention was the comment by Dan Linstedt that "today's data warehouse has become a system of record. Due in part for the need of compliance." This struck me as an important point, given that many of the databases on which I've worked in the past were marketing databases, where bad data was generally disposed of. But in financial, health care, and governmental data warehouses, this approach would be completely unsatisfactory.