Data Warehousing Part 2 Dimensional Modeling

The amazing adventures of Doug Hughes

Data Warehousing Part 2 Dimensional Modeling

October 28, 2007

First of all I should apologize for the long delay between this article and the preceding one which discussed the differences between On Line Transactional Processing (OLTP) and On Line Analytical Processing (OLAP). At a very high level OLTP databases are those used in typical day to day processes, inserts and updates etc and involve what is known as “normalization” which is used to avoid data duplication, here is a good clear overview of normalization.

OLAP databases store historical data and are typically used for producing reports, analysis etc. Data Warehousing projects are almost always based on an OLAP design. After that short revisit we can now move on the the next process in creating a Data Warehouse. At this stage a process known as “Dimensional Modeling” comes into play. Typically we will be considering using data in a data warehouse for Business Intelligence (BI) purposes; reporting being a major part of that. If one of the main purposes of normalization is to avoid duplication of data; at a simple level that is not a key need in a data warehouse. One reason is that we should never make changes to data directly in a data warehouse. We may pass updated data in from the OLTP database copy but never update it directly in the data warehouse. Dimensional modeling of data involves these main constituent elements:

Dimension:

A dimension can be considered to be a category of information. For instance a geographical dimension.

Attribute:

An attribute is an identifiable constituent part within a dimension, for instance a country.

Hierarchy:

A way of relating constituent parts within a dimension relating to their level, for instance country >city >state >zip >longitude and Latitude. The data elements in a data warehouse are contained and associated in a different way to their state in a OLTP database.

There are two main “containers” which are represented as database tables:

The Fact Table:

We use the fact table to hold the “measures of interest” as required by the business-organizational users. For instance in an application measuring traffic patterns number of cars would be a measure of interest. The level of granularity, which is an important consideration in designing fact tables, is, determined by, in this example, the required sampling intervals of the data. For instance are the number of carssampled hourly, daily, weekly etc? As an example if we were analyzing a busy junction we might want the number of cars each hour at that junction in that case the fact table could contain four columns

Number of Cars
Date
Time (hourly)
Longitude and Latitude

The lowest level of granularity in this example is the Time attribute. Discussion with users to determine their requirements is a critical need in effective data warehouse design. Ideally the team designing a data warehouse should also have some knowledge of the needs of a particular sector as this helps in identifying elements that may be overlooked by users. As we mentioned, granularity is a very important part in getting the fact table design right.

The Lookup Table:

The look up tables represent one dimension in the model and contain attributes for each dimension. Typically there will only be one link to each look up table, the fact table is the only table with multiple links one to each look up table. In a basic data warehouse design there in one fact table which joins to multiple “Look Up Tables”. If we imagine the fact table in the center with look up tables around the fact table this is where the term “Star Schema” comes from.

Star Schema

Dimensional modeling of data involves these main constituent elements:

Dimension:

A dimension can be considered to be a category of information. For instance a geographical dimension.

Attribute:

An attribute is an identifiable constituent part within a dimension, for instance a country.

Hierarchy:

A way of relating constituent parts within a dimension relating to their level, for instance country >city >state >zip > longitude and latitude.

Summary:

Dimensional modeling is a somewhat abstract principle and one that is very requirement specific; needing to be created for specific business-organizational user needs. It is becoming increasingly more importing particularly in driving the needs of organizations to plan; based on historical information. There are some guiding principles to bear in mind.

Data should not be updated or changed directly in the data warehouse.
Look up tables should contain only one link per table.
Fact tables can contain multiple links one to each look up table.

Category:

Blog

4 Comments

Comments on: "Data Warehousing Part 2 Dimensional Modeling" (4)

ALBERTO GARCIA said:

April 9, 2008 at 12:00 am

What aboout SCD types and surrogate keys?

LikeLike
Mihai Campean said:

May 13, 2008 at 12:00 am

Thanks for the post, the information is very useful. I would like to ask you if you can give some references for further reading in the data warehousing domain (some books you consider useful). Also, I am looking forward to the next post in the series.

LikeLike
Mike Brunt said:

May 14, 2008 at 12:00 am

@Mihai, thank for taking the time to comment and to be fair to everyone, I have not been very good at following up on this series. I need to refocus and will do after my upcoming presentations are completed.

LikeLike
myfacts said:

July 18, 2009 at 12:00 am

what will be measures if I want to create a star for a human resource database?

LikeLike

Comments are closed.

DougHughes.net

The amazing adventures of Doug Hughes