Basic Elements of Data Warehouse

  • Dimension

                Dimension is a category of information.                
                 For Example : The time dimensions.
  • Attribute
                 A unique level within a dimension.
                For Example : Month is an attribute in the Time Dimension.
  • Hierarchy
        The specification of levels that represents relationship between different attributes 
         within a dimension.
                For Example : one possible hierarchy in the Time dimension is
                Year → Quarter → Month → Day.
 
  •  Fact Table:
A fact table is a table that contains the measures of interest.  A fact table is a table that contains Quantitative Information.
  • For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity.
  • For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: A Date_Id column, a store column, and a sales amount column.
  •   Dimension Table
 A dimension table stores attributes, or dimensions, that describe the objects in a fact table. A Dimensional Table is a table that contains Qualitative Data
    • For Example : Date_Id stored in Time Dimension references fact table.
  • Star Schema :
In the star schema design, a single object (the fact table) sits in the middle and is radically connected to other surrounding objects (dimension lookup tables) like a star.


  • Snowflake Schema :
  • 1.    The snowflake schema is an extension of the star schema, where each  point of  the star explodes into more points.
    2.      In a star schema, each dimension is represented by a single dimensional table, whereas in a snowflake schema, that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.
    3.      The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables.
    4.    The main disadvantage of the snowflake schema is the additional maintenance efforts needed due to the increase number of lookup tables
      

  • Keys Used in Data Warehouse :

    Primary Keys :

              Uniquely identify a record (Example: Employee ID).

    Foreign Keys :

              Primary key of another table referred here  (Dimension ID).
      
    Surrogate Keys :

    1. System-generated key for dimensions
    2. Key on its own has no meaning
    3. Integer key, less space
 
Slowly Changing Dimensions

A dimension is considered as Slowly Changing Dimension when its attributes remain almost constant over time, requiring relatively minor alterations to represent the evolved state.

  • Type 1 Slowly Changing Dimension

Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.

Customer Key
Name
State
1001
Christina
Illinois

  • Type 2 Slowly Changing Dimension :
  •              In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
Original Table:

Customer Key
Name
State
1001
Christina
Illinois
  •      After Christina moved from Illinois to California, we add the new information as a new row into the table.
Changed  Table:

Customer Key
Name
State
1001
Christina
Illinois
1005
Christina  
California

Advantage :

 This allows us to accurately keep all historical information.    

Disadvantage :

  1.  This will cause the size of the table to grow fast. In cases where the number  of rows  for the table is very high to start with,
  2.  storage and performance can become a concern.
 
Type 3 Slowly Changing Dimensions :
  • Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.

Customer Key
Name
State
1001
Christina
Illinois

  • After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003)  

Customer Key
Name
Original State
Current State
1001
Christina
Illinois
California

Advantages:

1.       This does not increase the size of the table, since new information is updated.
2.       This allows us to keep some part of history.

Disadvantages:

Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost. 

DW ETL Scheduling: 
1.    Daily Scheduling.
2.    Weekly Scheduling.
3.    Adhoc Schedule Run.
4.    Run On Demand.
5.    Monthly Schedule.

Popular Posts

Featured Post

Basic unix commands used in Administration

ls –ltr                Shows all the files and sub directories in the  current directory. ls –la                  Shows all the hidden...