Showing posts with label Data warehouse concepts. Show all posts
Showing posts with label Data warehouse concepts. Show all posts

Monday, April 29, 2013

Data Warehouse Design Approaches


Data warehouse design is one of the key technique in building the data warehouse. Choosing a right data warehouse design can save the project time and cost. Basically there are two data warehouse design approaches are popular.

Bottom-Up Design:

In the bottom-up design approach, the data marts are created first to provide reporting capability. A data mart addresses a single business area such as sales, Finance etc. These data marts are then integrated to build a complete data warehouse.  The integration of data marts is implemented using data warehouse bus architecture. In the bus architecture, a dimension is shared between facts in two or more data marts. These dimensions are called conformed dimensions. These conformed dimensions are integrated from data marts and then data warehouse is built.

Advantages of bottom-up design are:
  • This model contains consistent data marts and these data marts can be delivered quickly.
  • As the data marts are created first, reports can be generated quickly.
  • The data warehouse can be extended easily to accommodate new business units. It is just creating new data marts and then integrating with other data marts.

Disadvantages of bottom-up design are:
  • The positions of the data warehouse and the data marts are reversed in the bottom-up approach design.

Top-Down Design:

In the top-down design approach the, data warehouse is built first. The data marts are then created from the data warehouse.

Advantages of top-down design are:
  • Provides consistent dimensional views of data across data marts, as all data marts are loaded from the data warehouse.
  • This approach is robust against business changes. Creating a new data mart from the data warehouse is very easy.

Disadvantages of top-down design are:
  • This methodology is inflexible to changing departmental needs during implementation phase.
It represents a very large project and the cost of implementing the project is significant

Thursday, April 25, 2013

Degenerate dimension


A Degenerate Dimension is a DIMENSION which exists within the fact table alone - as in a Foreign Key which has no Primary Key.
Usually when a dimension table is growing at roughly the same rate as the fact table, there is a degenerate dimension lurking that has been missed in the initial design.
A degenerate dimension (DD) acts as a dimension key in the fact table, however does not join to a corresponding dimension table because all its interesting attributes have already been placed in other analytic dimensions. Sometimes people want to refer to degenerate dimensions as textual facts, however they’re not facts since the fact table’s primary key often consists of the DD combined with one or more additional dimension foreign keys.
The easiest way to understand what we are talking about is via an example.
For instance, we might have a FACT table for sales - tblSales - which contains the following data:
CustomerID  ProductID CreditCardType
101         5         Visa
102         6         Mastercard
103         7         Visa
In this case, the CustomerID and ProductID are conventional dimensional references - they are Foreign Keys which have a corresponding primary Key table.
However, CreditCardType is also a piece of dimensional information - something we might want to analyze by - but there is no tblCreditCardType. CreditCardType is a Degenerate Dimension.
Where degenerate dimensions come in is that there are often some columns that we want to have, but that are not measures, and don't have a table of stuff we want to join to.  Example:  a purchase order number.  These columns store something that we want to have (the purchase order number), but to create an empty dimension table would only slow things down.  So, to ensure we don't feel bad about breaking the "only a measure or a dimension in the fact table" rule, we just CALL them dimensions- even without the table.
In the fact itself, any attribute of the purchase order that was of interest, and that therefore had values that would each have more attributes we would be interested in would have been turned into a dimension, and a dimension table would have been created.
But to create a dimension table that contains a row for every purchase order would create a very large dimension with nothing in it (since there are lots of purchase orders, possibly as many as there are facts if the grain your fact table is one per purchase order).  But our users would not be happy if they could not get a list of the purchase orders included in a given total, or be able to drill down to that bottom level of detail that we've gone to all the trouble to include.
So, when we create transactional level fact tables, it is normal, in fact, necessary to include some degenerate dimensions- include columns that have useful information (very often referencing back to the source system) but that do not join to any dimension table. Plus you can just impress everyone with your dimensional modelling knowledge when you say "degenerate dimension"

Dimension Table


A dimension table is a table that stores details about the facts. Dimension tables contain attributes that describe fact records in the fact table.

A dimension table typically has two types of columns, primary keys to fact tables and textual/descreptive data. Commonly used dimensions are customers, products, and time
For an example, consider this: If I just say… “20kg”, it does not mean anything. But if I say, "20kg of Rice (Product) is sold to Ramesh (customer) on 5th April (date)", then that gives a meaningful sense. These product, customer and dates are some dimension that qualified the measure - 20kg.

OLTP VS OLAP



OLTP System 
OLAP System 

Online Analytical Processing 

(Operational System)
(Data Warehouse)
Source of data
Operational data; OLTPs are the original source of the data.
Consolidation data; OLAP data comes from the various OLTP Databases
Purpose of data
To control and run fundamental business tasks
What the data
Reveals a snapshot of ongoing business processes
Multi-dimensional views of various kinds of business activities
Inserts and Updates
Short and fast inserts and updates initiated by end users
Periodic long-running batch jobs refresh the data
Queries
Relatively standardized and simple queries Returning relatively few records
Often complex queries involving aggregations
Processing Speed
Typically very fast
Space Requirements
Can be relatively small if historical data is archived
Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP
Highly normalized with many tables
Typically de-normalized with fewer tables; use of star and/or snowflake schemas
Backup and Recovery
Backup religiously; operational data is critical to run the business, data loss is likely to entail significant monetary loss and legal liability
Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method