Monday, April 29, 2013

Multiple Fact Reporting on (Non-)Conforming dimensions – Part II

What happens when you want to filter on a non-conforming dimension?

When you filter an a non-conforming dimension, you could get a null value for the fact which has all the dimensions as conforming.
 
 
Is this not exactly how the Oracle BI-server works? We have created one logical fact table and two different logical table sources (LTS). The Oracle BI-server creates seperate queries for each LTS in a Oracle BI Answers query.
Query I:

Query I connects fact table 1 to all the dimensions, including a filter on the non-conforming dimension.
Query II:

Query II connects fact table 2 one to the conforming dimensions only. Because there is no physical relationship between fact table 2 and the non-conforming dimension, it is not possible to filter query II on  dtnc1.value3 = ’3CCC3′ as well.
The results of both queries:

Lucky enough for me Nicolas did some investigating himself to:
It looks like the most easy solution is to filter on the fact table which has all the dimensions as conforming. In this case, that would be; Fct1 Value1.

Solution you could see in Multiple Fact Reporting on (Non-)Conforming dimensions – Part III

Multiple Fact Reporting on (Non-)Conforming dimensions - Part I

There are some questions, which are popping up at the Oracle BI EE Forums regularly. One of those questions is;

*** How to model multiple facts against (non-) conforming dimensions?
I will try to work things out. Click on the images to see more detail.
Note: I am aware of the extra white space between the images. That’s not intended functionality, but lack of knowledge of  WordPress.
Picture the following:
There are two fact tables and three dimension tables. FACT_TABLE_1 has two conformed dimension tables; DIM_TABLE_CONF_1 and DIM_TABLE_CONF_2 and one non-conformed dimension table DIM_TABLE_NON_CONF_1.
FACT_TABLE_2 has two conformed dimension tables; DIM_TABLE_CONF_1 and DIM_TABLE_CONF_2.
The Physical Model would have the following structure:

Physical Diagram
Based on the Physical Model we could construct the following Logical Model:

Logical Diagram
I have created one fact table which contains Logical Table Sources (LTS) for FACT_TABLE_1 and FACT_TABLE_2

Logical Model
As you can see I have created Dimensions (Hierarchy’s) for each Dimension Table.
FACT_TABLE_2 has no physical relationship with DIM_TABLE_NON_CONF_1. Therefore you should set the logical levels for FACT_TABLE_2 to the ‘Grand Total’-level of DIM_TABLE_NON_CONF_1. This way the Oracle BI Server won’t look for a join between DIM_TABLE_NON_CONF_1 and FACT_TABLE_2.
If you want to avoid nulls, set the detail levels for the facts. Set the ‘Grand Total’-levels for the metrics as well.

Logical Table Source - Fact I

Logical Table Source - Fact II

Logical Column - Fact II
If we take a look at Oracle BI Answers, we can create a report which contains data from the following tables;
  • DIM_TABLE_CONF_1
  • DIM_TABLE_CONF_2
  • FACT_TABLE_1
  • FACT_TABLE_2 

Oracle BI Answers - Conformed Dimension
Now we can bring data from DIM_TABLE_NON_CONF_1 into this report. It is impossible to devide data from FACT_TABLE_2 over this dimension. Therefore the data will be the same for every value of this dimension.

Oracle BI Answers - (Non-) Conformed Dimension
*** Summary:
 It’s possible to report on facts and dimensions which not have a physical relationship to each other. Just make sure you create dimensions (hierarchy’s) for every dimension table. Next to that you should set the logical levels for your logical tables.

Data Warehouse Design Approaches


Data warehouse design is one of the key technique in building the data warehouse. Choosing a right data warehouse design can save the project time and cost. Basically there are two data warehouse design approaches are popular.

Bottom-Up Design:

In the bottom-up design approach, the data marts are created first to provide reporting capability. A data mart addresses a single business area such as sales, Finance etc. These data marts are then integrated to build a complete data warehouse.  The integration of data marts is implemented using data warehouse bus architecture. In the bus architecture, a dimension is shared between facts in two or more data marts. These dimensions are called conformed dimensions. These conformed dimensions are integrated from data marts and then data warehouse is built.

Advantages of bottom-up design are:
  • This model contains consistent data marts and these data marts can be delivered quickly.
  • As the data marts are created first, reports can be generated quickly.
  • The data warehouse can be extended easily to accommodate new business units. It is just creating new data marts and then integrating with other data marts.

Disadvantages of bottom-up design are:
  • The positions of the data warehouse and the data marts are reversed in the bottom-up approach design.

Top-Down Design:

In the top-down design approach the, data warehouse is built first. The data marts are then created from the data warehouse.

Advantages of top-down design are:
  • Provides consistent dimensional views of data across data marts, as all data marts are loaded from the data warehouse.
  • This approach is robust against business changes. Creating a new data mart from the data warehouse is very easy.

Disadvantages of top-down design are:
  • This methodology is inflexible to changing departmental needs during implementation phase.
It represents a very large project and the cost of implementing the project is significant

Thursday, April 25, 2013

Degenerate dimension


A Degenerate Dimension is a DIMENSION which exists within the fact table alone - as in a Foreign Key which has no Primary Key.
Usually when a dimension table is growing at roughly the same rate as the fact table, there is a degenerate dimension lurking that has been missed in the initial design.
A degenerate dimension (DD) acts as a dimension key in the fact table, however does not join to a corresponding dimension table because all its interesting attributes have already been placed in other analytic dimensions. Sometimes people want to refer to degenerate dimensions as textual facts, however they’re not facts since the fact table’s primary key often consists of the DD combined with one or more additional dimension foreign keys.
The easiest way to understand what we are talking about is via an example.
For instance, we might have a FACT table for sales - tblSales - which contains the following data:
CustomerID  ProductID CreditCardType
101         5         Visa
102         6         Mastercard
103         7         Visa
In this case, the CustomerID and ProductID are conventional dimensional references - they are Foreign Keys which have a corresponding primary Key table.
However, CreditCardType is also a piece of dimensional information - something we might want to analyze by - but there is no tblCreditCardType. CreditCardType is a Degenerate Dimension.
Where degenerate dimensions come in is that there are often some columns that we want to have, but that are not measures, and don't have a table of stuff we want to join to.  Example:  a purchase order number.  These columns store something that we want to have (the purchase order number), but to create an empty dimension table would only slow things down.  So, to ensure we don't feel bad about breaking the "only a measure or a dimension in the fact table" rule, we just CALL them dimensions- even without the table.
In the fact itself, any attribute of the purchase order that was of interest, and that therefore had values that would each have more attributes we would be interested in would have been turned into a dimension, and a dimension table would have been created.
But to create a dimension table that contains a row for every purchase order would create a very large dimension with nothing in it (since there are lots of purchase orders, possibly as many as there are facts if the grain your fact table is one per purchase order).  But our users would not be happy if they could not get a list of the purchase orders included in a given total, or be able to drill down to that bottom level of detail that we've gone to all the trouble to include.
So, when we create transactional level fact tables, it is normal, in fact, necessary to include some degenerate dimensions- include columns that have useful information (very often referencing back to the source system) but that do not join to any dimension table. Plus you can just impress everyone with your dimensional modelling knowledge when you say "degenerate dimension"

Dimension Table


A dimension table is a table that stores details about the facts. Dimension tables contain attributes that describe fact records in the fact table.

A dimension table typically has two types of columns, primary keys to fact tables and textual/descreptive data. Commonly used dimensions are customers, products, and time
For an example, consider this: If I just say… “20kg”, it does not mean anything. But if I say, "20kg of Rice (Product) is sold to Ramesh (customer) on 5th April (date)", then that gives a meaningful sense. These product, customer and dates are some dimension that qualified the measure - 20kg.

OLTP VS OLAP



OLTP System 
OLAP System 

Online Analytical Processing 

(Operational System)
(Data Warehouse)
Source of data
Operational data; OLTPs are the original source of the data.
Consolidation data; OLAP data comes from the various OLTP Databases
Purpose of data
To control and run fundamental business tasks
What the data
Reveals a snapshot of ongoing business processes
Multi-dimensional views of various kinds of business activities
Inserts and Updates
Short and fast inserts and updates initiated by end users
Periodic long-running batch jobs refresh the data
Queries
Relatively standardized and simple queries Returning relatively few records
Often complex queries involving aggregations
Processing Speed
Typically very fast
Space Requirements
Can be relatively small if historical data is archived
Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP
Highly normalized with many tables
Typically de-normalized with fewer tables; use of star and/or snowflake schemas
Backup and Recovery
Backup religiously; operational data is critical to run the business, data loss is likely to entail significant monetary loss and legal liability
Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method