I have been learning dbt recently and wanted to get a better understanding of data warehousing methodologies. Two of the most well-known ones are the Kimball and Inmon approaches. In this post, I will explain the key differences between the two and why Kimball is generally considered better for analytics.

Kimball (Dimensional Modeling)

Kimball’s approach uses a star schema with two main types of tables:

  1. Fact Tables: Store measurable events (sales amount, quantity sold, order totals). These are the “what happened” of your business.
  2. Dimension Tables: Store descriptive context (customer info, product details, dates). These answer “who, what, where, when.”

The star schema looks like this:

              dim_customers
                    |
                    |
dim_products -- fact_sales -- dim_time
                    |
                    |
              dim_locations

Example tables:

fact_sales
date_keycustomer_keyproduct_keyquantitysales_amount
2024011510142259.98
dim_customers
customer_keynamecity
101AliceHouston
dim_products
product_keyproduct_namecategoryprice
42Widget AGadgets29.99
dim_date
date_keydatemonth
202401152024-01-15January

Sample query - “Total sales by city by category”:

SELECT dc.city, dp.category, SUM(fs.sales_amount) AS total_sales
FROM fact_sales fs
JOIN dim_customers dc ON fs.customer_key = dc.customer_key
JOIN dim_products dp ON fs.product_key = dp.product_key
GROUP BY dc.city, dp.category;

4 tables, 2 joins, reads like English, and business-user friendly!

Inmon (3rd Normal Form)

Inmon’s approach uses a 3rd Normal Form (3NF) structure, focusing on data integrity and reducing redundancy. The data is organized into many related tables, which can make querying more complex.

The 3NF schema looks like this:

customers                categories
    |                        |
    v                        v
 orders                  products
    |                        |
    +-----> order_lines <----+

Example tables:

customers
customer_idnamecity
101AliceHouston
products
product_idproduct_namecategory_id
42Widget A1
categories
category_idcategory_name
1Gadgets
orders
order_idcustomer_idorder_date
10011012024-01-15
order_lines
order_line_idorder_idproduct_idquantityunit_price
1100142229.99

Sample query - “Total sales by city by category”:

SELECT c.city, cat.category_name, SUM(ol.quantity * ol.unit_price) AS total_sales
FROM order_lines ol
JOIN orders o ON ol.order_id = o.order_id
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON ol.product_id = p.product_id
JOIN categories cat ON p.category_id = cat.category_id
GROUP BY c.city, cat.category_name;

5 tables, 4 joins, more complex and harder for business users to understand.

Key Differences

Aspect3NFDimensional
GoalData integrityQuery simplicity
RedundancyMinimalIntentional
JoinsManyFew
Primary UsersEngineersAnalysts
Typical LayerCentral warehouseData marts

The fundamental distinction: In 3NF, relationships describe how entities depend on each other. In a star schema, relationships exist only to describe facts.

Why Kimball is Better for Analytics

  • Simplicity: Kimball’s star schema is easier to understand and query, making it more accessible for analysts and business users.
  • Performance: Fewer joins in Kimball’s model often lead to faster query performance.
  • Business Focus: Kimball’s approach is designed with business questions in mind, making it more aligned with analytics needs.
  • Flexibility: Dimensional models can be more easily adapted to changing business requirements.

References