Mastering DBMS: Learn Basics to Advanced Technique

Star Schema and Snowflake Schema in Data Warehousing

Star Schema and Snowflake Schema in Data Warehousing

What are Star Schema and Snowflake Schema in a Data Warehouse?

Data Modeling Plays an important role in managing a database as it logically describes the database in terms of various parameters such as how data will be stored, what constraints to follow, and how data objects are related to each other. In Data Warehouse also, Data Modeling is crucial for data storage in a data warehouse.

Unlike Relational Data Models in DBMS, the Data Warehouse is based on Multidimensional Modeling, in which Star Schema and Snowflake Schema are used for defining the data storage in a Data Warehouse.  This article covers the Star and Snowflake Schema in detail. But before that, let’s see what is Multidimensional modeling in a Data Warehouse.

Multidimensional Modeling

In this modeling technique, the data is organized in terms of either “Facts” or “Dimensions.” The Facts represent the numerical transaction data like the amount paid for the product while Dimensions represent the perspectives around which data is stored like the date of payment.

Thus, a Fact Table stores the numerical data while a Dimension Table stores the fields that describe the data. For implementing Multidimensional Modeling in Data Warehouse, Star Schema and Snowflake Schema are used.

For example, consider the following table which shows the data of a student enrolled in a course.

In this table, the Facts are:

  • 2 students enrolled
  • The total amount paid by students is 2350

On the other hand, the Dimensions which describe the Facts are:

  • Student Names
  • Course Name
  • Date of payment, etc

Now, using Facts and Dimensions, Multidimensional Modeling organizes data in terms of a Data Cube which is based on “Course Details” in the following way.

You might be confused here because all the columns of the table are not represented in the Data Cube. But, the fact is a Data Cube in a Data Warehouse can be n-dimensional. Thus, any number of dimensions (columns of the table) can be represented in the multidimensional data model.

Types of Schema in Multidimensional Modeling

A Schema is the Structural description of the data stored in the database, which is of two types in Data Warehousing. Following is a detailed description of the Star Schema and Snowflake Schema.

Star Schema

Star Schema is the most elementary form of the Dimensional Model. It organizes the data in the form of Facts and Dimensions. It contains:

  • A Fact Table in the center has a foreign key column referring to the various Dimensional Tables in the schema. A Fact Table is related to the Dimensional Table as a many-to-one relationship.
  • Multiple Dimension Tables each with its primary key column. Dimension Tables are not related to each other. Instead, they are related to the Fact Table.

The organization of Fact Tables and Dimension Tables resembles a Star like structure. That’s what it is called Star Schema. The diagram below shows a general Star Schema.

Snowflake Schema

Snowflake Schema is similar to Star Schema with one difference that each Dimension can contain its sub-dimensions which are called its levels. It is the expansion of the Star Schema through which each Dimension Table can be associated with its Sub-dimension Tables. For example, consider the below diagram which shows a two-dimensional Snowflake Schema with 3 levels of Sub-dimensions.

Thus, we have discussed the concept of the Star Schema and Snowflake Schema. Now, let’s see the difference between the two.

Difference between Star Schema and Snowflake Schema

The following table shows the key differences between the Star and Snowflake Schema in the Data Warehouse.

Star Schema 

Snowflake Schema

  • Star Schema contains Fact Table and Dimension Table.

  • Snowflake Schema contains a Fact Table, Dimension Table, and Sub-dimension Table.

  • There is Data Redundancy. Thus, it uses more space.

  • There is low Data Redundancy and uses less disk space. 

  • Less number of Joins are used which makes query performance better.

  • In Snowflake Schema, more joins are involved therefore, query performance is lower as compared to Star Schema.

  • In Star Schema, all attributes of dimension are denormalized in a single table. This increases the redundancy.

  • Snowflake Schema normalizes the data of dimension into separate tables which removes data redundancy. 

  • Less number of Foreign Keys are involved in Star Schema.

  • Since multiple sub-dimension tables are related by a Dimension, more Foreign Keys are involved.

  • Due, to the denormalization of data, it is less complex.

  • Due to the Normalization of attributes of a dimension, its complexity is higher than the Star Schema.

Therefore, both the Star Schema and Snowflake Schema have their utility for Multidimensional Data Modeling in Data Warehousing.