Slowly Changing Dimensions (SCDs) are data attributes in a data warehouse that change over time at a slower pace than other data. SCDs are typically used to track historical changes to data such as customer information, product information, and sales data. There are three main types of SCDs:

Type 1: Overwrite

Here is an example of customer table, if Sylvia moved from Melbourne to Sydney, SCD Type 1 suggests that the new table will change the Address from “Melbourne” to “Sydney”. The problem with using SCD Type 1 in this example is that there is no tracking of historical changes to Sylvia’s address. When Sylvia moves from Melbourne to Sydney, the address in the customer table is simply updated with the new address. This means that the previous address in Melbourne is lost, and there is no way to retrieve that information from the data warehouse.

This can be a problem if historical data is needed for analysis or reporting purposes. For example, if a business wants to analyse customer behavior in Melbourne versus Sydney, they would not be able to do so accurately because the historical data for Sylvia’s address in Melbourne has been lost.

In such cases, using a different SCD type, such as Type 2, would be more appropriate. Type 2 allows for the creation of a new record for each change, with an effective start and end date. This means that the old address in Melbourne would still be stored in the data warehouse, allowing for historical analysis and reporting.

Type 2: Add New Row

In the above example, ID is the Natural Key and SK_ID is the Surrogate Key, which is a synthesized key that uniquely identify each row. With the support of the surrogate key, we can insert multiple records for new changes.

In this instance, we add “Effective Date” columns to each records for Sylvia’s address. The Effective Date of the second record(i.e. 17/04/2023) is equal to the Expiry Date of the previous record(i.e. 17/04/2023).

To avoid null-value substitution during querying and to include the field in an index, a standardised surrogate high date (e.g. 31/12/9999) can be used as an end date instead.

One of the main problems with SCD Type 2 is that it can lead to a significant increase in the size of the data warehouse, as each change creates a new record. This can result in slower query times and increased storage costs.

 

Type 3: Add New Attributes

SCD Type 3 tracks changes using separate columns and has a limited historical preservation capability, limited to the number of columns designated for storing historical data. Unlike Type 1 and Type 2, which use the original table structure, Type 3 adds additional columns to store historical data.

For example, we add a “Previous Address” Column for Customer’s address. The record contains a column for the previous address and current address—cannot track the changes if the Sylvia relocates a second time .The main disadvantage of SCD Type 3 is the limited historical preservation capability, as it is only able to store historical data for a limited number of designated columns. This means that any changes to columns that are not designated for historical preservation will be lost. This can be problematic if more extensive historical analysis is required or if there is a need to track changes to a larger number of attributes.

 

To summarise, I have created a table that outlines each SCD Type, the corresponding Dimension Table Action, and its impact on Fact Analysis.

 

 

The Data School
Author: The Data School