This blog post is a deeper dive of star schemas. Please go to this link to check that out first (https://www.thedataschool.com.au/andrew-banh/the-allure-of-star-schemas/).
Facts and dimensions tables form the basis of any business’ data strategy. It is from the data contained in these tables which data driven business decisions are made.
They both serve different purposes. When designing a database, the dimension table may have a parent table (that being the fact table) but a fact table always has a dimension table. Fact tables are able to be aggregated where as dimension tables are not. The logic behind this is that fact tables are not meant to be updated but dimension tables serve that purpose when need be.
Let’s start with Fact Tables.
Fact tables hold data specific to a business process. This means that each row is associated with a process and also contains the measurement data relating to it. Numerical data is usually contained in these tables and can be manipulated in the sense that it can be aggregated. Thinking to a retail environment, an example of a fact table can be a table relating to customer orders. A manager of the business would be able to sum the sales across all the stores in a particular region.
A lot of planning goes into the construction of these tables, specially the grain represented in the table. Carrying on with the example of a store, the developer can either develop the table to be at the grain of a customer transaction or an individual item purchase. If deciding on the latter, it would mean that each customer purchase might create one or more fact table entries. These factors need to be taken in to consideration.
Now for the complementing table, Dimension Tables.
Dimension tables can be related to 1 or none of the fact tables. Dimension tables contain descriptive attributes and are usually textual fields or discrete numbers. Dimension fields should be designed to be verbose, descriptive, have no missing data and discretely valued. This is why the dimensions in these tables should usually be a simple integer. This design harbours performance as join processing is much more efficient. Queries can efficiently drill down into different fact tables and their corresponding dimension table and perform joins across the dimensions.
Commonly used dimensions, while still considering the store as an example, can include people, products, place and time. This means that in the product table, each record will contain an item and the details associated with that item (such as costs, supplier, size, etc.). The primary keys in these tables link to the foreign keys in the fact table.