Navigation Guide

Intro

Today’s topic is about databases, where I am using a real-life metaphor instead of talking in technical terms to help you understand what a database is, and what it does.

 

What is a database?

As always, let’s start with the dry and boring definition of a database.

“A database is an organized collection of structured information, or data, typically stored electronically in a computer system. A database is usually controlled by a database management system (DBMS). Together, the data and the DBMS, along with the applications that are associated with them, are referred to as a database system, often shortened to just a database.

Data within the most common types of databases in operation today is typically modeled in rows and columns in a series of tables to make processing and data querying efficient. The data can then be easily accessed, managed, modified, updated, controlled, and organized. Most databases use structured query language (SQL) for writing and querying data.”

Don’t understand it? No need to worry, here is a more fun and simple version definition:

Think of a database like a well-organized digital library where information is neatly stored in a computer. This digital library is run by a friendly librarian called a Database Management System (DBMS). Together, the info in the library, the librarian (DBMS) who helps us find, update, and arrange the information on the bookshelf, and the cool gadgets and tools that comes with them make up a database system. But people are lazy, so sometimes we refer to it as database for short.

In this digital library, information is arranged in rows and columns, kind of like how books are on shelves. This makes it easy to find and work with the information quickly. The librarian (DBMS) also speaks a special language called Structured Query Language (SQL), which helps us talk to the database and ask it questions or make changes. It’s like having a secret code to interact with our digital library smoothly.

Make more sense now right?

 

Types of databases

In the world of databases, think of them as diverse sections in a library—there’s the classic Fiction (relational databases), the creative Non-Fiction (NoSQL databases), and the unique Reference Section (object-oriented databases). For our blog, let’s explore the Relational databases, where data is neatly organized on tables, akin to books neatly arranged on shelves.

By the way, SQL is a programming language used by nearly all relational databases to query, manipulate, and define data, and to provide access control. All relational DBMS software supports SQL.

 

Relational Database

A relational database (RDB) is a way of structuring information in tables, rows, and columns. An RDB has the ability to establish links—or relationships–between information by joining tables, which makes it easy to understand and gain insights about the relationship between various data points.

In the last blog we talked about data models, and this is a relational data model, where two tables are related over a PK and an FK.

Ring a bell?

A relational database management system (RDBMS) is a program used to create, update, and manage relational databases. Some of the most well-known RDBMSs include MySQL, PostgreSQL, MariaDB, Microsoft SQL Server, and Oracle Database.

Cloud-based relational databases like Cloud SQL, Cloud Spanner and AlloyDB have become increasingly popular as they offer managed services for database maintenance, patching, capacity management, provisioning and infrastructure support.

 

Database structure

A database has a 3-tiered structure as shown below. (I had to borrow this content from Shane’s slides, I hope he doesn’t mind)

Fact and Dimension tables

Here is the key characteristics of the Fact table and Dimension table.

Fact Table:

  1. A fact table is a central table in a star schema or snowflake schema of a data warehouse.
  2. It typically contains quantitative data, also known as facts, that represent the measurements or metrics of a business process.
  3. Fact tables often have foreign keys that reference the primary keys of dimension tables, creating relationships.
  4. Examples of facts include sales revenue, quantity sold, or profit.

Dimension Table:

  1. A dimension table is a table that provides descriptive information or context for the data stored in a fact table.
  2. It contains attributes or textual information that helps in categorizing, filtering, and analyzing the data in the fact table.
  3. Dimension tables have a primary key that is often used as a foreign key in the fact table to establish a link between them.
  4. Examples of dimension tables include time dimensions (e.g., dates), customer dimensions (e.g., customer details), and product dimensions (e.g., product categories).

In simpler terms, think of the fact table as the numerical core, storing measurable data, while dimension tables provide the who, what, when, where, and why that add context and meaning to the numbers. It’s like a sales report (fact) linked to customer details (dimension) and product categories (dimension) to provide a comprehensive picture of business performance.

Example

Let’s consider a sales database for a retail company:

Sales Fact table

Date Dimension table

Customer Dimension table

Product Dimension table

In this example, the fact table stores quantitative data about sales transactions, and the dimension tables provide descriptive information related to dates, customers, and products. The fact and dimension tables are linked through foreign key relationships, creating a comprehensive structure for analyzing and understanding sales data.

 

Tools

So, it is time to build your own databases. Supabase is a useful tool to create databases with ease and it will visualise your data models.

Happy databasing!

 

References

https://www.oracle.com/au/database/what-is-database/

https://corporatefinanceinstitute.com/resources/data-science/database/

https://cloud.google.com/learn/what-is-a-relational-database#:~:text=A%20relational%20database%20is%20a,structures%20relate%20to%20each%20other

https://www.simplilearn.com/fact-table-vs-dimension-table-article

John Lyu
Author: John Lyu

John is deeply passionate about data and firmly believes in its transformative power, especially in areas like Machine Learning, Data-Centric AI (DCAI), and Storytelling with data. He is extremely excited about opportunities to test his knowledge and expand his industry insights, such as Kaggle competitions and visualization contests. John thrives on collaboration, enjoying the experience of working alongside talented colleagues and learning from their expertise. In his free time, John enjoys playing basketball, hitting the gym, and hanging out with friends. He is also a trading card collector. He enjoys making new friends, so be sure to come and say hi.