As data analysts, it can be easy to take data storage and retrieval for granted. However, the sub-discipline of data modeling is as complex as it is imperative for the simple and efficient organisation of data for any key stakeholder. Here are the most important ideas that you should be acquainted with to gain a greater appreciation for and understanding of the way databases are managed.

 

Relational Databases

The most common form of database is a relational database. Relational databases store data in tables (or relations). Each table consists of rows, which represent a unique instance of data, and columns, which represent the attributes recorded for each of these instances. Each column defines the type of information and hence the datatype of each entry within it. For instance, if an ‘Age’ column is created, the datatype may be defined as an integer so that only integers can be entered in this column for each row. Hence, the structure of each table is defined before any data is input. Most importantly, the language used to store data in or retrieve data from any relational database is called SQL (or Structured Query Language, often pronounced as ‘sequel’). To learn more about the history of relational databases, check out my previous blog A Tale of Two Databases.

 

Transactional, Reference & Master Data

There are three main categories of data that a business will store: transactional data, reference data and master data.

  • Transactional data contains all the event-based data that occur at some point in time, like invoices, applications and visits.
  • Reference data contains all the definitions and classifications that describe other data in common or universal terms, like units of measurement, corporate codes  and product descriptions.
  • Master data contains all the data about the business entities that are relevant to the business’ operation, like customer details, inventory information and financial records.

 

Primary & Foreign Keys

The power of relational database management systems (DBMSs) comes in their ability to keep tables of unique rows, and to store all of these tables in relation to one another. Each table must contain a primary key: a column that acts as a unique identifier so that no two rows may be identical. Hence, each value in this primary key column refers to a particular entry in its table. This primary key can then be referenced in a different table to establish a connection between them. When an original table’s primary key is referred to in another table, it is called a foreign key in this other table.

Suppose we have a Gaming Company table with data: A1, Sony; A2, Microsoft; A3, Nintendo; and we also have a Gaming Console table with data: B1, Xbox One; B2, Switch; B3, GameCube. We can create a foreign key in the Gaming Console table to reference the primary key in the Gaming Company table to establish a relationship. Now the Gaming Console table will have data: B1, Xbox One, A2; B2, Switch, A3; B3, GameCube, A3.

A primary key may also be generated based on a group of columns that in combination uniquely identify a row. A popular method is to use the hash function MD5, an algorithm that converts each of the values in the columns into a 32 digit hexadecimal string as an almost certainly unique ID.

 

ACID Properties

In relational DBMSs, the creation, modification or deletion of data, known collectively as transactions, maintain the four key ACID properties: atomicity, consistency, isolation and durability.  These four principles ensure the integrity of the data in the face of any unexpected disruptions.

  • Atomicity assures that each transaction is treated as a single atomic unit so that either all the data is affected, or none of it is – no incomplete transformations.
  • Consistency assures that each transaction can only change the data in a consistent way so that the database does not become corrupted – no invalid transformations.
  • Isolation assures that different transactions from multiple users are managed sequentially so that if two concurrent transactions are in conflict, one will be blocked – no incongruous transformations.
  • Durability assures that each transaction will remain stored so that when the system loses power it will still keep a record of the data – no impermanent transformations.

 

Familiarising yourself with each of these important ideas will prepare you for the formal process of data modeling. Happy modeling!

Cover image by mcmurryjulie from Pixabay

Hunter Iceton
Author: Hunter Iceton

Hunter Iceton is an enthusiastic and positive individual. He graduated from Sydney Uni in 2017 with a Bachelor of Commerce (Liberal Studies) majoring in Finance, Marketing and Quantitative Business Analytics. For the next few years, Hunter spent his time creating and releasing music, while tutoring primary and high school students in Mathematics and Business Studies. Hunter is now excited to be joining The Data School, looking forward to approaching analytics with a creative perspective. In his spare time, Hunter enjoys continuing to create music, reading philosophy and cooking plant-based dishes. Otherwise, he can usually be found at a restaurant, a bar or an art gallery.