Navigation Guide

    1. Conceptual Data Model
    2. Logical Data Model
    3. Physical Data Model

 

Intro

This blog outlines what we learned in the second week of our training at the DS. Shout out to Shane Dolley, our lovely Brisbane head coach, who gave us an amazing lecture on data modeling and databases (the latter will be covered in the next blog). Data modeling is essential when we are dealing with databases, as it lays out the design and structure of the data. If you are someone like me who has no previous experience with databases but is keen to learn, then get ready to enjoy the ride.

 

What is data modeling?

Data modeling is a process used to define and organize data requirements for a system. It involves creating a conceptual representation of how data should be structured and related within a database, with the goal of supporting the business processes or requirements of an organization.

Data models are built around business needs. Rules and requirements are defined upfront through feedback from business stakeholders so they can be incorporated into the design of a new system or adapted in the iteration of an existing one.

Data modeling is an essential part of the database design process, aiding in communication between stakeholders, ensuring data integrity, and providing a blueprint for database implementation. It is widely used in various fields, including software development, business analysis, and database administration.

 

Three levels of data abstraction

There are different types of data models, including:

1.     Conceptual Data Model:

This represents a high-level view of the data and its relationships, focusing on business concepts and rules rather than technical details. Typically, it includes entity classes (the types of things that are important for the business to represent in the data model), their characteristics and constraints, the relationships between them, and relevant security and data integrity requirements.

2.     Logical Data Model:

 

  • Focus: The logical data model focuses on representing the business information and the relationships between different data entities.
  • Abstraction: It is more detailed than the conceptual model but remains independent of specific database management systems (DBMS) or hardware considerations.
  • Entities and Relationships: It defines entities (such as customers and products) and the relationships between them without specifying how the data will be stored physically.

3.     Physical Data Model:

  • Focus: The physical data model is concerned with how data is stored, organized, and accessed in the database.
  • Implementation Details: It includes details like tables, columns, indexes, data types, and constraints. It is influenced by the specific features and constraints of the chosen database technology.
  • Performance Optimization: It takes into account factors like data storage, indexing strategies, and query optimization to achieve efficient data retrieval and storage.
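
To make the contrast with the conceptual and logical levels concrete, here is a minimal sketch of what a physical model can look like once it is expressed as actual DDL. It uses Python's built-in sqlite3 module, and the table, column, and index names are purely hypothetical examples rather than part of any particular model.

```python
# A minimal sketch of a physical data model, using Python's built-in sqlite3
# module. Table, column, and index names are hypothetical examples.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

conn.executescript("""
    -- Physical level: concrete tables, columns, data types, and constraints
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,           -- surrogate primary key
        email       TEXT    NOT NULL UNIQUE,       -- uniqueness constraint
        full_name   TEXT    NOT NULL,
        created_at  TEXT    DEFAULT CURRENT_TIMESTAMP
    );

    -- Physical-level performance detail: an index to speed up lookups by name
    CREATE INDEX idx_customer_full_name ON customer (full_name);
""")

conn.close()
```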

Keys

Keys in a DBMS are attributes used to identify specific rows inside a table and to establish relationships between tables.

The two most basic keys to know are the primary key (PK) and the foreign key (FK), which relate tables to each other. But keys in a DBMS can take several other forms (a short code sketch follows the list below):

  1. Primary Key:
    • A primary key is a unique identifier for each record in a table.
    • It ensures that each record can be uniquely identified.
    • No two records can have the same primary key value.
    • Primary keys are often used as references in other tables (foreign keys).
  2. Foreign Key:
    • A foreign key is a field in a table that refers to the primary key in another table.
    • It establishes a link between the two tables.
    • Foreign keys help maintain referential integrity in the database, ensuring that relationships between tables are valid.
  3. Candidate Key:
    • A candidate key is a set of one or more fields that can uniquely identify a record.
    • From the set of candidate keys, one key is chosen as the primary key.
  4. Composite Key:
    • A composite key is a key that consists of more than one attribute (column) to uniquely identify a record.
    • It is used when a single attribute is not sufficient for unique identification.
  5. Natural Key:
    • A natural key is an attribute or set of attributes that already exists in the real world and uniquely identifies a record within a database table.
    • Unlike surrogate keys, which are artificially created for the sole purpose of uniquely identifying records in a database, natural keys are inherent to the data being modelled.
    • Examples: Social Security numbers, Vehicle Identification Numbers (VINs).
  6. Super Key:
    • A super key is a set of one or more attributes that can uniquely identify a record.
    • It may contain more attributes than necessary to uniquely identify a record.
  7. Surrogate Key:
    • A surrogate key is a system-generated key (usually a numeric value) that is used as the primary key.
    • It is introduced to provide a unique identifier for each record, especially when natural keys are not suitable.
  8. Alternate Key:
    • An alternate key is a candidate key that is not chosen as the primary key.
    • It can be used as the primary key if the selected primary key turns out to be unsuitable for some reason.
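
To ground a few of these key types, here is a minimal sketch using Python's built-in sqlite3 module. The tables (student, course, enrolment) and columns are hypothetical examples; the point is only to show a surrogate primary key, a foreign key, a natural/alternate key, and a composite key in one place.

```python
# A minimal sketch of several key types from the list above, using Python's
# built-in sqlite3 module. All table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

conn.executescript("""
    CREATE TABLE student (
        student_id  INTEGER PRIMARY KEY,   -- surrogate primary key
        passport_no TEXT    UNIQUE         -- natural / alternate (candidate) key
    );

    CREATE TABLE course (
        course_id INTEGER PRIMARY KEY      -- surrogate primary key
    );

    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student (student_id),  -- foreign key
        course_id  INTEGER REFERENCES course (course_id),    -- foreign key
        PRIMARY KEY (student_id, course_id)  -- composite key: neither column is unique on its own
    );
""")

conn.close()
```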

 

Relationships (Crow's Foot Notation)

First, let's look at crow's foot notation. Each end of a relationship line consists of two symbols. The first is the multiplicity, which refers to the maximum number of times that an instance of one entity can be associated with instances of the related entity. It can be one or many.

The second symbol describes the minimum number of times one instance can be related to others. It can be zero or one, which makes the relationship optional or mandatory, respectively. Combining the two symbols gives four possibilities:

  • Zero or many

  • One or many

  • One and only one

  • Zero or one

Let's look at an example: a relationship between a Lecturer entity and a Course entity.

This is a one-to-many relationship. The first part of the relationship can be interpreted as: each course must be taught by exactly one lecturer. We cannot have more than one lecturer teaching the same course, as that would be a waste, and we cannot have a course without a lecturer, or the students would be lost. (One and only one.)

The second part of the relationship says that one lecturer can teach multiple courses simultaneously, because that's what professors do, but they can also choose not to teach at all and focus 100% of their time on research. (Zero or many.)
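
Here is a minimal sketch of that one-to-many relationship, again using Python's built-in sqlite3 module with hypothetical lecturer and course tables. The NOT NULL foreign key is what enforces the "one and only one" side, while nothing stops a lecturer row from having zero or many courses.

```python
# A minimal sketch of the one-to-many Lecturer-Course example, using Python's
# built-in sqlite3 module. Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE lecturer (
        lecturer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );

    -- Each course must reference exactly one lecturer (NOT NULL foreign key),
    -- while a lecturer can appear on zero or many courses.
    CREATE TABLE course (
        course_id   INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        lecturer_id INTEGER NOT NULL REFERENCES lecturer (lecturer_id)
    );
""")

# A course with no lecturer violates the "one and only one" side:
try:
    conn.execute("INSERT INTO course (title, lecturer_id) VALUES ('Data Modeling', NULL)")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)  # NOT NULL constraint failed: course.lecturer_id

conn.close()
```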

 

Types of data modeling

The three primary data model types are relational, dimensional, and entity-relationship (E-R). There are also several others that are not in general use, including hierarchical, network, object-oriented, and multi-value. The model type defines the logical structure of the data and therefore how it is stored, organized, and retrieved. In this blog, however, we are only talking about the ER model.

 

Benefits of data modeling

  • Improved understanding of data: Data modeling helps stakeholders to better understand the structure and relationships of the data, which can help to inform decisions about how to use and store the data.
  • Improved data quality: Data modeling can help to identify errors and inconsistencies in the data, which can improve the overall quality of the data and prevent problems later on.
  • Improved collaboration: Data modeling helps to facilitate communication and collaboration among stakeholders, which can lead to more effective decision-making and better outcomes.
  • Increased efficiency: Data modeling can help to streamline the development process by providing a clear and consistent representation of the data that can be used by developers, database administrators, and other stakeholders.

 

Data modelling tools

LucidChart is a useful tool for sketching out your data models. It is a free online tool that has a built-in ER diagram function. Try it out using this link.

 

Limitations Of Data Modeling

Despite the many benefits of data modeling, there are also some limitations and challenges to consider. Some of the limitations of data modeling include:

  • Limited flexibility: Data models can be inflexible, making it difficult to adapt to changing requirements or data structures.
  • Complexity: Data models can be complex and difficult to understand, which can make it difficult for stakeholders to provide input or collaborate effectively.
  • Time-consuming: Data modeling can be a time-consuming process, especially for large or complex datasets.

 

Outro

So now we know that data modeling is the process of organizing information in a way that makes sense for a business, helping create a clear plan for how data should be stored and used. There are three levels of data abstraction: conceptual, logical, and physical. Each has a different purpose and contains different elements, and moving from conceptual to physical takes you from the most abstract to the least abstract model. We also learned what keys are in a DBMS and the different types of keys. The relationships between tables are represented with crow's foot notation, which is my personal favorite part. Then we talked about the pros and cons of data modeling.

Now hopefully you have a better understanding of data modelling and should be able to construct your own data models. You can do so using the LucidChart tool!

In our next blog, we are going to talk about databases. Stay tuned, take care and I’ll see you when I see you.

 

References

https://www.ibm.com/topics/data-modeling

https://www.simplilearn.com/what-is-data-modeling-article

https://www.sap.com/australia/products/technology-platform/datasphere/what-is-data-modeling.html#:~:text=What%20are%20the%20types%20of,oriented%2C%20and%20multi%2Dvalue.

https://www.opit.com/magazine/keys-in-dbms/#:~:text=Keys%20in%20DBMS%20are%20attributes,and%20%E2%80%9CTeacher%E2%80%9D%20as%20attributes.

https://www.ibm.com/docs/en/db2/10.5?topic=operators-natural-keys

https://vertabelo.com/blog/crow-s-foot-notation/

Author: John Lyu

John is deeply passionate about data and firmly believes in its transformative power, especially in areas like Machine Learning, Data-Centric AI (DCAI), and Storytelling with data. He is extremely excited about opportunities to test his knowledge and expand his industry insights, such as Kaggle competitions and visualization contests. John thrives on collaboration, enjoying the experience of working alongside talented colleagues and learning from their expertise. In his free time, John enjoys playing basketball, hitting the gym, and hanging out with friends. He is also a trading card collector. He enjoys making new friends, so be sure to come and say hi.