Normalization is a process of organizing data in a database to minimize redundancy and improve data integrity. There are several reasons why normalization is important:

  1. Minimizing data redundancy
  2. Improving data consistency:
  3. Reducing data anomalies
  4. Improving database performance:

To normalize a table, we need first know some concepts. suppose we have a table A (X, Y, Z, D), where X and Y are the primary keys. 

Partial functional dependency (Y → Z) a functional dependency of one or more nonkey attributes upon part (but not all) of the primary keyTransitive dependency (Z D) a functional dependency between 2 (or more) nonkey attributes

A table is in second normal form if it is in first normal form and all non-key attributes are fully functionally dependent on the primary key.

Consider a table of students and their courses (student_id, student_name, course_id, course_name, course_department). In this table, the primary key is the combination of student_id and course_id. However, the course_department attribute is not fully functionally dependent on the primary key, since it depends on the course_id. To bring this table to 2NF, we can create two separate tables: one for student information, with student_id as the primary key and student_name as an attribute, and another table for course information, with course_id as the primary key, along with the course_name and course_department attributes. The original table can then be modified to include only the student_id and course_id attributes.

A table is in third normal form if it is in second normal form and all non-key attributes are non-transitively dependent on the primary key.

Consider a table of customers and their orders, (order_id, customer_id, customer_name, customer_city, order_date, order_total). In this table, the primary key is the order_id. However, the customer_city attribute is transitively dependent on the primary key, since it depends on the customer_name, which in turn depends on the customer_id. To bring this table to 3NF, we can create two separate tables: one for customer information, with customer_id as the primary key, along with the customer_name and customer_city attributes, and another table for order information, with order_id as the primary key, along with the customer_id, order_date, and order_total attributes. This way, the customer information is stored in one table and the order information in another, and the customer_city attribute is no longer dependent on any non-key attributes.

The Data School
Author: The Data School