When you start working with data, you might be tempted to begin creating beautiful Tableau dashboards and Alteryx workflows without properly looking at the data set. But that’s not the way to go. Before jumping in, it’s very important to understand the nature of your data set and its data types. Data types are a very important concept in statistics and data analytics. You have to understand them correctly in order to apply the right statistical measurements to your data and, in turn, arrive at the correct conclusions and insights.

In statistics, there are two groups of data types: Categorical/Qualitative and Numerical/Quantitative. In this post, we’re going to go through the first one.

Categorical/Qualitative Data Types

 

Categorical data represents characteristics of a group, for example gender, nationality, movie genre or eye colour. The values can be words, like male/female, or numbers, like group number 1, 2, 3. Numeric values in categorical data are just labels: they have no actual numerical properties, and in nominal data they don’t define any order either.

There are two types of categorical data:

 

Nominal

 

Nominal data is used for naming or labelling variables without any quantitative value. It cannot be treated with mathematical operators (you can’t add apples to cherries, for example), and none of its values has any numerical significance. With nominal data, we can calculate the frequency of each value and the mode.
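As a rough illustration of the only calculations that make sense for nominal data, here is a minimal Python sketch that counts values and picks the mode. The eye colour list is made up for the example and is not data from this post.

```python
from collections import Counter

# Hypothetical nominal observations (eye colours), invented for this sketch.
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]

# Counting occurrences is always valid for nominal data.
frequencies = Counter(eye_colours)

# most_common(1) returns a one-element list holding the (value, count) pair of the mode.
mode_value, mode_count = frequencies.most_common(1)[0]

print(frequencies["blue"])      # 2
print(mode_value, mode_count)   # brown 3
```

Note that nothing here adds, averages or sorts the labels: the categories are only counted.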

 

Ordinal

 

The most important element of ordinal data is the order of the values. The only point to keep in mind is that the differences between the values are not really known. An example is the answers to the question “How happy are you with the product?”: very happy, happy, unhappy, very unhappy. We don’t know exactly how big the difference between very happy and happy is. This kind of response scale is often called a Likert scale, and we usually use it to measure non-numeric concepts like satisfaction, happiness, discomfort, etc. You can also label the above responses as 1, 2, 3, 4 and, in this case, unlike in nominal data, the order does give some meaning to the numbers. The numbers here represent labels and order, without any other numerical properties.
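The idea that ordinal labels carry order but no measurable distance can be sketched in Python as follows. The names SCALE and RANK and the list of answers are assumptions made up for this illustration; the labelling follows the example above, with 1 standing for very happy.

```python
# Response scale from the example above, in the order it was listed.
SCALE = ["very happy", "happy", "unhappy", "very unhappy"]

# Map each label to its position (1..4). These ranks encode order only,
# not how far apart two answers are.
RANK = {label: position for position, label in enumerate(SCALE, start=1)}

# Hypothetical survey answers, invented for this sketch.
answers = ["happy", "very unhappy", "very happy", "unhappy", "happy"]

# Sorting and comparing are meaningful because the categories are ordered.
sorted_answers = sorted(answers, key=RANK.get)
is_happier = RANK["very happy"] < RANK["happy"]  # lower rank = happier here

print(sorted_answers)
print(is_happier)  # True
```

Subtracting or averaging the ranks would not be meaningful, because the gap between two neighbouring answers is unknown.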

Calculations 

 

Categorical data can be counted, grouped and, in the case of ordinal data, ranked. Placing information into groups brings some sense of order, and we can then calculate a frequency or a measure of central tendency.

 

Frequency

 

To organize a data set, you can create a frequency distribution table to show the number of values for each category. You can find the frequency for both nominal and ordinal data types.

  • Frequency is the number of times an event occurred.

  • Relative Frequency is the fraction of times an answer occurs (frequency divided by the total count of events).

  • Cumulative Relative Frequency is the accumulation of the previous relative frequencies (add all the previous relative frequencies to the relative frequency for the current row).

To understand this better, let’s use an example. Let’s say we ask 50 students how many books they carry, and we get the answers below:

Number of books | Frequency (number of students) | Relative Frequency (number of students / total students) | Cumulative Relative Frequency
1               | 11                             | 0.22                                                     | 0.22
2               | 10                             | 0.20                                                     | 0.22 + 0.20 = 0.42
3               | 16                             | 0.32                                                     | 0.42 + 0.32 = 0.74
4               | 6                              | 0.12                                                     | 0.74 + 0.12 = 0.86
5               | 5                              | 0.10                                                     | 0.86 + 0.10 = 0.96
6               | 2                              | 0.04                                                     | 0.96 + 0.04 = 1
Total           | 50                             | 1                                                        |
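The three columns of the table can be reproduced with a few lines of Python. This is only a sketch: the counts are taken from the example, while the variable names and the choice to accumulate counts before dividing are my own.

```python
from itertools import accumulate

# Frequencies from the example: number of books -> number of students.
frequency = {1: 11, 2: 10, 3: 16, 4: 6, 5: 5, 6: 2}

total = sum(frequency.values())  # total number of students asked

# Relative frequency: each count divided by the total.
relative = {books: count / total for books, count in frequency.items()}

# Cumulative relative frequency: running sum of the counts, divided by the total.
# Accumulating integer counts first avoids floating-point drift in the running sum.
cumulative_counts = list(accumulate(frequency.values()))
cumulative_relative = [count / total for count in cumulative_counts]

for (books, count), cumulative in zip(frequency.items(), cumulative_relative):
    print(f"{books:>5} {count:>9} {relative[books]:>8.2f} {cumulative:>10.2f}")
```

The last cumulative value is always 1, which is a handy check that no category was missed. For four books the cumulative relative frequency is 43 / 50 = 0.86.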

 

Central tendency

 

The central tendency of your data set tells you where most of your values lie. We can calculate the mode for both nominal and ordinal data, and the median for ordinal data only, because the median relies on the values being ordered.

  • Mode is the most commonly observed value. In a pie chart, it’s going to be the biggest slice.

  • Median is the middle value of an ordered data set. It cuts the data in half: half of the values are below the median and half of them are above it.
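Here is a minimal Python sketch of both measures on ordinal data. The answers are invented for the illustration, and using median_low from the standard library is my own choice: it always returns a value that actually occurs in the data, so the result can be mapped back to a label even when the number of answers is even.

```python
from statistics import median_low, mode

# Ordered response scale; position 1 is the first label, as in the example above.
SCALE = ["very happy", "happy", "unhappy", "very unhappy"]
RANK = {label: position for position, label in enumerate(SCALE, start=1)}

# Hypothetical survey answers, invented for this sketch.
answers = [
    "very happy", "very happy", "very happy",
    "happy", "unhappy", "unhappy", "very unhappy",
]

# Mode: the most frequent label. It needs no ordering, so it also works for nominal data.
most_common_answer = mode(answers)

# Median: convert labels to ranks, take the middle rank, then map it back to a label.
median_rank = median_low([RANK[answer] for answer in answers])
median_answer = SCALE[median_rank - 1]

print(most_common_answer)  # very happy
print(median_answer)       # happy
```

The mode and the median can differ, as they do here: most respondents picked very happy, but the middle respondent is only happy.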

 

In this post, we went through the importance of understanding data types and we explained the categorical types. In my next post, we’ll go through the Numerical/Quantitative Data Types.

 


Author: Dana Voroshchuk