As mentioned in my previous post, data types are a very important concept. Numeric data has to be analyzed differently from categorical data; otherwise, the analysis will be wrong. Knowing which type of data you are dealing with lets you choose the correct method of analysis.
In this post, we’ll go through the numeric data types (discrete and continuous), which are the result of counting or measuring.
Discrete data take distinct, separate values, and they are the result of counting. Examples are the number of students of a given ethnic group in a class, the number of books on a shelf, the number of times we get a 6 when we roll a die, population (a whole number), etc.
Continuous data are the result of measuring: their values can’t be counted, but they can be measured. Examples of continuous data are height, weight, size, distance, temperature, etc. Continuous data fall into two groups: interval and ratio.
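To make the counting versus measuring distinction concrete, here is a minimal Python sketch. The die rolls and heights are made-up sample values, not data from this post.

```python
# Hypothetical sample data, made up for illustration.
die_rolls = [6, 2, 6, 1, 4, 6, 3]            # outcomes of rolling one die
heights_cm = [172.4, 165.0, 180.25, 158.9]   # measured heights in centimeters

# Discrete: the result of counting is always a whole number.
sixes = die_rolls.count(6)

# Continuous: a measurement can fall anywhere in a range,
# limited only by the precision of the measuring instrument.
tallest = max(heights_cm)

print(sixes, type(sixes).__name__)      # a count, stored as int
print(tallest, type(tallest).__name__)  # a measurement, stored as float
```

You can count how many sixes came up, but you can only measure a height to some precision.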
Interval values represent units that are ordered and have the same difference between them. For example, the marks on a ruler are evenly spaced, so the distance between any two neighboring marks is the same. With interval data, we can add and subtract, but we cannot multiply, divide or calculate ratios because there is no true zero. Nevertheless, we can calculate the mean, mode, median, etc. Temperature in degrees Fahrenheit is a typical example: 0 degrees does not mean "no temperature". Other examples of interval data are SAT scores and IQ tests, as their scales have no true zero.
Ratio values, like interval values, are ordered units that have the same difference between them. The difference from interval values is that they have an absolute zero, so they can be multiplied and divided, and they can’t be negative. Good examples are height, weight, length, etc. For those variables, the mean, median, mode, etc. can also be calculated.
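Here is a small Python sketch of why ratios only make sense when there is a true zero. The two temperatures are made-up values, and the Kelvin conversion is my own addition for illustration: Celsius is an interval scale, while Kelvin starts at absolute zero and so behaves as a ratio scale.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

# Hypothetical readings, made up for illustration.
cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference_c = warm_c - cold_c

# Dividing Celsius values gives 2.0, but 20 C is not "twice as hot" as 10 C,
# because 0 C is an arbitrary point, not an absence of temperature.
naive_ratio = warm_c / cold_c

# On the Kelvin scale zero is absolute, so this ratio is meaningful.
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(difference_c, naive_ratio, round(true_ratio, 3))
```

The subtraction is valid on both scales, but only the Kelvin ratio tells you something real about how the two temperatures compare.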
Let’s go through the most common calculations:
- Mode is the most commonly observed/repeated value. In a pie chart, it’s the biggest slice.
- Median is the middle number in a data set. To get the median, you have to be able to order the values from low to high. If we then cut the ordered data in half, half of the numbers are smaller than the median and half of them are bigger.
- Mean (average) is the central value of a discrete set of numbers: specifically, the sum of the values divided by the number of values.
- Standard Deviation measures how far, on average, a group of numbers lies from the mean of those numbers (technically, it is the square root of the average squared distance from the mean). The further the numbers are from the mean, the higher the deviation and the more spread out the data is.
- Percentiles are a way to determine the relation of a particular value in a data set to the rest of the values. For example, if in a test you fall in the 90th percentile, it means that 90% of the students have the same mark as yours or lower, and 10% of the students have a higher mark. In general, the value of the percentile means that X percent of the data lie at or below that point and (100 – X) percent lie above it.
- Correlation shows whether and how strongly pairs of variables are related. Examples are the relationship between weight and height, or between sales and profit.
The table below is a quick recap of what we went through in the two previous posts. Make sure you understand those concepts correctly to be a successful data analyst.
- The header icon is designed by pch.vector / Freepik