Hi data enthusiasts, welcome back to my blog. In this episode, I would love to show you an amazing visualization that I have learned recently. And as you saw from the title, our special guest today is Pareto Chart.

What is Pareto Chart?

Basically, a Pareto chart is a special variation of a bar chart. This chart will be used to identify categories that we should focus on first in the improvement process. Pareto charts show the ordered frequency counts of values for the different levels of a categorical or nominal variable. The charts are based on the “80/20” rule. This rule says that about 80% of the problems are the result of 20% of causes. And from that, we can understand the cause of the problem that we are facing, then we make an improvement base on that.

Let’s build a Pareto Chart!

Again, we going to use the superstore dataset, since we will all have access to this. In this blog, let’s test a hypothesis: Does 80 % of sales of our stores come from just 20% of customers? Strictly following Pareto’s law is a good starting point to get along with this chart. So, jump in and here we go!

Step 1: Drag Customer Name to columns and Sum of Sales to rows

Step 2: Sort Customer by Sum of Sales (descending order) and fit the chart to ‘Entire View’

Step 3: Add table calculation to the Sum of Sales

Right-click Sum of Sales then click on Add Table calculation. In this case, we will use 2 table calls which are Running Total for the first one and Percentage of Total for the second one.

Then we will end up with the customer name in columns and the percentage of the running total for Sum of sales in rows.

Step 4: Ctrl + Drag Customer Name to Detail + Change Customer Name in Columns to Count Distinct of Customer Name

Step 5: Add Table Calculation to Count Distinct of Customer Name

Oh no we lost the curve! It’s okay, I will get the curve back right away. Now, we will apply the same table calculation configuration that we have done before for the sum of sales on the count distinct of Customer Name. Also Running Total for the first table calc and followed by Percent of Total for the second table calc. Make sure that all your table calculations are computed using Customer Name.

After that, we change the mark type to Area to get back the curve we want.

Step 6: Add constant reference lines to the chart

As we are following the Pareto rule, therefore we will add 2 reference lines to the chart. For the axis of sales, we right-click the axis, click on add a reference line, change the value of the reference line to Constant, and enter 0.8 (stands for 80%).

For the other axis, we do the same thing, however, the constant value for this reference line will be 0.2. And there you have the Pareto Chart:

And here we go, a new friend has been added to our visualization vocabulary. Stay cool and I will see you guys in my next blogs. Peace and Love!



The Data School
Author: The Data School