The Pareto Principle
The Pareto Principle, also known as the 80/20 rule or the law of the vital few, basically states that 80% of the consequences come from 20% of the causes. This principle was named after the Italian economist Vilfredo Pareto, who illustrated that 80% of the land in Italy was owned by 20% of the population. Used to highlight the fact that the minority owns the majority, this rule mainly serves as an observation, rather than a definitive law.
The Pareto Rule can be applied broadly in many business settings, thus giving rise to the use of Pareto Chart in Excel, and of course Tableau. Put simply, a Pareto Chart is just a combination of bars and a line graph. The bars represent individual items sorted in descending order (from tallest to shortest) according to a selected measure, while the line is typically the cumulative total of the selected measure.
Fig. 1: Example of Pareto Chart in Excel (Link)
There are many use cases of Pareto Chart in the business world. A typical example is defect detection. For example, a mobile phone manufacturer may use this kind of chart to understand the components that are showing the most defects in the production line. Having this insight then enables the company to direct its valuable resources to the most significant areas. Another use case may be finding out the customers that are generating the most profits for a company, so the company then knows who to target in future sales and promotions.
How to create it in Tableau
Creating a Pareto Chart with 80/20 reference lines in Tableau can be somewhat tricky in Tableau as it involves a good understanding of table calculations and the sorting mechanism. But once you get the hang of it, it only takes a few steps and couple of minutes to re-create one, given any valid combination of a dimension and a measure. Below I will show you the key steps involved and important areas to pay attention to so that you can get this chart right. The data set I’m using is the sample superstore data in the 2020.2 version of Tableau. You might get slightly different numbers if your version is different than mine but the chart should overall look very similar. Using this data set, I would like to show that roughly 20% of the highest-selling cities contribute to 80% of total sales in the US.
Drag City and Sales onto the column and row shelves respectively. Aggregate Sales by SUM, and sort city by sum of sales in descending order. Change the view from Standard to Fit Width so that you can see all the cities.
Fig. 2: City by Sum of Sales
To find where 80% of total sales sits, we can make use of the Running Total and Percent of Total table calculations. To do this, click on SUM(Sales) –> Add Table Calculation –> Choose Running Total, Compute Using City, and at the same time check Add secondary calculation, select Percent of Total, Compute Using City.
Fig. 3: Table Calculation for Sum of Sales
If you’ve done this correctly, you should see the Y-axis turn into percentages, and the graph is already shaped like a Pareto Chart. Essentially the chart is made up of a series of bars, each representing a city. This should become very obvious if you click on the chart. In your graph, you may see circles instead of bars. If that’s the case, you can use the Marks card to change your chart type to Area.
Next comes the most tricky part of building a Pareto Chart and is also where most people will mess up the chart. The X-axis now is still showing individual cities. But how do we know where the top 20% of cities sit on this axis? To do this, we need to convert City from Dimension to Measure, so that we can count the cities. But before doing this, we first need to drag City to Detail on the Marks card. Control (Command for Mac) and drag City to Detail. This duplicates City as a dimension.
Recall that our table calculation for sum of sales is computed using City. This calculation will break if we change City to measure without first duplicating it. Once we have City on Detail, we can change our City of the column shelf to Measure, Count (Distinct). You should see a vertical line at X=1 once you do this, as follows.
Fig. 4: Distinct Count of Cities
The vertical line is actually a bunch of points, and for every city, Count (Distinct) returns a value of 1.
This step is very similar to Step 2, only that this time the Running Total and Percent of Total are calculated on the X-axis. Click on CNTD(City) –> Add Table Calculation –> Choose Running Total, Compute Using City –> Check Add secondary calculation –> Choose Percent of Total Compute Using City. Now the X-axis should change to percentages as well.
Fig. 5: Table Calculations for Cities
If your chart doesn’t look right, check you City dimension on Detail and make sure the sort is by descending sum of sales.
In the last step, we add our reference lines for both the Y-axis and the X-axis. Right click on the Y-axis –> Add Reference Line –> Change Value to be Constant, give it a value of 0.8, and set Label to be Value. Do the same for the X-axis except this time use a value of 0.2. After this, we will have a Pareto Chart with reference lines on both axes. In our case, it looks like the top 20% of cities have contributed over 83% of total sales.
Fig. 6: Adding Reference Line
Fig. 7:Final Pareto Chart
As mentioned at the beginning, the Pareto Principle is more of an observation than actual law and only applies to certain situations. Sometimes you may see 20% of the causes are responsible for 90% (or just 50%) of consequences. But technically, by following the above 5 steps, you should now be able to build a Pareto Chart as long as you have a reasonable Dimension/Measure combination. Now that you’ve learned the steps, why not build one of your own?