A butterfly chart is generally used to compare two datasets at a time over some dimension.

Recently, I used a butterfly chart in an online challenge and, while it looked nice, I noticed that it was quite difficult to read.

Below, we have population figures for age groups for the UK in 2021 broken down by sex. This data in a basic butterfly chart may look something like this:

Male populations are going left and female populations are going right. We can clearly see populations for each age group and sex and how that figure compares to the total population (indicated by the grey bars).

But do you see a problem? It’s hard to see which age groups have more males than females and vice versa, because the populations are very similar for most age groups. If you Google ‘butterfly chart’, you’ll see this is how many of them are formatted.

If we were to visualise the information in a regular bar chart (below), it is quite difficult to compare Male and Female for each age group; it’s simply too cluttered, too much visual information.

We may instead want to use a butterfly chart to reduce the clutter, but there’s still the problem I mentioned earlier of comparing two bars on one row. Luckily, there are two easy ways to get around this.

Step 1 – Building the Butterfly Chart

A butterfly chart is essentially a dual axis of two charts. On either side, it has the total population and the population for male/female. So, for each age group, we need 2 rows of data for each side in the dual axis – or 4 rows total.
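To make the "4 rows per age group" layout concrete, here is a minimal Python sketch of how such a table could be built. The column names (`age`, `side`, `breakdown`, `population`) and the figures are placeholders I made up for illustration, not the real UK 2021 data, and I am assuming the total is simply Male plus Female.

```python
# Made-up figures for illustration only; not the real UK 2021 data.
counts = {
    "0 to 4": {"Male": 1_700, "Female": 1_600},
    "5 to 9": {"Male": 2_000, "Female": 1_900},
}


def butterfly_rows(counts):
    """Return 4 rows per age group: a total row and a sex row for each side."""
    rows = []
    for age, by_sex in counts.items():
        total = by_sex["Male"] + by_sex["Female"]  # assumption: total = M + F
        for side in ("Male", "Female"):
            # Background bar: the total population, repeated on each side.
            rows.append({"age": age, "side": side,
                         "breakdown": "Total", "population": total})
            # Foreground bar: the population for this side's sex.
            rows.append({"age": age, "side": side,
                         "breakdown": side, "population": by_sex[side]})
    return rows


rows = butterfly_rows(counts)
for row in rows:
    print(row)
```

Each age group ends up with a Total row and a sex row on the Male side, and the same pair on the Female side.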

Now, let’s separate these into the left and right side of our chart by creating calculated measures for both sexes.

Males – Total:

Females – Total:

Note the formula for Males – Total makes the number a negative so that the bars will go left instead of right.
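The logic of the two measures can be sketched in Python as follows. This is only an illustration of the idea, not the Tableau formulas themselves: the `side` and `population` field names and the numbers are hypothetical.

```python
# Made-up rows for illustration; field names are hypothetical.
rows = [
    {"age": "5 to 9", "side": "Male", "breakdown": "Male", "population": 2_000},
    {"age": "5 to 9", "side": "Female", "breakdown": "Female", "population": 1_900},
]


def males_total(row):
    """Negated population for Male-side rows, so the bars extend left of zero."""
    return -row["population"] if row["side"] == "Male" else None


def females_total(row):
    """Population as-is for Female-side rows, so the bars extend right of zero."""
    return row["population"] if row["side"] == "Female" else None


left = [males_total(r) for r in rows]
right = [females_total(r) for r in rows]
print(left, right)
```

Rows that belong to the other side return `None`, so each measure only ever draws bars on its own half of the chart.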

Next, drag Age onto Rows, Males – Total onto Columns, and Breakdown onto Colour to get the left half of the chart:

In the top menu, make sure Analysis -> Stack Marks is turned off because we want the bars to overlap.

Now drag Females – Total onto Columns and right-click to create a Dual Axis chart – and don’t forget to synchronise the charts!

You should have a chart resembling the one I showed at the start, except for one little detail. The axis numbers.

I’ve got mine rounded to millions, but the issue is it’s showing the Male side as negative.

Right-click on the axis and select Format. For Numbers, select a Custom format and enter the following to show only positive numbers:

If your numbers are in the thousands or millions, you can abbreviate them by adding one comma for each factor of a thousand, followed by a letter: T for thousands or M for millions. Here is mine for millions:
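As far as I know, custom number formats of this kind follow the Excel-style pattern, where a semicolon separates the positive and negative sections, so leaving the minus sign out of the negative section is what hides it. The effect on the axis labels can be sketched in Python like this; `axis_label` is a hypothetical helper for illustration, not a Tableau function.

```python
def axis_label(value, divisor=1_000_000, suffix="M"):
    """Format a signed axis value as a positive, abbreviated label."""
    # abs() drops the sign, mirroring a negative section with no minus sign.
    # Dividing by 1,000,000 mirrors two trailing commas; 1,000 mirrors one.
    return f"{abs(value) / divisor:,.0f}{suffix}"


print(axis_label(-2_000_000))         # a Male-side tick
print(axis_label(3_000_000))          # a Female-side tick
print(axis_label(-4_000, 1_000, "T")) # thousands instead of millions
```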

Step 2 – Method 1: Reference Lines

Now we are ready to make our butterfly chart easier to read!

The easiest method is to add a reference line for each cell (along the Age dimension).

First, I’ll create a calculated measure, just to consolidate my figures for Male and Female in one column. I’ll then create a total that only looks at Male and Female, discarding the Grand Total in my breakdown.

Population:

Population – grouped:

Finally, I’ll add a reference line to my x axis using the new ‘Population – grouped’ measure, making sure I use the sum along each cell.

Note: Since we are not using ‘Population – grouped’ in our chart, make sure to add it to Detail first. This ensures that it will show up as an option when creating the reference line.

Great! Since Males is a negative figure, adding both in our ‘Population – grouped’ measure gives us the difference between the two sexes.
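Here is a small Python sketch of why that sum works as a reference line. It assumes the same hypothetical row layout as before (`age`, `side`, `breakdown`, `population`) and uses made-up numbers: the signed populations are summed per age group, skipping the total rows.

```python
from collections import defaultdict

# Made-up figures for illustration only.
rows = [
    {"age": "5 to 9", "side": "Male", "breakdown": "Total", "population": 3_900},
    {"age": "5 to 9", "side": "Male", "breakdown": "Male", "population": 2_000},
    {"age": "5 to 9", "side": "Female", "breakdown": "Total", "population": 3_900},
    {"age": "5 to 9", "side": "Female", "breakdown": "Female", "population": 1_900},
    {"age": "80 to 84", "side": "Male", "breakdown": "Total", "population": 1_300},
    {"age": "80 to 84", "side": "Male", "breakdown": "Male", "population": 500},
    {"age": "80 to 84", "side": "Female", "breakdown": "Total", "population": 1_300},
    {"age": "80 to 84", "side": "Female", "breakdown": "Female", "population": 800},
]


def signed_population(row):
    """Male-side values are negative, Female-side values positive."""
    return -row["population"] if row["side"] == "Male" else row["population"]


def reference_lines(rows):
    """Sum the signed Male and Female figures per age group, ignoring totals."""
    diff = defaultdict(int)
    for row in rows:
        if row["breakdown"] != "Total":
            diff[row["age"]] += signed_population(row)
    return dict(diff)


lines = reference_lines(rows)
print(lines)
```

A negative result puts the line to the left of 0 (more males), and a positive result puts it to the right (more females).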

Of course, you can toggle the label off and just show the line to keep things neater. The main thing is that we can now clearly see which sex is more numerous, and by how much: Males when the reference line is to the left of 0, and vice versa for Females.

Step 3 – Method 2: Adding Marks to the Bars

The second way I’ve tried to enhance the butterfly chart is to add an extra mark to the bars.

To do this, we’ll need to further modify our data by adding 2 more rows per age group:

Notice that the new rows include the population figure for the opposite sex. This is because in our chart, for each age group and sex, we now want to show 3 different bars layered on top of each other:

1 – At the bottom, the total population

2 – Above the bottom, the opposite sex population

3 – At the top, the population for the current sex (i.e. Male for left bars, Female for right)
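The extended data can be sketched in Python as below. As before, the column names and figures are made up for illustration, the total is assumed to be Male plus Female, and the `layer` number is a hypothetical field that simply records the stacking order from the list above.

```python
# Made-up figures for illustration only; not the real UK 2021 data.
counts = {"5 to 9": {"Male": 2_000, "Female": 1_900}}


def layered_rows(counts):
    """Return 6 rows per age group: 3 layered bars for each side."""
    rows = []
    for age, by_sex in counts.items():
        total = by_sex["Male"] + by_sex["Female"]  # assumption: total = M + F
        for side, other in (("Male", "Female"), ("Female", "Male")):
            layers = [
                (1, "Total", total),                 # bottom: total population
                (2, "Opposite sex", by_sex[other]),  # middle: the other sex
                (3, side, by_sex[side]),             # top: this side's sex
            ]
            for layer, breakdown, population in layers:
                rows.append({"age": age, "side": side, "layer": layer,
                             "breakdown": breakdown, "population": population})
    return rows


rows = layered_rows(counts)
for row in rows:
    print(row)
```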

Here’s what our chart now looks like with this added detail:

We can clearly see the Grand Total (Total) population in grey at the bottom. Above it is the population for the opposite sex. For example, for Ages 5 to 9, on the right (Female) side, we can tell that Males (shown in a darker pink) have a higher population simply because we can see their bar. If they didn’t, their bar would be hidden under the Female (regular pink) bar.
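The visibility rule can be written down in a few lines of Python. `visible_excess` is a hypothetical helper for illustration, and the numbers are made up: the opposite-sex bar only peeks out on the side of the smaller sex, and the visible part is the difference between the two.

```python
def visible_excess(male, female):
    """Return (side, length) of the opposite-sex bar segment that stays visible."""
    if male > female:
        # On the Female side, the Male bar extends past the Female bar.
        return ("Female", male - female)
    if female > male:
        # On the Male side, the Female bar extends past the Male bar.
        return ("Male", female - male)
    # Equal populations: the opposite-sex bar is fully covered on both sides.
    return (None, 0)


print(visible_excess(2_000, 1_900))  # more males: visible on the Female side
print(visible_excess(500, 800))      # more females: visible on the Male side
```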

Now, isn’t that better? Recall the very first chart I showed you and compare it to this one. It is much, much easier to see in the new, updated chart which sex has a greater population for each age group, and by how much.

And that’s it.

Now we can make butterfly charts that are packed with information and easy to read.

Author: The Data School