Welcome to my fifth and final post of Dashboard Week!
In case you’re asking, what is dashboard week? Dashboard week is a week where, during each day, all the data schoolers create a dashboard and post about it! Friday’s challenge was creating a dashboard using data about 500 Cities Better Health Initiative.
The 500 cities project is a collaborative effort to track small area estimates for chronic disease risk factors, health outcomes, and clinical preventive service use for the largest 500 cities in the United States (see more). The dataset naturally fell into these three categories, so I wanted to create an exploratory dashboard that enables the user to visualise correlations.
In this blog post, I explain how I created the Correlation Matrix (Heat Map) using Pearson’s correlation coefficient.
Dealing with Percentages
One of the immediate challenges I found with the data set was all the values were in percentages. The issue with this is that the granularity of the data goes down to the city level. When looking at overall correlations (across cities), this presents problems as cities with smaller populations will create bias. As the data set contained population values and the percentages, converting to the raw numbers was an easy fix.
Pearson’s correlation coefficient
Pearson’s correlation coefficient will give a value (-1 to +1) of the linear correlation between two fields. Presenting this as a correlation matrix (heat map) makes it easy to see which fields correlate with each other. In short:
- -1 shows strong negative correlation (e.g. starving to death and having an abundance of food – when people have lots of food, they’re unlikely to starve)
- 0 shows no correlation (e.g. liking the colour blue and being able to drive a manual)
- +1 shows strong positive correlation (e.g. regularly going to the gym and being fit)
Prepping the Data
Using the tool in Alteryx requires the input data to be in a specific format. What we typically start with is something like this (left below).
Using the cross tab tool, we’re able to move it to the required format (right above). Note that in this process, it’s essential to average the values while aggregating the data (a setting in crosstab when the data type is numeric).
Using Pearson’s Correlation Tool
Using the tool in Alteryx, you want only to select the numerical fields you’ve just created (i.e. Av_this, Av_that and not City). With this done, the tool will work its magic and give you the correlation coefficients.
Bonus: If you’re working in Alteryx and don’t want to turn this into a Viz in and of itself, then use the Association Analysis tool which takes in the same format but creates a correlation matrix you can view with the browse tool.
Correlation Matrix for Tableau
The final challenge with Pearson’s in Alteryx is getting it Tableau ready. For this, we need Transpose. Key Column will be FieldName and Data Columns will be everything else.
At this point, we’re Tableau Ready!
With the data in the correct format, creating the correlation matrix in Tableau is as easy as moving one field to rows, one to columns and putting the value onto colours. It’s then best to fix the colour with the min set as -1 and max as 1. As there were a lot of fields, I parameterised the fields to reduce the amount of information on the visualisation.
As always, thanks for taking the time to read my blog! If you have any comments, suggestions or want to chat, free to connect with me on my LinkedIn!
~ Ryan Edwards