People constantly look for relationships between things in their daily lives. For example, is there a relationship between smoking and longevity? Does education level affect income level? For questions like these, correlation analysis is a commonly used statistical method for exploring the relationship between variables.
In day-to-day analysis work, we often need to examine the correlation between continuous (numerical) variables. The most commonly used statistic is Pearson’s correlation coefficient. The coefficient (r) lies between -1 and 1; the closer its absolute value is to 1, the stronger the linear correlation. The sign gives the direction: a positive value means a positive correlation, and a negative value means a negative correlation. There are also Spearman’s and Kendall’s correlation coefficients, which apply under different conditions; here, I focus on Pearson’s correlation coefficient.
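For reference, the standard textbook definition of Pearson’s r can be written as follows. The symbols are introduced here for illustration: n paired observations (x_i, y_i), with sample means x̄ and ȳ.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator measures how the two variables vary together, and the denominator rescales that quantity by the spread of each variable, which is why r always falls between -1 and 1.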
In this example, I use Alteryx to calculate the correlations between Iowa liquor sales data (https://data.iowa.gov/Sales-Distribution/Iowa-Liquor-Sales/m3tr-qhgy) and Iowa demographic data (https://www.iowa-demographics.com/), and then visualize the correlations in Tableau.
Two methods to explore and demonstrate the correlations
- Graphical observation method: plot a scatter diagram and judge visually whether the two variables appear to be correlated
- Scientific calculation method: quantify the relationship by calculating the correlation coefficient r
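To make the calculation method concrete, here is a minimal Python sketch that computes r from scratch using only the standard library. The function name `pearson_r` and the sample values are assumptions made up for illustration; they are not taken from the Iowa datasets, and this is not how the Alteryx tool is implemented internally.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Numerator: sum of products of paired deviations
    co_variation = sum(a * b for a, b in zip(dx, dy))
    # Denominator parts: sum of squared deviations of each variable
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return co_variation / math.sqrt(ss_x * ss_y)


# Hypothetical values, for illustration only (not from the Iowa data)
population = [1200, 3400, 5600, 7800, 9100]
liquor_sales = [15.0, 41.0, 60.0, 88.0, 97.0]

r = pearson_r(population, liquor_sales)
print(f"r = {r:.3f}")
```

A value of r close to 1 or -1 would then be worth confirming with the scatter diagram, since a single coefficient can hide outliers or a non-linear pattern.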
Alteryx workflow to prepare the data
1. Join the two datasets mentioned above
2. Use the joined dataset as input and connect it to the Pearson Correlation tool to calculate the correlation coefficients. Here, I also add an optional normalization step before the calculation, implemented with Python code and the scikit-learn package.
3. Restructure the dataset to suit the visualizations. I output two structures: a narrow (long) table for the heatmap and a wide table for the scatter plot.
4. Present the correlations in Tableau with a heatmap and a scatter plot.
5. Summary: By observing the colors of the heatmap, we can quickly see two strong positive correlations: one between liquor sales and the Black population, and another between liquor sales and the female population aged 20-40.
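The normalization, correlation, and restructuring steps of the workflow can also be sketched outside Alteryx. The following is a rough standard-library Python illustration, not the actual workflow: the column names and values are hypothetical, and z-score standardization is assumed for the normalization step (the same transformation scikit-learn's StandardScaler applies), since the specific scaler is not stated above.

```python
import math


def zscore(values):
    """Standardize values to mean 0 and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical wide table: one column per variable, one row per region.
# Names and numbers are made up for illustration.
wide = {
    "liquor_sales": [15.0, 41.0, 60.0, 88.0, 97.0],
    "total_population": [1200, 3400, 5600, 7800, 9100],
    "median_age": [44.1, 39.5, 41.2, 36.8, 38.0],
}

# Optional normalization step (assumed here to be z-score standardization)
normalized = {name: zscore(column) for name, column in wide.items()}

# Narrow (long) table: one row per pair of variables, the shape a heatmap needs
narrow = [
    {"variable_x": a, "variable_y": b, "r": pearson_r(normalized[a], normalized[b])}
    for a in normalized
    for b in normalized
]

for row in narrow:
    print(f'{row["variable_x"]:>17} vs {row["variable_y"]:<17} r = {row["r"]:+.3f}')
```

In this sketch the wide dictionary plays the role of the scatter plot input, and the narrow list of rows plays the role of the heatmap input. Note that Pearson's r is unchanged by shifting or positively rescaling a variable, so the normalization affects the scale of the plotted values but not the coefficients themselves, which is consistent with treating it as optional.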