What can I find out about 500 US Cities?
I will be looking at the CDC's 500 Cities: Local Data for Better Health dataset (2019 release). The data includes 27 measures of chronic disease related to unhealthy behaviours, health outcomes, and use of preventive services from the years 2016 and 2017.
The Brief: Determine trends in US Health Behaviour at a census tract level
Optional – Bring in any external data sources
This dataset is very rich and provides an interesting view of health trends at a census tract level in the USA. There are a lot of measure fields and values to understand before the data can be visualised.
The data values are percentages and have to be converted into raw counts. Some of the underlying data also needs cleaning and filtering. For example, the state “North Carolina” appears as “North Carolin”.
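As a rough illustration of those two steps, here is a minimal Python sketch. The field names (state, population, data_value) and the sample rows are made up for illustration, and the corrections table only covers the one truncated name mentioned above; the actual cleaning was done in Alteryx.

```python
# Hypothetical corrections table: only the truncated name mentioned in the post.
STATE_FIXES = {"North Carolin": "North Carolina"}


def clean_state(name):
    """Trim whitespace and repair known truncated state names."""
    name = name.strip()
    return STATE_FIXES.get(name, name)


def percent_to_count(percent, population):
    """Convert a percentage measure into an estimated number of people."""
    return round(percent / 100 * population)


# Made-up rows for illustration; the field names are assumptions.
rows = [
    {"state": "North Carolin", "population": 4000, "data_value": 35.5},
    {"state": "Texas ", "population": 2500, "data_value": 30.0},
]

for row in rows:
    row["state"] = clean_state(row["state"])
    row["estimated_count"] = percent_to_count(row["data_value"], row["population"])
```

Keeping the fixes in a lookup table makes it easy to add further corrections as they turn up in the data.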
I downloaded an ArcGIS layer package file containing the census tract polygons, which were not included in the original dataset. To make use of this file I had to unzip the package and drop it into QGIS. QGIS is open-source software that can handle a handful of shapefiles. I then exported the layer as a shapefile to be processed in Alteryx.
I joined the spatial file to the original file on the field “State_FIPS”. The rows with “City” as the geographic level didn’t join, so they were unioned back in for the final output. Alternatively, a spatial match could have been performed to capture both the city rows and the tract rows.
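The join-then-union pattern can be sketched in plain Python. This is not the Alteryx workflow itself: the field names (geo_level, geo_id, geometry) and the sample rows are hypothetical, the key is passed in as a parameter, and the sketch assumes one shape per key value.

```python
def join_and_union(measures, shapes, key):
    """Join measure rows to shapes on `key`, then union unmatched rows back in."""
    # Assumes each key value appears at most once in `shapes`.
    shapes_by_key = {shape[key]: shape for shape in shapes}
    joined, unmatched = [], []
    for row in measures:
        shape = shapes_by_key.get(row.get(key))
        if shape is None:
            # No matching polygon (e.g. a city-level row): keep it without geometry.
            unmatched.append({**row, "geometry": None})
        else:
            joined.append({**row, "geometry": shape["geometry"]})
    # Union: matched rows first, then the rows that did not join.
    return joined + unmatched


# Made-up rows for illustration; the geometry values are placeholders.
measures = [
    {"geo_level": "Census Tract", "geo_id": "A1", "data_value": 31.2},
    {"geo_level": "City", "geo_id": None, "data_value": 29.8},
]
shapes = [{"geo_id": "A1", "geometry": "polygon-A1"}]

output = join_and_union(measures, shapes, key="geo_id")
```

The city row survives in the output with an empty geometry, so no measure rows are lost even though only the tract row picks up a polygon.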
The spatial file also came with US Census population data, which might be useful for normalising the data points.
Understanding the Data
The entire dataset had a large spread across multiple measures. For this reason, I decided to focus on sleep quality and what factors might impact it. Geographically, I set up a hierarchy of region, state, city and census tract.
The sleep measure showed a positive correlation with the measures for obesity, smoking, mental health, physical activity and binge drinking.
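For readers who want to check a relationship like this outside Tableau, a Pearson correlation coefficient can be computed with a few lines of Python. This is a minimal sketch: the two value lists are made-up numbers for illustration, not figures from the dataset, and the variable names are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Made-up percentages for four hypothetical tracts.
short_sleep = [30.0, 35.0, 40.0, 45.0]
obesity = [25.0, 28.0, 33.0, 36.0]

r = pearson(short_sleep, obesity)  # a value close to +1 indicates a strong positive correlation
```

A coefficient near +1 means the two measures rise together, near -1 means one falls as the other rises, and near 0 means there is no linear relationship.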
I ran into performance issues after joining the tract shapefiles with the 500 Cities data, because the resulting file was 1.1GB. To get better performance I will need to join to hyper extracts separately in Tableau.
Due to time constraints, I aggregated up to state level until I can fix the performance issues.
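One way to roll tract-level percentages up to state level is a population-weighted average, so that large tracts count for more than small ones. The sketch below is only one possible approach and not necessarily how the state-level view was built; the field names and sample rows are made up.

```python
from collections import defaultdict


def state_level(rows):
    """Population-weighted mean of a percentage measure for each state."""
    weighted_sum = defaultdict(float)
    population = defaultdict(int)
    for row in rows:
        weighted_sum[row["state"]] += row["data_value"] * row["population"]
        population[row["state"]] += row["population"]
    # Skip states with no population to avoid dividing by zero.
    return {
        state: weighted_sum[state] / population[state]
        for state in weighted_sum
        if population[state] > 0
    }


# Made-up tract rows for illustration.
tracts = [
    {"state": "North Carolina", "population": 1000, "data_value": 30.0},
    {"state": "North Carolina", "population": 3000, "data_value": 40.0},
    {"state": "Texas", "population": 500, "data_value": 20.0},
]

by_state = state_level(tracts)
```

A plain average of the two North Carolina rows would give 35.0, while the weighted version gives 37.5 because the larger tract has the higher value.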
The final visualisation can be accessed on my Tableau Public profile.
In the future, I would like to add some quick KPIs with a highlight action, as well as a multi-level category tree. I’m not sure of the logic for the generation