The final day of Dashboard Week, Day 5, arrived with a whirlwind of activity. With presentations for Day 4's dashboard in the morning and a new dashboard to be created and presented by 3 pm, time was of the essence. For this day, we were assigned a dataset on prison populations in the USA. Comprising just three columns of information (location, population, and the date of the population count), the dataset's simplicity made idea generation difficult, particularly given the tight timeframe.

Step 1: Data Cleaning
I began by loading the data directly into Tableau to assess it, only to find that the overall population figures were inconsistent. Further investigation revealed that the dataset was incomplete: prisons appeared and disappeared between counts, rendering any time series analysis unreliable. To rectify this, I used Alteryx to eliminate prisons without data for every time period, reducing the count from 1,200 to 110. The result was a smoother curve of prison population over time.
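As a rough Python equivalent of that Alteryx workflow (a sketch only; the file name and the `location`, `population`, and `date` column names are assumptions, since the actual workflow isn't shown), the filter might look like this:

```python
import pandas as pd

# Load the raw prison-population data (hypothetical file and column names).
df = pd.read_csv("prison_populations.csv")  # columns: location, population, date

# Count how many distinct reporting dates exist across the whole dataset.
n_periods = df["date"].nunique()

# Keep only prisons that report a population for every single date,
# so the time series has no gaps from prisons appearing or disappearing.
complete = df.groupby("location").filter(
    lambda g: g["date"].nunique() == n_periods
)

print(f"Prisons before: {df['location'].nunique()}, "
      f"after: {complete['location'].nunique()}")
```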

Step 2: Finding the Theme – Overcrowding
While exploring potential dashboard themes, I looked up a few of the prisons online and discovered that several had populations exceeding their official capacities. This led me to focus on prison overcrowding, a significant and controversial issue in the US.

Step 3: Gathering Capacity Data
The challenge was that the dataset did not include prison names, only locations, so I had to assume these were county prisons. I found a website where prison capacities could be looked up by name and worked out how to build its URLs from the data I had. After testing this method and checking that scraping the site was permitted, I examined the webpage's HTML to determine how to extract the capacity figure.
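One quick way to sanity-check whether a crawl is allowed is the site's robots.txt. The sketch below shows that check plus the kind of URL construction I mean; the base URL and path pattern are placeholders, not the actual service I used:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical lookup site; the real URL pattern embedded the location.
BASE = "https://example-prison-lookup.org"

robots = RobotFileParser(f"{BASE}/robots.txt")
robots.read()

def capacity_url(county: str, state: str) -> str:
    """Build a facility lookup URL from the location field (placeholder format)."""
    return f"{BASE}/facility/{state.lower()}/{county.lower().replace(' ', '-')}"

url = capacity_url("Cook", "IL")
print(url, "allowed:", robots.can_fetch("*", url))
```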

Step 4: Building the Scraper
Though building the scraper in Alteryx might have been more maintainable, the time constraints led me to write it in Python, which I find better suited to such tasks. After constructing and debugging the scraper, I set it running over lunch. For some reason the code only worked in debug mode; I'll have to look into that another day.
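For context, a minimal version of such a scraper might look like the sketch below. The URL, CSS selector, and markup are assumptions for illustration, not the code from the day:

```python
import requests
from bs4 import BeautifulSoup

def scrape_capacity(url: str) -> int | None:
    """Fetch a facility page and pull out its rated capacity, if present."""
    resp = requests.get(url, timeout=10)
    if resp.status_code != 200:
        return None  # no page exists for this prison
    soup = BeautifulSoup(resp.text, "html.parser")
    # Hypothetical markup: <span class="capacity">1,234</span>
    tag = soup.select_one("span.capacity")
    if tag is None:
        return None
    return int(tag.get_text(strip=True).replace(",", ""))

# Example URLs built as in the previous step (placeholders).
lookup_urls = {
    "Cook, IL": "https://example-prison-lookup.org/facility/il/cook",
}
capacities = {county: scrape_capacity(u) for county, u in lookup_urls.items()}
```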

Step 5: Assembling the Dashboard
Upon returning, I found that the scraper had only managed to obtain capacity data for about 10% of the prisons, as URLs did not exist for most prisons. With no time to seek alternative sources, I pressed on, focusing primarily on content rather than aesthetics.
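The core dashboard measure was then simply population divided by capacity. Joined in pandas (again a sketch, with assumed file and column names), the overcrowding calculation might look like this:

```python
import pandas as pd

populations = pd.read_csv("prison_populations.csv")  # location, population, date
capacities = pd.read_csv("scraped_capacities.csv")   # location, capacity

# An inner join keeps only the ~10% of prisons with scraped capacity data.
merged = populations.merge(capacities, on="location", how="inner")

# Occupancy above 1.0 means a prison holds more people than it was built for.
merged["occupancy"] = merged["population"] / merged["capacity"]
overcrowded = merged[merged["occupancy"] > 1.0]
```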

Conclusion
The final dashboard, though visually basic, hopefully offers an insightful view of prison overcrowding in the USA. The experience of Day 5, from data cleaning to building a web scraper, was both demanding and rewarding. Pivoting from an unpromising dataset to a meaningful topic under severe time constraints took adaptability and determination. While not every aspect went as planned, the creative process and the final product underscored what Dashboard Week is all about: innovation, problem-solving, and the relentless pursuit of knowledge.

Author: Samuel Goodman