Yesterday marked the beginning of what promises to be a busy yet exciting week. I will be producing a dashboard every day, along with a blog post describing my process in creating it. For day one of dashboard week, we were tasked with using data from the Open Data website to create our dashboard. This website gives you access to dozens of APIs from various government agencies in NSW (for more information about it, click here).
Prepping The Data
To download data from the Open Data website, I had to use Alteryx. I chose to download data from the NSW Roads Traffic Volume Counts API. This API has tonnes of traffic information going as far back as 2006. Here is my original workflow:
There were four datasets that could be queried from this API, so I spent most of my time in Alteryx joining them together to enrich my data. To download the data itself, you need to provide the URL as well as your API key. After that, it's a matter of parsing out the relevant information (essentially translating JSON into something more easily digestible). The problem I discovered with this original workflow was that I had data at both the yearly and hourly grain, so a lot of duplicate rows were coming through the left-hand side; the workflow also ran very slowly because there were millions of records. To fix this, I decided to get rid of the yearly grain data. I also filtered the data down to April 2019 to improve performance. Here is my final workflow:
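Outside Alteryx, the same parse-and-filter step can be sketched in Python. The payload shape and field names below are invented for illustration; the real NSW Open Data API has its own schema and requires your API key with each request.

```python
import json

# Illustrative sample of a JSON payload from a traffic-counts API.
# These field names are made up; the real API returns its own schema.
raw = json.dumps({
    "rows": [
        {"station_key": "100", "date": "2019-04-01", "hour": 8, "count": 412},
        {"station_key": "100", "date": "2019-04-01", "hour": 9, "count": 387},
        {"station_key": "100", "date": "2018-12-31", "hour": 8, "count": 295},
    ]
})

def parse_counts(payload, month="2019-04"):
    """Flatten the JSON into plain rows, keeping only the target month."""
    records = json.loads(payload)["rows"]
    return [r for r in records if r["date"].startswith(month)]

# Keeps only the April 2019 rows, mirroring the filter in the workflow.
april = parse_counts(raw)
```

This is essentially what the Download and JSON Parse steps do in the workflow: pull the raw text, flatten it into rows, then filter to the grain and date range you actually need.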
Building The Dashboard
I started my dashboard by building a map with all the traffic counting stations in the Sydney region, sized by the average number of vehicles counted per day. I then thought I needed to drill down to a finer level of detail, and so included a hidden sheet which shows the number of vehicles counted over time when you filter to a particular counting station. Finally, I wanted to make use of the hourly grain, and so I created a heat map which shows the deviation from average vehicles counted by hour of day. The average is calculated for each hour, and the deviation is simply the difference between this number and the average vehicles counted in each local government area (I had to normalise this calculation because there were varying numbers of counting stations in each LGA). Here is an image of my final dashboard, which you can click to view on Tableau Public with full interactivity:
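The deviation calculation can be sketched with a toy example. The LGA names and counts below are made up, and I am assuming the normalisation works by averaging per station within each LGA before comparing against the overall hourly average:

```python
from collections import defaultdict

# Toy hourly rows: (lga, station, hour, vehicles). Numbers are illustrative only.
counts = [
    ("Sydney", "A", 8, 500), ("Sydney", "B", 8, 700),
    ("Ryde",   "C", 8, 300),
    ("Sydney", "A", 9, 400), ("Sydney", "B", 9, 600),
    ("Ryde",   "C", 9, 200),
]

def hourly_deviation(rows):
    # Per-LGA, per-hour mean across that LGA's stations. Averaging per
    # station normalises for LGAs having different numbers of counters.
    lga_hour = defaultdict(list)
    for lga, _station, hour, n in rows:
        lga_hour[(lga, hour)].append(n)
    lga_hour_mean = {k: sum(v) / len(v) for k, v in lga_hour.items()}

    # Overall mean for each hour, taken over the per-LGA means.
    hour_vals = defaultdict(list)
    for (_lga, hour), m in lga_hour_mean.items():
        hour_vals[hour].append(m)
    hour_mean = {h: sum(v) / len(v) for h, v in hour_vals.items()}

    # Deviation = an LGA's hourly mean minus the overall hourly mean.
    return {k: m - hour_mean[k[1]] for k, m in lga_hour_mean.items()}

dev = hourly_deviation(counts)
```

A positive value means that LGA sees more traffic than average for that hour, a negative value less, which is exactly what the heat map colours encode.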
What I Learned
The biggest lesson I learned was to do more data discovery before jumping into Alteryx. I would have understood the data better, which would have given me more time in Tableau (I was still using Alteryx late in the afternoon). I may even have been able to find a richer data source with a few more insights in it. Otherwise, it was good to consolidate my ability to download and parse data from an API.
Thanks for reading this blog, and make sure you come back tomorrow to see what I create.