One of my favorite things about Alteryx is its allows us to automate a repetitive process very easily. This blog takes you through a data set that I otherwise would not have been able to generate in a reasonable time frame. Prior to Tableau, I don’t think I would have investigated it any further than maybe some sub-totals and averages. With Tableau I could instead make a few dashboards that allows me (and any other user) to explore many possible combinations.
This week at The Data School, we did a session on web-scraping and API’s. One of the activities was to download a csv file for each Local Government Area (LGA), each similar in structure, containing crime data in 2018. In class we used a batch macro to download and store all 132 files. I thought this was a perfect data source to practice my Alteryx skills on, and possibly find some insightful or useful data.
This blog will take you through my thought process and dashboard that compares what, where and when crimes are being committed in New South Wales (NSW).
Getting the right Sheet from the right Documents
One cool thing that I learned when we were downloading these files, is that you can specify which sheet you want for each excel file by using a ‘|||’. Thankfully all the sheet names for each of these docs were the same ‘Time’, so I didn’t have to spend much time on this. I also used a wildcard (*) within the input data tool to get all the documents I needed. All files were also in the same folder.
Arranging the Data to be in a Desirable Structure
What is frustrating about many excel documents available to download, is contextual information at the top of each sheet. This can include merged columns, empty rows it makes it difficult to determine the header locations. Pictured below is an example of the structure of one of the sheets.
To solve this problem, I initially worked on one sheet only, and figured out how to get that in a desirable structure. I then did the same for the rest. What was advantageous about having one sheet worked on separately, was that I was able to dynamically rename the headers for each field for that table, and then make sure that was on the top of the set order in my union.
After I unioned all the data and removed unwanted columns, I used an output tool to create a new ‘.yxdb’ file, I then put all of my workflow up to this point into a tool container, and disabled it. Then added a new input tool calling the new ‘.yxdb’ file. This way, I could re-run and make changes to the following steps without having to read 132 files every time. This sped up my workflow run time considerably. Caching could be an alternative way to do this.
I used a pivot tool to re-arrange all the different time fields into two fields (value and time). This made the data much easier to work with when I passed the data into Tableau.
Making a Tableau Dashboard that responds to user input
For this project to be worth it, it would ideally be used by multiple people for their own personal inquiries. I made both dashboards responsive to changes LGA and Offence Type selections. This can be done with filters and set actions, but I decided to use parameter controls and combined sets to achieve this. Below is an image of my first dashboard, if you click on it, it will take you to the dashboard posted on Tableau Public:
This dashboard is designed to show the user the LGAs with the highest count of a selected Offence. If the user selects an LGA, they can see how that LGA stands in comparison to the others. The next chart shows the total number of offences that were recorded within the selected LGA at different points in the day.
This dashboard shows all crimes that occurred within the selected LGA, so you can see which crimes are more prevalent in a given area.
I think these dashboards would be useful to see when and where certain crimes are being committed and gives the user some insight into how to avoid or prepare for certain crimes.
Finally, I would like to include a little disclaimer that LGAs can vary greatly in both geographical and population size, and I have not adjusted any of the values accordingly. Large numbers for some LGAs may not be all that significant if these variations were factored in.