Difficulty Level: Beginner (Alteryx/Tableau)
In this week at the Data School’s training program, we learned spatial analysis and how to obtain data via API & web scraping using Alteryx. I would like to share what I learned, especially connecting to an API and performing spatial analysis, by walking you through a simple example.
I’m sure everyone has heard of the Black Lives Matter protests happening in the US (and in Australia too) as well as posts about police brutality. While digging around the topics, I found this community data repository on Github about assaults by the police during non-violent Black Lives Matter protests. Although this is community data, I believe the dataset is trustworthy enough since it includes links to news articles and twitter posts, and even video. The dataset includes geolocation and provides an API to connect to, so it’s perfect for illustrating my post!
This is the final dashboard that we will create today, using Alteryx and Tableau.
1. Download data via API (Alteryx)
Although API isn’t the focus of this post, I would like to briefly go through it. The reason is that very rarely we have location data in a csv. A lot of the time it is stored in a database and can only be downloaded via API. Since this dataset is community data, it doesn’t have an API key, making it the most beginner-friendly API you can find. First we’ll need to go to the community data repository to find the endpoint of the API. You can see that there’s a link to RESTFUL API, where we can find the endpoint for a list of incidents, which is https://api.846policebrutality.com/api/incidents. Once you have obtained the endpoint, we can start the workflow to download and parse data into table format. Below is the list of Alteryx Tool that we’ll use and why.
(1) Text Input tool
Copy and paste the endpoint into the text input tool
(2) Download tool
Assign the endpoint column as the URL field. Output to a string. Alteryx will download all the data from the endpoint into a field called DownloadData.
(3) JSON Parse
JSON parse will parse data into a tall format. Regardless of how many variables or fields there are, there will be only 2 columns: JSON Name and JSON Value. The field JSON Name usually has values such as data.1.id or data.10.state. Usually, the number is the record id whereas the string at the end of the name of the field. As a result, we’ll need to split this JSON Name into multiple columns as it contains multiple information (variable names and record ID), using ‘.’ as the delimiter.
The crosstab tool transforms the data from tall to wide. We also want to group it by ID.
If there is a date field, usually it will be as a string format. If you forcefully use the Select Tool to transform it into a date, the value may not be accurate. The date field pulled from API is usually in this form ‘2020-06-23T00:00:00.000000Z’ so we have to use ‘Custom’ to specify the format. The configuration is as below. I can’t stress enough the importance of MM in capital letter, because using lowercase mm will produce inaccurate results.
(7) Select & Filter
After this process, you may want to filter out footnotes (e.g by filtering out rows without a record ID) and to select the columns that you only want.
2. Create a heat (density) map
(a) Using Tableau
Creating a density map in Tableau is very straightforward. I wrote about using spatial object in my previous post. First, you create a point using the MAKEPOINT formula. Second, drag the newly-created calculated field and ID into the Details. Finally, change the Marks Type from Automatic to Density, and change to the preferred color scheme. Voila, you’re done!
I didn’t use this method for this dashboard because there are some limitations. I cannot see the number of incidents in the clusters and it’s hard to get a list of incidents in the cluster. However, the advantage is that it is a very easy way to see the spatial distributions of events. Below is an example of how I used it to map the distribution of earthquakes.
(b) Using Alteryx
Although you can use Tableau to create a density map, Alteryx also offers another method to create a density map, using the Heat Map tool.
The advantage of Alteryx is that you can select the distance to create the map and the output is not just coloring on the map like Tableau but actual spatial objects. Another advantage is that Alteryx lets you see how it categorizes the density, with categories such as ‘Average (2.1-17)’ and ‘High (17-135)’, etc.
The downside is that after using the Heatmap tool, Alteryx creates only several spatial objects for each category of the heatmap, not for each cluster. To use these clusters separately with filter actions in Tableau, we’ll need to use Polysplit to split into clusters and assign a RecordID to each cluster.
Below is my workflow for the spatial analysis of this dashboard:
After completing the above workflow, you’re ready to build the dashboard. I won’t go through exactly how I built the dashboard because all the techniques I used here, I already wrote about them in my previous post ‘6 ways to enhance your Tableau Map‘. You can refer back to the post or download the Tableau workbook on my Tableau public while creating your dashboard if you’re stuck.
In addition, this is the link (Police Brutality during protest) to download the full Alteryx workflow package in case my explanation doesn’t make a lot of sense. From the dashboard, you can see that areas will a high population of black Americans also report high incidents of police brutality. However, this could just be because those areas have more protests. We can’t really infer much other than that there are lots of incidents/reports of assaults by police. I think it would be interesting to overlay the map of protests and this map together. Elephrame provides a list of Black Lives Matter protests (unfortunately without much details about the number of protesters) but does not provide an API. Once I figure out how to scrape their website (not as straightforward as I hoped it to be), I will be back for another blog post!