In week 3 of training at The Data School, one of the topics we learned was APIs. Through APIs we can retrieve even more data than before! We dived into an API of Disney characters thinking it would be a straightforward retrieval. An in-and-out 10 minute adventure…. Like most things involving data wrangling that was not the case.
In this Part I blog post we will go through the data retrieving process.
What is an API?
Before we delve into the process, let me review what an API is for those not familiar with them.
An API is a standard code allows different programs/websites/apps to communicate with one another. Kind of like how English is an international standard for communications these days. Currently, the more common method of communication is JSON (pronounced like “Jason”). The API looks like normal HTML coding. But a deeper look and we can see it contains the data we need.
Depending on the API we may require a key/token to access that data. In the case of the Disney API, we did not require any keys or tokens so the data was ripe for the taking!
Expectations for parsing the JSON API
With the API URL in hand, we could begin taking that sweet, sweet data. The general steps for retrieving data via APIs involves 8 tools in Alteryx: Input, Download, JSON Parse, Text to Columns, Filter, Select, Crosstab, and Output.
Every API is different and we may need to order the tools differently or add tools to the workflow. And we quickly found that the Disney API was a unique beast.
Disney API and pages
Hidden within the Disney character API were URLs indicating that there were multiple pages – 149 of them. We now had to factor this into the workflow and paginate the URLs to feed the Download tool. We achieved this with two tools: Generate Rows and Formula.
Connecting the Download and JSON parse tools
After sorting that out, we can now download our Disney character data. Here are the steps:
- Connect the Download tool and JSON Parse tool to your workflow.
- Adjust the settings of the Download tool to the “URL” column and the JSON Parse tool to “DownloadData” column.
- You can hit Run to run the workflow and download the data.
- Alternatively, when right clicking on the JSON Parse tool we can select an option to “Cache and Run Workflow”. (See picture below for further clarification)
In Part II, we will go through some other tools required in the workflow to manipulate the data into a more usable format.