On Week 3 of our training, we were introduced to API’s. So, in this blog I will demonstrate how I used Alteryx to extract data from an API. The data covered in this blog is the list of Asteroids based on their closest approach date to Earth, from NASA’s website.
The Work flow used is:
The process is split into four parts:
- Finding the URL and API Key
- Downloading the data
- Filtering unwanted rows
- Crosstab and clean-up the data
1. Finding the URL and API Key:
The first step is to understand the API documentation provided on the website. From the documentation, it is essential to find the URL, API-key (if required) and limitation(s).
From the link we can observe that the URL (example query) for the data we are after is “https://api.nasa.gov/neo/rest/v1/feed?start_date=START_DATE&end_date=END_DATE&api_key=API_KEY”. The variables in the URL are Start date, End date and API key, which have been included in the text input files. The API key can be obtained by registering on the link. The limitation is, End date should be within 7 days from start date. I have included all this information to create the column “URL” using the formula tool (make sure the data type input coming into the formula tool is all string).
2. Downloading the data:
Input the “URL” column in the download tool to output string data.
Next, On the JSON Parse tool, select the dropdown “DownloadData”.
This should create two additional columns 1- JSON_Name 2.JSON_ValueString, in the results window.
3. Filtering unwanted rows
Split the JSON_Name column using the Text to columns tool, to identify the different headers. In this case the delimiter is a full stop.
Next, filter out all the unwanted rows of data (example, I have excluded rows with data in miles). I have used separate filters, to get only the relevant data. Thereafter, used union to get all the data in one file. (I have used multiple filter tools, thereby taking a step by step approach. This can also be done with a single filter tool)
4. Crosstab and clean-up the data
We can observe that by using split to columns, the relevant header names appear in more than one column. To get all the relevant names under one single column (JSON_Name4), I used the formula tool.
In the Crosstab tool, this column is selected in the “New Column Headers” field, the ValueString column is selected in the “Values for New Columns” field and is grouped by any field(s) that gives a unique row for each asteroid.
Finally select only columns that are required and use the auto field tool to convert the respective data types.