Today is day 1 of the dashboard week. Our mission is to choose an API, query from it, create a dashboard, and write a blog all in one day. The topic that I decided to choose is Whale Sighting, from the Whale Museum’s API in the US. This post will outline the process behind my dashboard “Meet the Southern Resident Killer Whales”. The process is broken into 3 parts: (1) How to query from Whale Museum’s API using a batch macro, (2) Data investigation + finding extra data, (3) Visualisation in Tableau.

Alteryx’s Level of Difficulty: Intermediate (knowledge of cleansing tools, spatial tools)
Tableau’s Level of Difficulty: Easy

How to Query from Whale Museum’s API using a Batch Macro

I found my dataset on Whale Museum. Thankfully, they made the process of getting data from the API very simple. The reason why a batch macro is often required for querying large data is that there is always a limit to how many records you can query (usually 1000). For this API, there are a total of almost 19,000 records with 1000 records limit per query. As a result, using a batch macro running the query 19 times will allow us to obtain all the records. I initially query different locations separately (find the geocoordinates and run the query to find sightings 5 miles radius from the location). However, it turns out that this method will generate lots of duplicates and also very slow as lots of popular whale locations don’t return any results.

To create a batch macro, you need to create 2 workflows. One workflow for the macro and another workflow for running the macro and cleaning the data. I wrote briefly how to obtain data from an API and cleaning it in the past, so in this post, I will only focus on the batch macro. If you’ve never heard of macros before, I recommend checking out this blog post from my colleague, Pat. He wrote very in-dept about macros.

Workflow for the batch macro

According to Whale’s Museum’s documentation, you can specify the page number in the query. The endpoint will be something like this, for the first page, replace {PAGE_NUMBER} with 1, and so on.

http://hotline.whalemuseum.org/api.json?limit=1000&page={PAGE_NUMBER}

The workflow (download macro workflow) for the macro is very simple, only for downloading the data from page 1. The tools used include:

Text Input: for inputting in the endpoint
Control Parameter: for updating the endpoint in the text input (we use only the endpoint for page 1 so we need to update for page 2, 3 and so on, after finishing querying page 1)
Download: for download data from the endpoint
Macro Output: for outputting data

You should also press Ctrl + Alt + D and change to this configuration. Without this step, there may be errors due to the results returning different schema.

Workflow for running the macro & cleaning the data

To run the batch macro, we need to (1) generate a list of endpoints to feed through the batch macro, (2) add the macro, (3) cleaning the data

I used generate rows tool to generate a column with values from 1 – 19 for page number. Then, use a formula tool to replace {PAGE_NUMBER} in the endpoint with the generated page number. You can download the workflow here).

Data Investigation + Finding Extra Dataset

The next step I did was data investigation and research. I noticed that most of the sighting from the API is only for southern resident orcas, so I started researching on them (I had no knowledge of whales whatsoever.). It then made sense why only the sightings of southern residents were meticulously recorded. They are the only endangered orca species. Upon reading this, I decided to build my dashboard and story around the southern resident killer whales, filtering out other sightings.

Another problem I had was that the dataset is too limited. It only records the location of the sightings. This was too limited so I tried to find other information, such as the population of the southern resident whales and so on to have more context. Unfortunately, I couldn’t find a dataset for the population. However, there are some graphs and articles about the current population on Whale Research so I was able to incorporate it into my dashboard.

Still not wanting to give up on finding an extra dataset, I persisted and stumbled upon the critical habitat protected under the Endangered Species Act for southern resident killer whales on NOAA. This is a shapefile and spatial joining on Tableau crashed my laptop so I went back to Alteryx. Here, I learned some good tricks for joining large spatial files.  A simple spatial match would result in an output file with 5+ GB. The reason for this is that the shapefile of the habitat is 2.2 MB and there are about 16,000+ records of whale sightings. A simple spatial join would generate a spatial object for each row, creating a massive output. No wonder why my tableau crashed… To overcome this, I unticked the spatial object for the habitat in the select tool but keep the area number for the habitat (to join back to shapefile). I then join the joined version and the shapefile in Tableau using the area number. This improved performance significantly and the files’ size are so much smaller too.

I also considered adding southern resident killer whales’ main diet, chinook salmon’s population, and their habitat to my dashboard. However, due to time constraints, I gave up (it was probably overkilled anyway).

Visualization in Tableau

The final step was visualization in Tableau. After data investigation in the above steps, I knew my story would be on how the southern resident killer whales are endangered. Thus, I compiled some key stats & information on how difficult the life of a SRKW is and displayed on the dashboard. As I could not find the population dataset, I used the time series of the sighting to substitute. We can see that the number sightings decrease over time, a proxy measure to a declining population.

Due to time constraints, I was not able to create a fancy Mapbox map. Nevertheless, playing around with the map layers created a good enough map. To create a similar map, use a dual-axis, one axis on the map mark (for habitat), and another one on the density mark (for sightings). Most of the sightings are in the protected habitat areas. It’s interesting how Tableau doesn’t automatically generate a legend for density mark. I had to manually create one. For the map layer, here is my setting under Map > Map Layers

Conclusion

The majority of the time spent was on data investigation, rather than building a dashboard. I spent quite a bit of time reading articles about killer whales to have some context on my dataset and also to create a story. Another major time-consuming part was finding extra datasets to supplement the sighting dataset. Nevertheless, throughout this process, I learned a lot about the southern residents and had fun looking at their beautiful pictures!

 

 

The Data School
Author: The Data School