Introduction
This week at the Data School Down Under, we began Dashboard Week. This involves creating a viz from scratch, using some of the skills we have learnt over our 4 months of training, and we only have 1 day to do it! For day 1, we focused on APIs, specifically the Spotify one. Even for those who don’t use Spotify, like myself, its API is a great tool for extracting information about music, albums and artists.
I wanted to focus on a specific aspect of a band or artist, so I chose one of my favourites, the Red Hot Chili Peppers (RHCP). Specifically, I wanted to know what impact John Frusciante had on the band. For some context, Frusciante was in RHCP from 1988–1992, 1998–2009, and 2019–present.
I previously made a viz on RHCP here, so I had some idea of what my insight would be. I won’t go through it here, but will include it at the bottom in a TL;DR.
Preparation
To prepare the data I needed to connect to the Spotify API (many thanks to Jonathon Cavalieri), which I did using the documentation found here. I originally thought that I would analyse the top songs of RHCP; however, the API limits the user to 10 songs. So instead, I asked the API for the tracks in each album and then, through a separate download tool, got the information for each of the tracks.
So to begin with, I had something like this:
I begin by using a text input to set up my client id, secret, endpoint, and grant type. After using various tools to extract just the authorisation code, which is used throughout the workflow, I join this code with my RHCP album id, which can be found at the end of the album’s URL in Spotify.
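For readers who would rather script this step than build it in Alteryx, here is a minimal Python sketch of the same idea. It assumes the workflow uses Spotify's client credentials flow (the "grant type" mentioned above), in which case the "authorisation code" is the access token returned by the token endpoint. The function names are my own, and the client id and secret are placeholders.

```python
import base64
import json
import urllib.parse
import urllib.request

# Spotify's token endpoint for the client credentials flow.
TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Build the POST request that exchanges app credentials for a token."""
    # The id and secret are sent as HTTP Basic auth: base64("id:secret").
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    # The grant type goes in a form-encoded body.
    body = urllib.parse.urlencode({"grant_type": "client_credentials"})
    return urllib.request.Request(
        TOKEN_URL,
        data=body.encode("ascii"),
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_access_token(client_id, client_secret):
    """Send the request and return the access token (needs network access)."""
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

Calling `fetch_access_token("my-client-id", "my-client-secret")` with real credentials should return a token string that is then sent as a `Bearer` header on every later request.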
Next:
After I append my authorisation code to the album URL and id, I download the information for all tracks on that album. I parse out the name and id only.
The above picture looks quite scary, but all I’m doing is using 2 text inputs, holding the audio features and track info URLs that can be found on Spotify, to download the song information.
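As a rough Python equivalent of this step, the sketch below pulls the album id off the end of a Spotify album URL, builds the album tracks request, and keeps only the name and id of each track. The helper names and the example id in the usage note are made up for illustration; the `items`, `id` and `name` fields are those of the album tracks response.

```python
import urllib.parse
import urllib.request


def album_id_from_url(url):
    """Return the last path segment of an album URL, ignoring any query string."""
    path = urllib.parse.urlparse(url).path
    return path.rstrip("/").split("/")[-1]


def album_tracks_request(album_id, token):
    """Build the GET request for an album's tracks (50 is the page-size maximum)."""
    url = f"https://api.spotify.com/v1/albums/{album_id}/tracks?limit=50"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def parse_tracks(payload):
    """Keep only the id and name of each track in the decoded JSON response."""
    return [{"id": item["id"], "name": item["name"]} for item in payload["items"]]
```

For example, `album_id_from_url("https://open.spotify.com/album/abc123?si=xyz")` gives `"abc123"`, which is then passed to `album_tracks_request` together with the token.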
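In code, the two text inputs boil down to two URL templates. The sketch below assumes the batch forms of the audio features and track endpoints, which take a comma-separated `ids` parameter (up to 100 ids for audio features, up to 50 for tracks). The helper names are mine, and whether the audio features endpoint is available may depend on your Spotify app.

```python
AUDIO_FEATURES_URL = "https://api.spotify.com/v1/audio-features"
TRACKS_URL = "https://api.spotify.com/v1/tracks"


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_urls(base_url, track_ids, batch_size):
    """Build one URL per batch of track ids, joined with commas."""
    return [
        f"{base_url}?ids={','.join(batch)}"
        for batch in chunked(track_ids, batch_size)
    ]


def audio_features_urls(track_ids):
    """URLs for the audio features of the given tracks (100 ids per call)."""
    return batch_urls(AUDIO_FEATURES_URL, track_ids, 100)


def track_info_urls(track_ids):
    """URLs for the general track info of the given tracks (50 ids per call)."""
    return batch_urls(TRACKS_URL, track_ids, 50)
```

Batching like this keeps the number of calls per album small, which matters when the API limit is a concern.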
I use a Find Replace tool to replace the id with the actual name of the song, and a Multi-Row Formula to put each song and its information into groups so that it can be cross-tabbed. I then output the result. I do this for every album. I could have made it into a macro, but due to the size of the workflow and the timing of when I was running it, the API sometimes reached its limit and failed. So I found it more comforting to go through all 11 albums one by one to make sure it was all correct.
Here is the final workflow. In principle, the same method would work for any artist with many tracks over many albums.
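A scripted version of this step might look like the sketch below. It is not what the Alteryx workflow does internally, just the same idea under some assumptions: `label_features` swaps each track id for its name and keeps a few audio feature fields (one row per song, as after the cross tab), and `with_retry` waits and tries again when the API answers with HTTP 429, using the `Retry-After` header it sends. The choice of feature columns and the helper names are mine.

```python
import time
import urllib.error

# A few of the audio feature fields; extend as needed.
FEATURE_COLUMNS = ["danceability", "energy", "valence", "tempo"]


def label_features(tracks, features):
    """Return one row per song, with the track id replaced by its name."""
    names = {track["id"]: track["name"] for track in tracks}
    rows = []
    for feature in features:
        if feature is None:  # the API returns null for ids it cannot resolve
            continue
        row = {"track": names.get(feature["id"], feature["id"])}
        row.update({col: feature[col] for col in FEATURE_COLUMNS if col in feature})
        rows.append(row)
    return rows


def with_retry(call, max_attempts=5, sleep=time.sleep):
    """Run `call`, pausing and retrying when the API reports a rate limit (429)."""
    for attempt in range(max_attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, or the final failure.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Retry-After is given in seconds; fall back to 1 if it is missing.
            sleep(int(err.headers.get("Retry-After", "1")))
```

Wrapping each download in `with_retry` is one way to loop over all the albums in a single run without a rate-limit failure stopping it partway through.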
Findings
Here is the viz. In my previous viz on RHCP, I found that the band’s most popular music was written when John Frusciante was part of the group. In the new viz, I found that of the top 20 songs (codified using the songs-on-tour statistics from my old viz), 19 were written in the Frusciante era. That to me was staggering. In the new viz you can really see the impact of him leaving and then joining again. However, I do not see any real link between the different variables Spotify has to offer when I break them down into Frusciante era and non-Frusciante era. Going forward, I would love to do a word association analysis, to see if more unique or more complex words were used in the songs.
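To make the era split concrete, here is a small sketch of how a song could be tagged as Frusciante era or not from its release year, using the membership years given in the introduction. Working at year level is my simplification: a release in a year when he joined or left could fall on either side, so those boundary cases would need checking by hand.

```python
# Frusciante's stints in the band, as (first year, last year); None means ongoing.
FRUSCIANTE_YEARS = [(1988, 1992), (1998, 2009), (2019, None)]


def era(release_year):
    """Label a release year as 'Frusciante' or 'non-Frusciante'."""
    for start, end in FRUSCIANTE_YEARS:
        if release_year >= start and (end is None or release_year <= end):
            return "Frusciante"
    return "non-Frusciante"
```

For example, `era(1991)` falls in the first stint, while `era(1995)` falls in the gap between the first and second.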
TL;DR
I used Alteryx to get info from the Spotify API. Previously, I found that a lot of the great RHCP songs were made in the era of John Frusciante. I found similar results here, but not much explanation as to why.