Day 4 of Dashboard Week was particularly fun. We had a huge data set with nearly 6 million rows. The data this time came from iNaturalist, an online social network for sharing biodiversity information throughout the world. The data itself didn’t offer much scope for analysis, so it was a perfect opportunity for experimentation.
A couple of weeks ago, Leigh and I created a Sankey chart for a client as part of the project week at the Data School. It was risky, but we were delighted to see that the client was very pleasantly surprised. I thought doing another Sankey would help the knowledge sink in, so I went for it on day 4 of Dashboard Week, and it truly did. This time I also got to be more creative, since I wasn’t limited by any particular requirements.
I found some videos on how to build Sankeys and decided to follow this one. Craig also suggested another one, which I hope to try out on a different occasion. In a nutshell, the process looks something like this:
- First, I unioned the data set to itself. I had to be careful here, because a self-union doubles the number of rows and can cause performance problems in Tableau later on. As I mentioned, my data set had nearly 6 million rows, so I filtered at the data source level and kept only the data for Europe. To do this, I created a group in Tableau for the European countries and then used that group as a filter in the data pane.
- Next, I created a bunch of calculations: rankings, curves, sigmoid, paddings… These tell Tableau how to plot the data and draw the lines that connect both sides of the chart. The video gives a good explanation of each of these calcs.
- One of the interesting parts was setting up the nested table calculations. As I defined them one by one, the Sankey started to take shape.
- I wasn’t satisfied with one Sankey, so I decided to build two. That’s when I started on the one on the left of my dashboard. I found that the flows can be made to start from a single point instead of being branched out at both ends. This is done by adjusting the nested table calculations, specifically by changing the order in which they are computed.
- Once the middle part of the Sankey was ready, I built the branching parts at the ends. They are basic stacked bars.
- Finally, with some formatting and adding additional bar charts, I got the final look of the Dashboard for iNaturalist.
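For readers who think in code, here is a rough Python sketch of the idea behind the first two steps: the self-union gives every row two copies (one for each end of a link), and a sigmoid calculation eases each line from its left position to its right position. This is not the set of Tableau calculations from the video. The function names, the sample rows, the range of -6 to 6 and the 49 points per curve are all my own assumptions for illustration.

```python
import math


def sigmoid(t):
    """Logistic function: close to 0 for very negative t, close to 1 for very positive t."""
    return 1.0 / (1.0 + math.exp(-t))


def sankey_curve(y_left, y_right, points=49, t_min=-6.0, t_max=6.0):
    """Return (t, y) pairs for one link that eases from y_left to y_right.

    points, t_min and t_max are illustrative defaults, not values from the post.
    """
    step = (t_max - t_min) / (points - 1)
    curve = []
    for i in range(points):
        t = t_min + i * step
        # Interpolate between the two ends, weighted by the sigmoid.
        y = y_left + (y_right - y_left) * sigmoid(t)
        curve.append((t, y))
    return curve


# Hypothetical rows: (left-hand category, right-hand category).
rows = [("Birds", "Spain"), ("Plants", "France")]

# Self-union: every row appears twice, tagged with the end of the link it stands for.
# Note that the row count doubles, which is why filtering first matters.
unioned = [(*row, side) for row in rows for side in ("left", "right")]

# One link running from vertical position 0.2 on the left to 0.8 on the right.
link = sankey_curve(y_left=0.2, y_right=0.8)
```

Plotting the `link` points gives the familiar S-shaped line. In Tableau the same shape comes out of the calculated fields and table calculations rather than a loop.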
For me, Sankeys are one of those charts, along with radial ones, that look brilliant. Still, they only suit certain types of data sets, and the one I had probably wasn’t the best choice. However, it was a good learning opportunity for me, and hopefully this blog post will help someone else on their journey as well. While waiting for a better data set for the next challenge, I will let this one shine on my Tableau Public. Click here to open the interactive version of the final dashboard.