Welcome to Dashboard Week Day 2!

In this episode we were given an Olympics dataset covering the medal winners from over 120 years of Olympic Games, along with free rein over the story we wanted to tell. Just like in previous years, this dataset was actually used in the last stage of applications for DSAU8.

As with any dashboard, the start is always the hardest. Even getting an IDEA of what I wanted to do with such a varied dataset with so little detail is difficult. The key to this dataset is supplementation. By itself, the dataset is an okay framework for getting some inspiration, but it lacks the detail to actually come up with fully fledged stories.

Finding a Story to Tell

With that being said, the first step was to pick a sport. There were some unusual sports in the Olympics over the years. It turns out, the most ‘competitive’ person in the Olympics (the person who entered the most events) was a guy named Robert Tait McKenzie who won a medal for…Art.

Unfortunately, I’m not a huge art fan so I chose Weightlifting, mainly because I like weightlifting and also because there was some pretty comprehensive supplementary data available from the International Weightlifting Federation (IWF). From here it’s just a matter of trying to find the right graphs to represent the story I was going to tell.

In the last stage of the Data School applications, I created a viz on weight classes. It’s only fitting I make another. In this viz, I wanted to see how the Olympic records of each weight class compare to each other, and then compare each Olympic record to every other competitor in any weight class to see how much stronger the record holder is.

Finding the Data

By far the most difficult thing was finding the appropriate supplementary data, then sorting out that data to find the rest of the story I wanted to tell. The IWF was kind enough to make their data easily accessible and downloadable. So collecting the data was easy. But the thing with datasets from different sources is that none of the fields ever line up. In this case, the naming conventions of the IWF for the competitors didn’t match up with the original dataset. So I had to find a way to match up every competitor by name when John Smith appears as ‘SMITH John’ or ‘John K. Smith’.

This is a prime use case for Fuzzy Matching… But I have no chance of learning fuzzy matching in less than a day and also building a viz sooooo… looks like we just manually go through them. Orrrrr just ignore it completely and figure out something else.
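For what it’s worth, here is a rough sketch of what a lightweight version of that matching could look like in Python. This is not what I did for the viz. It assumes the two sources only differ in capitalisation, word order and middle initials, and the helper names (normalise, best_match) and the 0.85 cutoff are made up for illustration. It uses only the standard library (re and difflib).

```python
import difflib
import re


def normalise(name: str) -> str:
    """Lower-case the name, drop punctuation and single-letter initials,
    and sort the remaining words so word order no longer matters."""
    tokens = re.findall(r"[^\W\d_]+", name.lower())  # runs of letters only
    tokens = [t for t in tokens if len(t) > 1]       # drop initials such as "K."
    return " ".join(sorted(tokens))


def best_match(name, candidates, cutoff=0.85):
    """Return the candidate whose normalised form is closest to `name`,
    or None if nothing is at least `cutoff` similar (hypothetical threshold)."""
    lookup = {normalise(c): c for c in candidates}
    hits = difflib.get_close_matches(normalise(name), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


# 'SMITH John' and 'John K. Smith' both normalise to 'john smith'
print(best_match("SMITH John", ["John K. Smith", "Jane Doe"]))
```

Anything that comes back as None would still need a manual look, and two different athletes with the same normalised name would collide, so this only cuts down the manual work rather than replacing it.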

Making the Viz

The Viz itself is fairly straightforward: it’s a scatterplot with a trendline and a jitter plot on the right. The real difficulty comes from having to link two different data sources based on similar but not exactly the same fields. Since calculations don’t carry over between data sources, the way I was able to link them together was with parameters. Parameters can be shared between data sources, so essentially I let one data source set the parameter value and the other data source use that parameter value for new calculations.

Things I learned

  • I have such a difficult time picking colours. There are palette tools online, but it’s still so difficult to find the right one because my colour sense is way off. So I just ended up using the Olympic colours.
  • Finding a good title is also difficult. No solution for that one yet.

 

Author: Kevin Prescilla

As a late-stage PhD candidate, Kevin’s appreciation for data analytics grew during his studies into poultry nutrition, or as he calls it, “chickens”. It was this appreciation which spurred his decision to change career paths and ultimately led him to apply to the Data School. In his spare time he enjoys powerlifting, ever challenging himself to beat his last max weight, as well as all kinds of gaming, from board to PC. If Kevin could go anywhere in the world, where would it be and why? Well, the answer is Antarctica, as he is fascinated with how people can live and survive down there (although some might argue it’s because it’s the furthest place you can go on Earth from a chicken).