Image by Clarence Alford from Pixabay


Today’s topic was horseraces and racehorses from 1990 to 2020. The source was this Kaggle site. There were many different approaches I could have taken with this data. This was a challenging day, partly because of fatigue (this week is supposed to test us!) and partly because I know very little about horse racing.

Anyway, saddle up, and get ready to embark on an exhilarating journey into the realm of horse racing data analytics.

Data Investigation

The data came as yearly tables of races and yearly tables of the horses that raced, which I linked using the race id. Given the breadth of the data, it was important to narrow the scope. One of the first things I think about when I think of horse racing is gambling. Unfortunately, the data didn’t include information on the betting odds, so I ruled out that analysis.
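For anyone who wants to reproduce the linking step outside Tableau, here is a minimal sketch in Python. The column names (race_id, distance_m, position) and the sample rows are assumptions for illustration only; the real Kaggle files may name things differently.

```python
# Hypothetical sample rows; real column names and values may differ.
races = [
    {"race_id": 1, "year": 1990, "distance_m": 1600},
    {"race_id": 2, "year": 1990, "distance_m": 2400},
]
horses = [
    {"race_id": 1, "horse": "A", "position": 1},
    {"race_id": 1, "horse": "B", "position": 2},
    {"race_id": 2, "horse": "C", "position": 1},
    {"race_id": 3, "horse": "D", "position": 1},  # no matching race row
]


def join_on_race_id(races, horses):
    """Inner join: attach the race-level fields to every horse row."""
    race_by_id = {race["race_id"]: race for race in races}
    joined = []
    for horse in horses:
        race = race_by_id.get(horse["race_id"])
        if race is None:
            continue  # drop horses whose race is missing from the race table
        joined.append({**race, **horse})
    return joined


joined = join_on_race_id(races, horses)
```

Each row of joined now carries both the horse details and the details of the race it ran in, which is roughly what a join on the race id gives you in Tableau.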

The next thing I was curious about was whether horses were becoming faster over time. After some initial exploration in Tableau, there did not seem to be an obvious trend. I thought this might be because the scale of the data was so broad, with many different event types and classes. But it was difficult to filter through these and obtain a meaningful subset, as there were thousands of race titles in the data.

In the end, I settled on analysing the fastest race times and a calculated average speed of the winner, comparing these across race lengths and focusing on some common lengths.
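The calculation behind this is just distance divided by time. A rough Python sketch is below, assuming distances in metres and winning times in seconds; the field names and the numbers are made up for illustration and are not taken from the dataset.

```python
def average_speed_kmh(distance_m, time_s):
    """Average speed in km/h from a distance in metres and a time in seconds."""
    return (distance_m / time_s) * 3.6


# Hypothetical winner rows (one per race); values are invented.
winners = [
    {"distance_m": 1600, "year": 1995, "time_s": 100.0},
    {"distance_m": 1600, "year": 2010, "time_s": 96.0},
    {"distance_m": 2400, "year": 2005, "time_s": 150.0},
]


def fastest_by_distance(rows):
    """Keep the row with the lowest winning time for each race distance."""
    best = {}
    for row in rows:
        distance = row["distance_m"]
        if distance not in best or row["time_s"] < best[distance]["time_s"]:
            best[distance] = row
    return best


fastest = fastest_by_distance(winners)
for distance in sorted(fastest):
    row = fastest[distance]
    speed = average_speed_kmh(distance, row["time_s"])
    print(f"{distance / 1000:.1f} km: {row['time_s']:.1f} s, {speed:.1f} km/h")
```

Dividing the distance by 1000 also gives the length in km, which makes different race lengths easier to compare side by side.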

I have to say I struggled with this dashboard, and I think its formatting could be improved.

To Do:

  • Add colours to first two charts and background colours.
  • Include more text explanation. Longer distances were faster than I expected. Make that clearer. Include distances in km.

Was this dashboard interesting to you? Or are you knowledgeable about horse racing and have some suggestions? Feel free to comment.


Author: The Data School