On to day 3, and time flies when you are having fun! Today we were given another large dataset containing about 3 million records of horse race results, covering jockeys, trainers, and horses in racing events around the world. The data was taken from Kaggle and the link is

Cleaning and preparing data using Alteryx
The first task for the day was using Alteryx to clean, prepare, and join the data. The Kaggle download contained records for horses and racing events from 1990 to 2020, split across multiple files. I brought the horse files and the race files into Alteryx with the wildcard feature in the Input Data tool, which works by adding an asterisk to the end of the file name, for example, horses_*. The next few steps were creating age categories for the racehorses, categorising the prize winnings, joining the horses and races tables on race ID, working out where each horse finished in its race, and creating the final output file.
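For readers without Alteryx, the same preparation steps can be sketched in plain Python. This is only an illustration of the logic, not the workflow I built: the column names (race_id, age, position, prize), the age bands, and the prize thresholds are all assumptions made up for the example, and the demo writes two tiny yearly files to a temporary folder rather than using the real Kaggle files.

```python
import csv
import glob
import os
import tempfile


def read_wildcard(pattern):
    """Read every CSV matching the pattern (e.g. 'horses_*.csv') into one list of dicts."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


def age_category(age):
    # Hypothetical bands: the cut-offs used in the workflow are not stated.
    if age <= 3:
        return "young"
    if age <= 6:
        return "prime"
    return "veteran"


def prize_category(prize):
    # Hypothetical thresholds, chosen only for illustration.
    if prize < 10_000:
        return "low"
    if prize < 100_000:
        return "medium"
    return "high"


def prepare(horses, races):
    """Inner-join horse rows to race rows on race_id and add derived fields."""
    races_by_id = {r["race_id"]: r for r in races}
    out = []
    for h in horses:
        race = races_by_id.get(h["race_id"])
        if race is None:  # no matching race, so the row is dropped
            continue
        out.append({
            **h,
            **race,
            "age_category": age_category(int(h["age"])),
            "prize_category": prize_category(float(race["prize"])),
            "is_winner": h["position"] == "1",  # CSV values are read as strings
        })
    return out


# Demo: two tiny yearly files per table, written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    files = {
        "horses_1990.csv": "race_id,horse,age,position\n1,A,3,1\n1,B,7,2\n",
        "horses_1991.csv": "race_id,horse,age,position\n2,A,4,1\n",
        "races_1990.csv": "race_id,prize\n1,5000\n",
        "races_1991.csv": "race_id,prize\n2,250000\n",
    }
    for name, text in files.items():
        with open(os.path.join(tmp, name), "w", newline="") as f:
            f.write(text)
    horses = read_wildcard(os.path.join(tmp, "horses_*.csv"))
    races = read_wildcard(os.path.join(tmp, "races_*.csv"))

result = prepare(horses, races)
for row in result:
    print(row["horse"], row["age_category"], row["prize_category"], row["is_winner"])
```

The glob pattern plays the role of the wildcard in the Input Data tool, and the dictionary lookup on race_id stands in for the Join tool.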

Tableau Dashboard
With the output file ready, I could bring it into Tableau to build the dashboard. I started with an introduction to add context to the story, which was finding all the race winners (jockey, horse, and trainer) in horse racing events around the world from 1990 to 2020. Next, I created an interactive section where the user can explore the data to find the top N winners by decade and select either horse, jockey, or trainer. The final section presented a few interesting insights. Winx, a champion Australian racehorse, was the most successful horse, with both the highest number of wins and the highest prize money. Among the jockeys, A P McCoy had the most wins, and Frankie Dettori had the highest prize winnings. The trainer with the most wins was Mark Johnston, and the trainer with the highest prize winnings was A P O’Brien. And there it is, the final dashboard!
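The top N section boils down to a filter, a count, and a sort. In Tableau this was done with parameters and filters, so the Python below is only a rough equivalent of the logic, using made-up rows and assumed field names (year, horse, jockey, trainer, position).

```python
from collections import Counter


def top_n_winners(rows, entity, decade, n):
    """Return the n names with the most wins for one entity type in one decade."""
    wins = Counter(
        row[entity]
        for row in rows
        # Keep first-place finishes whose year falls in the chosen decade.
        if row["position"] == 1 and (row["year"] // 10) * 10 == decade
    )
    return wins.most_common(n)


# Made-up sample rows, not taken from the real dataset.
rows = [
    {"year": 1994, "horse": "H1", "jockey": "J1", "trainer": "T1", "position": 1},
    {"year": 1997, "horse": "H2", "jockey": "J1", "trainer": "T2", "position": 1},
    {"year": 1998, "horse": "H1", "jockey": "J2", "trainer": "T1", "position": 2},
    {"year": 2003, "horse": "H2", "jockey": "J2", "trainer": "T2", "position": 1},
]

top_jockeys_1990s = top_n_winners(rows, "jockey", 1990, 1)
print(top_jockeys_1990s)  # [('J1', 2)]
```

Changing the entity argument to "horse" or "trainer" mirrors the selector on the dashboard, and the decade and n arguments mirror the other two controls.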



The Data School
Author: The Data School