Halloween Day of 2022 marks the start of Dashboard Week for DSAU16. For the start of dashboard week, the goal is to prepare data and create visualisation from Melbourne Cup data.

Data Understanding

Since the provided data came from different year, there were different documentation styles across different tables. Hence, evaluating table structures becomes important to produce a data that can be visualised. Differences between table structures from different years can be seen in figure 1-1 and figure 1-2.

Figure 1-1. First 3 rows of Melbourne Cup results table for year 2012

Figure 1-2. First 3 rows of Melbourne Cup results table for year 2021

Differences in table structures can be found between figure 1-1 and 1-2. For instance, 2012 table has 9 columns, while 2021 table has 10 columns. Since tables from year 2012 to 2021 were used for this challenge, processing the tables into consistent formats is essential.

Data processing

Although most of the used tables have different structures, there are still some structure similarities for several groups of years. In the dataset collection, the dataset are grouped to between 2012-2017, 2018-2019 and 2020-2021 sets. With these groupings, 3 different Alteryx workflows were made to process all used dataset. Figure 2-1 to figure 2-3 show the workflows for the 3 different year groups.

Figure 2-1. Alteryx workflow for Melbourne Cup data in year 2012-2017

Figure 2-2. Alteryx workflow for Melbourne Cup data in year 2018-2019

Figure 2-3. Alteryx workflow for Melbourne Cup data in year 2020-2021

The key tools in figure 2-1 to figure 2-3 are the Download and Regex tools. Download tool is used to download the page source from the URL, while Regex tool is applied to identify table location on the website. Resulting tables of Download and Regex tools can be found in figure 2-4 and 2-5, respectively.

Figure 2-4. Download tool and its resulting table after processing URL

Figure 2-5. Regex tool and its resulting table after processing downloaded data (DownloadData)

After the Download and Regex tools have been applied, the remaining parts of the workflows are able to be built using tools from ‘Preparation’ tab in Alteryx. At the end, all outputs from figures 2-1, 2-2 and 2-3 were combined using Union tool to obtain a single table from 2012 to 2021 data. The output from using Union tool can be found in figure 2-6.

Figure 2-6. Output file from combined Melbourne Cup data from 2012 to 2021

Dashboard creation

At the end of this challenge, the dashboard was created using Tableau. In total, 5 big numbers and 6 charts were created to visualise Melbourne Cup data from 2012 to 2021. Generally, the dashboard functionality is for viewers who want to check countries, jockeys, trainers, and horses that have high placements based on ‘Finish’ or ‘Margin’ fields or those that managed to win Melbourne Cup.

The created dashboard objective is to show yearly trend of average placement (based on the field ‘Finish’) based on countries where the horse come from, as well as jockey, trainers and horses that place high in terms of placement and margin. The last part of the dashboard evaluates the consistency between horses that place highest in terms of average ‘Finish’ or ‘Margin’ and horses that managed to win the race. A view of the produced dashboard can be seen in figure 3-1.

Figure 3-1. Dashboard for Melbourne Cup from 2012-2021

To interact with the dashboard functionality, please explore the following link:

Dashboard Week Day 1 – Melbourne Cup viz | Tableau Public

Kristiadi Uisan
Author: Kristiadi Uisan