Have you heard of the UFC? I knew nothing about it until today's challenge gave us a dataset on the topic. After reading the website's introduction and the data dictionary on Kaggle, I finally had an idea and decided to build a dashboard about the features that most affect the winning rate in UFC matches.
Data preparation with Alteryx
The dataset from Kaggle was originally prepared for predictive modeling, so it contains 137 fields, some of which are not important for prediction or analysis. Given the time limit, I did not want to go deep into prediction. However, I still needed to clean the data and select the relevant features for analysis.
Here is my workflow in Alteryx:
I used the Pearson Correlation tool to find the important features and, at the same time, to help me select the relevant ones. The Pearson Correlation tool uses the Pearson product-moment correlation coefficient (sometimes referred to as the PMCC, and typically denoted by r) to measure the correlation (linear dependence) between two variables X and Y, giving a value between +1 and −1 inclusive. It is widely used in the sciences to measure the strength of linear dependence between two variables. (From Alteryx help)
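To make the coefficient concrete, here is a minimal Python sketch of how r can be computed for two numeric columns. It is only an illustration of the formula, not what Alteryx runs internally; the function name `pearson_r` and the choice to return `None` when the coefficient is undefined (for example, when one column is constant) are my own assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length numeric lists.

    Returns None when r is undefined (assumption: fewer than two points,
    mismatched lengths, or a column with zero variance).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (denominator terms).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)
```

A perfectly increasing linear relationship gives a value of +1, a perfectly decreasing one gives −1, and a constant column has no defined correlation at all.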
I was planning to apply the winning rate to different features, but the UFC master dataset's granularity was one row per match, so I had to reshape it into one row per fighter per match. The other important outcome of my workflow was therefore to split each match into two rows, one for each fighter. The main steps in my workflow were:
- Clean the data and create separate values for the important features identified later by the Pearson correlation.
- Separate the Red and Blue fighters into different rows.
- Use the Pearson correlation to find the most relevant features.
- Use the Pearson result to filter the numerical attributes.
- Get the opponent for each fighter.
- Parse the location into city, state, and country, and clean the data for the final output.
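For readers who do not use Alteryx, the reshaping steps in the list above can be sketched in a few lines of Python. This is only a rough equivalent under stated assumptions: the column names `R_fighter`, `B_fighter`, `Winner`, and `location`, the `"Red"`/`"Blue"` winner values, and the comma-separated location format are all assumed for illustration, and the helper names are hypothetical.

```python
def parse_location(location):
    """Split a 'City, State, Country' string (assumed format) into parts."""
    parts = [p.strip() for p in location.split(",")]
    if len(parts) >= 3:
        return {"city": parts[0], "state": parts[1], "country": parts[-1]}
    if len(parts) == 2:
        # Assumption: two parts means the state is missing.
        return {"city": parts[0], "state": None, "country": parts[1]}
    return {"city": None, "state": None, "country": parts[0]}

def split_match(match):
    """Turn one match row into two fighter rows, each with its opponent."""
    place = parse_location(match["location"])
    rows = []
    for corner, other in (("Red", "Blue"), ("Blue", "Red")):
        rows.append({
            "fighter": match[f"{corner[0]}_fighter"],   # R_fighter or B_fighter
            "opponent": match[f"{other[0]}_fighter"],
            "corner": corner,
            "win": 1 if match["Winner"] == corner else 0,
            **place,
        })
    return rows

# Made-up example row, not taken from the dataset.
example = {
    "R_fighter": "Fighter A",
    "B_fighter": "Fighter B",
    "Winner": "Red",
    "location": "Las Vegas, Nevada, USA",
}
fighter_rows = split_match(example)
```

Storing the result as a 0/1 `win` flag on each fighter row is what makes it easy to average it later as a winning rate.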
I want to go into more detail about how I used the Pearson correlation to filter out the irrelevant attributes. After I ran the Pearson correlation on the numeric fields, 31 fields came back with only nulls for their correlation. I treated these 31 fields as not relevant to the match result and used the null correlation as a filter to eliminate them.
After that, I converted the ‘Field name’ values into columns using the ‘Cross Tab’ tool, unioned the result with the cleaned original table, and dropped the fields that were not shared by selecting ‘Output Common Subset of Fields.’ I also joined the demographic data back from the original table.
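The same filtering idea can be sketched in Python. This is not the Alteryx workflow itself, just a rough equivalent: it assumes the correlation output is a list of records with hypothetical keys `FieldName` and `Correlation`, drops the fields whose correlation is null, and keeps only those fields (plus a few demographic ones) in the fighter rows, which plays the role of the common-subset union.

```python
import math

def is_null(value):
    """Treat both None and NaN as a null correlation (assumption)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def fields_to_keep(correlations):
    """Names of the fields whose correlation with the result is not null."""
    return [c["FieldName"] for c in correlations if not is_null(c["Correlation"])]

def select_fields(rows, keep, always_keep=()):
    """Keep only the wanted fields in every row, like a common-subset union."""
    wanted = list(always_keep) + [f for f in keep if f not in always_keep]
    return [{f: row[f] for f in wanted if f in row} for row in rows]

# Made-up correlation output and rows, for illustration only.
correlations = [
    {"FieldName": "age", "Correlation": -0.12},
    {"FieldName": "empty_field", "Correlation": None},
    {"FieldName": "reach", "Correlation": 0.08},
]
rows = [{"fighter": "Fighter A", "age": 29, "empty_field": None, "reach": 180}]

keep = fields_to_keep(correlations)
cleaned = select_fields(rows, keep, always_keep=("fighter",))
```

One caveat worth keeping in mind: a null correlation usually means the coefficient could not be computed for that field, which is a slightly different thing from a correlation of zero.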
My UFC dashboard: How to Win in The UFC?
Now it is time to tell the story from my dashboard:
The top chart shows the top ten features that affect the winning rate in the UFC. Some of them describe the fighter directly: odds, age, and the number of fights the fighter has won by decision. The others are based on a comparison with the opponent: the differences in age, current losing streak, win streak, reach, total rounds, and significant strikes.
If you would like to know how these features affect the results, you can select a different dimension at the bottom left to see the winning rate for each bin of that feature.
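The winning rate per bin is simple to reproduce outside Tableau. Here is a minimal Python sketch, assuming each fighter row carries a 0/1 `win` flag and a numeric feature such as `age`; the field names, the bin width, and the function name `win_rate_by_bin` are hypothetical choices for illustration.

```python
from collections import defaultdict

def win_rate_by_bin(rows, field, bin_width):
    """Share of wins per fixed-width bin of a numeric field.

    Rows with a missing value for the field are skipped (assumption).
    """
    totals = defaultdict(lambda: [0, 0])  # bin start -> [wins, fights]
    for row in rows:
        value = row.get(field)
        if value is None:
            continue
        start = (value // bin_width) * bin_width
        totals[start][0] += row["win"]
        totals[start][1] += 1
    return {start: wins / fights for start, (wins, fights) in sorted(totals.items())}

# Made-up fighter rows, for illustration only.
sample = [
    {"age": 22, "win": 1},
    {"age": 24, "win": 0},
    {"age": 31, "win": 1},
    {"age": None, "win": 0},
]
rates = win_rate_by_bin(sample, "age", 5)
```

With bins this small in a real dataset, it is worth checking how many fights fall into each bin before reading too much into its rate.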
Based on the two charts, you can clearly see that a smarter, more confident young man with long arms has a much better chance of winning the fight.
For fun, you can also use the bottom-right chart to see what your chance of winning in the UFC would be if you were a fighter with certain features.
To solve today's challenge, I used the Pearson correlation in Alteryx to find the important features and applied them in my viz. Please find my viz on Tableau Public: How to win in the UFC?.