For day 2 of Dashboard Week we were tasked with a Sports Viz Sunday challenge (from here).
It was a very rich dataset with massive scope for investigation. I decided to focus on pitches, where the statistics for every pitch over a 10-year period are recorded. With almost 1 million rows for each year, it’s incredible how much information is captured in American baseball. I spent a lot of time deciphering what was in each column and the level of detail it was at; for example, ‘balls’ and ‘strikes’ hold the count before the ball was pitched. I then explored several different charts and angles from which I could approach the challenge. I wanted to include mascots, but with the time restriction I settled on focusing on calls that I believe to be blatant exceptions to what should have been called.
With the zone information showing where the ball crossed home plate, and the umpire’s call on that ball, I could calculate when the umpire had erred in their decision. Obviously, this is a split-second decision and many factors go into it, but I wanted to see if there were any patterns.
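To make the idea concrete, here is a minimal Python sketch of that check. It is only an illustration, not the calculation used in the dashboard: the `Pitch` fields, the call labels, and the assumption that zones 1 to 9 sit inside the strike zone are all hypothetical stand-ins for whatever the dataset actually uses.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption (not confirmed by the dataset): zones 1-9 are inside the
# strike zone and any other zone number is outside it.
STRIKE_ZONES = set(range(1, 10))


@dataclass
class Pitch:
    zone: int  # hypothetical zone number where the ball crossed home plate
    call: str  # hypothetical umpire call: "ball" or "called_strike"


def classify_call(pitch: Pitch) -> Optional[str]:
    """Return the kind of missed call, or None if the call matches the zone."""
    in_zone = pitch.zone in STRIKE_ZONES
    if in_zone and pitch.call == "ball":
        return "strike_called_ball"
    if not in_zone and pitch.call == "called_strike":
        return "ball_called_strike"
    return None


# Four made-up pitches, one for each combination of zone and call.
pitches = [
    Pitch(5, "called_strike"),
    Pitch(5, "ball"),
    Pitch(13, "called_strike"),
    Pitch(12, "ball"),
]
errors = [classify_call(p) for p in pitches]
```

Only called pitches are considered here; swings, fouls and balls in play would need to be filtered out first, since the umpire makes no zone judgement on them.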
Some of the main areas that I had data for and started looking into were:
- Right or Left Handers (pitching and batting)
- Game Duration
- Weather, Temperature, Wind direction and speed
- Type of Pitch / Spin Rate and Direction
- Time of day and day of week
- Pitch Speed
- The Stadium / Team / Umpires involved
- Top or Bottom of the innings
- Year on Year Trends
I still find it incredible how much information there is just in the fields that I’ve listed above. With the time restrictions I had to lean towards the areas that were showing me an obvious difference/trend, which was the Teams, Stadiums and Umpires. I thought if I had time I’d add in the Right or Left Handers and from there continue adding areas (which maybe I’ll get back to in the future).
Anyway, the charts were coming along and, as usual, formatting was taking me way longer than expected.
I learnt some neat tricks, such as centering a quadrant scatter plot like this.
I battled with keeping the wording and explanations clear for an audience that may not fully understand baseball. It was quite a clunky concept to explain, as I had both:
- Pitches that are in the Strike zone but are called a Ball and
- Pitches that are Outside the Strike zone but are called a Strike.
The final thing I learnt was to get each variable into its final form as early as possible. When I changed from Totals to Averages per game late in the day, it broke a number of my other calculations, and troubleshooting the issues was a big task. The solution was Order of Operations: adding a filter to Context and using Include instead of Fixed, to ensure my calcs were performing as I wanted.
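Why the order matters can be shown with a rough Python analogy. This is not Tableau syntax, and the rows, names and filter are made up. As I understand Tableau's documented order of operations, a Fixed calculation is evaluated before ordinary dimension filters but after context filters, whereas Include is evaluated after dimension filters. So the same "average per game" can give different answers depending on whether the filter has already been applied.

```python
from collections import defaultdict

# Hypothetical rows: (umpire, game_id, missed_calls_in_that_game)
rows = [
    ("Ump A", "g1", 4),
    ("Ump A", "g2", 6),
    ("Ump B", "g3", 3),
    ("Ump B", "g4", 5),
    ("Ump B", "g5", 10),
]


def avg_per_game(data):
    """Total missed calls per umpire divided by that umpire's distinct games."""
    totals = defaultdict(int)
    games = defaultdict(set)
    for umpire, game, missed in data:
        totals[umpire] += missed
        games[umpire].add(game)
    return {umpire: totals[umpire] / len(games[umpire]) for umpire in totals}


def keep(row):
    """Hypothetical filter that excludes game g5."""
    return row[1] != "g5"


# Aggregate first, filter later: the average still includes g5
# (analogous to a Fixed calc ignoring a normal dimension filter).
unfiltered_avg = avg_per_game(rows)

# Filter first, aggregate second: the average respects the filter
# (analogous to a context filter, or to using Include).
filtered_avg = avg_per_game([r for r in rows if keep(r)])
```

In this made-up data, Ump B averages 6.0 missed calls per game when the filter is ignored and 4.0 when it is applied first, which is the kind of silent mismatch that can appear when switching from totals to averages.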
In the end I got there. I look forward to trying more Sports Viz Challenges (https://www.sportsvizsunday.com/) in the future, and even getting back to this one to see what comes out with weather and all the other factors in the mix.
Here’s the final Viz: