Dashboard week continues… 

Dashboard Week is The Data School’s challenge to consultants nearing the end of their intense four months of training before starting placement. The challenge for each day this week: create a dashboard and tell a story using a newly provided dataset.

Day 3: History of Plane Crashes

Today’s challenge involved web scraping data from the planecrashinfo.com website. The website breaks airplane crashes down by year, with each year linking to a page of incidents for that year. For example:

Then, for each incident, another page contains further information. Looking at the image above, each incident can be broken down further. For example:

Approach/Story

My approach for this dataset was to highlight the survival rate of passengers and crew over the dataset’s time period (1908–2021). To complement this, I wanted to allow users to carry out their own analysis, breaking it down by year, operator, aircraft type and crash location across the globe.
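As a rough illustration of the survival-rate measure, here is a minimal Python sketch. It assumes each incident carries a count of people aboard and a count of fatalities, and that survivors are simply the difference between the two. The function names and the sample numbers are made up for illustration and are not taken from the dataset or the dashboard.

```python
def survival_rate(aboard, fatalities):
    """Share of people aboard who survived, or None when it cannot be computed."""
    if aboard is None or fatalities is None or aboard <= 0:
        return None
    survivors = max(aboard - fatalities, 0)
    return survivors / aboard


def yearly_survival_rate(incidents):
    """Pooled rate per year: total survivors / total aboard.

    `incidents` is an iterable of (year, aboard, fatalities) tuples.
    Rows with a missing count are skipped rather than treated as zero.
    """
    totals = {}  # year -> (total aboard, total survivors)
    for year, aboard, fatalities in incidents:
        if aboard is None or fatalities is None:
            continue
        total_aboard, total_survivors = totals.get(year, (0, 0))
        totals[year] = (
            total_aboard + aboard,
            total_survivors + max(aboard - fatalities, 0),
        )
    return {y: s / a for y, (a, s) in totals.items() if a > 0}


# Made-up incidents, purely to show the shape of the input.
sample = [(2000, 10, 10), (2000, 30, 10), (2001, 4, None)]
rates = yearly_survival_rate(sample)
```

Pooling the counts per year, rather than averaging per-incident rates, stops a small crash from weighing as much as a large one. Either choice could be defended depending on the story being told.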

Data preparation

The data on the planecrashinfo.com website was spread across multiple webpages and required a few steps to retrieve the entire dataset. Due to time constraints, I decided to retrieve only the per-year incident data shown in the first example above. While the detailed information for each incident would have been great to have, there were a few issues accessing the data with my whole cohort hitting the website at the same time.

Starting with the original URL, I scraped the URL extension for each year and fed that back into Alteryx to retrieve the incidents for each year. The output from this download was one row of information for each incident.
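I built this step in Alteryx, but the same two-stage idea (read the index page, collect the link for each year, then request each of those pages) can be sketched in Python. The snippet below is only an illustration: the sample HTML, the example.com base URL and the helper names are assumptions, and the real site's markup may well differ. It works on an in-memory string, so no requests are sent.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-in for the index page; not the site's actual markup.
SAMPLE_INDEX = """
<table>
  <tr><td><a href="2019/2019.htm">2019</a></td>
      <td><a href="2020/2020.htm">2020</a></td></tr>
  <tr><td><a href="about.htm">About</a></td></tr>
</table>
"""


class YearLinkParser(HTMLParser):
    """Collect hrefs of anchors whose visible text is a four-digit year."""

    def __init__(self):
        super().__init__()
        self._href = None  # href of the anchor currently open, if any
        self.links = {}    # year -> relative URL

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        text = data.strip()
        if self._href and re.fullmatch(r"\d{4}", text):
            self.links[int(text)] = self._href

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def year_urls(index_html, base_url):
    """Return {year: absolute URL} for every year link found in the index page."""
    parser = YearLinkParser()
    parser.feed(index_html)
    return {
        year: urljoin(base_url, href)
        for year, href in sorted(parser.links.items())
    }


# The base URL is a placeholder; each resulting URL would then be downloaded in turn.
urls = year_urls(SAMPLE_INDEX, "https://example.com/database.htm")
```

With a whole cohort hitting the same site, a short pause between requests in the download loop would be a sensible addition.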

Having downloaded all the incidents, I then needed to remove the HTML and parse out the data, mainly using RegEx. This took a number of tools and, because I was rushing, my workflow wasn’t very efficient or clean. Even so, I was able to produce a clean table output with plenty of time left to focus on my dashboard.
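To give a feel for the RegEx work, here is a small Python sketch of the same idea: pull out the table cells, split each one on line breaks, strip the remaining tags, and capture the numeric fields with named groups. The raw row, the four-column layout and the field names are all assumptions made up for this example. This is not my actual Alteryx workflow, and the site's markup is not necessarily shaped like this.

```python
import html
import re

# Hypothetical raw row standing in for one downloaded incident (made-up values).
RAW_ROW = (
    '<tr><td><a href="2000-1.htm">01 Jan 2000</a></td>'
    '<td>Sampletown, Exampleland<br>Example Air</td>'
    '<td>Example 100<br>EX-001</td>'
    '<td>10/25(0)</td></tr>'
)

TAG = re.compile(r"<[^>]+>")
# Assumed "fatalities/aboard" layout for the counts cell.
COUNTS = re.compile(r"(?P<fatalities>\d+)\s*/\s*(?P<aboard>\d+)")


def cell_lines(cell_html):
    """Split one cell on <br>, drop any remaining tags, and unescape entities."""
    parts = re.split(r"<br\s*/?>", cell_html, flags=re.IGNORECASE)
    return [html.unescape(TAG.sub("", part)).strip() for part in parts]


def parse_row(row_html):
    """Turn one four-cell table row into a flat dictionary."""
    cells = re.findall(
        r"<td[^>]*>(.*?)</td>", row_html, flags=re.IGNORECASE | re.DOTALL
    )
    date, place, aircraft, counts = (cell_lines(cell) for cell in cells)
    match = COUNTS.search(counts[0])
    return {
        "date": date[0],
        "location": place[0],
        "operator": place[1],
        "aircraft_type": aircraft[0],
        "fatalities": int(match.group("fatalities")),
        "aboard": int(match.group("aboard")),
    }


row = parse_row(RAW_ROW)
```

Regular expressions are fine for a quick, regular table like this. For messier pages a proper HTML parser is usually the safer choice.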

With additional time, I would also have gone back to clean up the locations/countries in the dataset, as there are some mismatches and nulls.
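One possible way to approach that clean-up, sketched in Python: take the last comma-separated part of the location as the country, map known variants onto a single spelling, and return nothing for blank values. The alias table, the "?" missing-value marker and the function name are assumptions for illustration only. A real pass would need a lookup built from the mismatches actually present in the data.

```python
def extract_country(location, aliases=None, missing_markers=("", "?")):
    """Best-effort country from a free-text location, or None if unknown.

    `aliases` maps lower-cased variants to a preferred spelling (hypothetical).
    `missing_markers` lists values to treat as nulls (an assumption).
    """
    aliases = aliases or {}
    if location is None:
        return None
    text = location.strip()
    if text in missing_markers:
        return None
    # Assume the country, when present, is the last comma-separated part.
    last = text.split(",")[-1].strip()
    if not last:
        return None
    return aliases.get(last.lower(), last)


# Made-up alias table, purely for illustration.
ALIASES = {"usa": "United States", "u.s.": "United States"}
country = extract_country("Sampletown, USA", ALIASES)
```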

Dashboard creation

The challenging part about creating my story dashboard was thinking about the best way to allow the user to investigate the incidents that occurred each year.

I was particularly happy with the story I found within the data for this dashboard, and I believe I was able to communicate it to users.

Chances of surviving an airplane crash are minimal (link to dashboard)

Functionality within the dashboard allows the user to see the trend in airplane crash data and to investigate each year by operator, aircraft type and location of the crash. I have also included help buttons to assist with navigation and interactivity.

 

 

Author: Scott Johnston