Day three of dashboard week has arrived. Today’s task is to use Alteryx to scrape planecrashinfo.com, a website that catalogs aircraft accidents, and then build a dashboard from the scraped data. Because the data arrives as HTML, I will rely mainly on the Alteryx “Download” and “RegEx” tools to fetch and parse it.

Web scraping
To begin, open the web page in your browser and use the developer tools to inspect the HTML source. In most browsers you can open them by pressing F12. Scroll through the source, using the blue highlighting on the left as a guide, until you reach the markup that renders the table you are after. For this scrape, I first need to retrieve the table of years from “http://www.planecrashinfo.com/database.htm” and extract the detail link for each year from that page. As shown in the figure below, each link has to be turned into a full URL such as “http://www.planecrashinfo.com/2021/2021.htm” to reach the detailed records for that year.
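For readers who do not have Alteryx at hand, the link-extraction step can be sketched in Python. This is only an illustration, not the workflow I actually built: the HTML fragment is a made-up stand-in for the index page, and the assumption that each year is linked with a relative href of the form `YYYY/YYYY.htm` is inferred from the example URL above. The names `SAMPLE_INDEX_HTML`, `YEAR_LINK`, and `year_urls` are hypothetical.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.planecrashinfo.com/database.htm"

# Hypothetical fragment standing in for the index page; the real markup may differ.
SAMPLE_INDEX_HTML = """
<table>
  <tr>
    <td><a href="2020/2020.htm">2020</a></td>
    <td><a href="2021/2021.htm">2021</a></td>
  </tr>
</table>
"""

# Assumption: year links look like href="YYYY/YYYY.htm" (same year twice).
# Group 1 is the whole relative path, group 2 is the four-digit year.
YEAR_LINK = re.compile(r'href="((\d{4})/\2\.htm)"', re.IGNORECASE)


def year_urls(index_html, base_url=BASE_URL):
    """Return {year: absolute URL} for every year link found in index_html."""
    return {
        int(match.group(2)): urljoin(base_url, match.group(1))
        for match in YEAR_LINK.finditer(index_html)
    }


urls = year_urls(SAMPLE_INDEX_HTML)
for year, url in sorted(urls.items()):
    print(year, url)
```

Using `urljoin` rather than string concatenation means the relative link is resolved against the index page the same way a browser would resolve it.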

Once I have a unique URL for each year, I can use Alteryx’s “Download” tool to fetch the HTML and the “RegEx” tool to parse the table data out of it.
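The parsing idea, splitting the downloaded HTML into rows and then into cells with regular expressions, can be sketched in Python as follows. The sample HTML, its column layout, and every value in it are placeholders I invented for illustration; the real year pages may be structured differently, and the helper names (`clean`, `parse_rows`) are hypothetical.

```python
import re
from html import unescape

# Hypothetical year-page fragment; columns and values are placeholders only.
SAMPLE_YEAR_HTML = """
<table>
<tr><td>Date</td><td>Location / Operator</td><td>Aircraft Type</td><td>Fatalities</td></tr>
<tr><td><a href="2021-1.htm">01 Jan 2021</a></td>
    <td>Sampletown, Exampleland<br>Example Air</td>
    <td>Example 100</td><td>3</td></tr>
</table>
"""

ROW = re.compile(r"<tr[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL = re.compile(r"<td[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)
BREAK = re.compile(r"<br\s*/?>", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def clean(cell_html):
    """Turn <br> into a separator, drop other tags, decode entities, trim spaces."""
    text = BREAK.sub(" | ", cell_html)
    text = TAG.sub("", text)
    return " ".join(unescape(text).split())


def parse_rows(html):
    """Return one list of cleaned cell strings per <tr> that contains <td> cells."""
    rows = []
    for row_html in ROW.findall(html):
        cells = [clean(cell) for cell in CELL.findall(row_html)]
        if cells:
            rows.append(cells)
    return rows


rows = parse_rows(SAMPLE_YEAR_HTML)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

Regular expressions are enough for a small, regular table like this one, but they are fragile on nested or irregular markup, so a proper HTML parser would be the safer choice outside a quick exercise.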

As mentioned above, the Alteryx workflow is split into two steps. The first step extracts the detail link for each year (as shown below). The second step builds the full URL from each link and downloads that year’s accident data.

The Dashboard

For this challenge, I wanted to create a fun chart to visualize the data. Because I concentrated on the visualization, I may not have enough time left in this limited period to build a story around the data. I chose a radar bar chart to show the number of fatalities from 1920 to 2021; the taller the bar, the greater the number of deaths. It looks great! Inside the radar bar chart, I also placed a map and a bar chart, and I use a parameter to switch between these two charts.

Key findings:

(1) The 1960s and 1970s saw the highest number of aircraft accidents, averaging about 65 per year, which resulted in a large number of passenger deaths.

(2) Interestingly, after 2010 the average number of plane crashes dropped to 25 per year, and the number of passenger fatalities also fell significantly.

(3) Russia, the United States, Brazil, Colombia, and France had the highest numbers of accidents between 1920 and 2021.
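The per-decade averages quoted above came out of the dashboard itself, but the arithmetic behind such a figure is simple and can be sketched in Python. The counts below are toy numbers chosen only to make the calculation easy to follow, not values from the scraped data, and the function name is hypothetical. The sketch assumes every year of interest appears in the input, so a year with zero accidents would need an explicit 0 entry.

```python
from collections import defaultdict
from statistics import mean


def average_per_year_by_decade(accidents_per_year):
    """Group yearly accident counts by decade and average them within each decade."""
    by_decade = defaultdict(list)
    for year, count in accidents_per_year.items():
        decade = year // 10 * 10  # e.g. 1967 -> 1960
        by_decade[decade].append(count)
    return {decade: mean(counts) for decade, counts in sorted(by_decade.items())}


# Toy counts for illustration only; these are NOT figures from planecrashinfo.com.
toy_counts = {1960: 4, 1961: 6, 2010: 2, 2011: 1, 2012: 3}
averages = average_per_year_by_decade(toy_counts)
for decade, avg in averages.items():
    print(f"{decade}s: {avg} accidents per year")
```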

Author: Gary Li