Dashboard week is more than halfway over! Today’s challenge was building a dashboard from web-scraped plane crash data. While the format of the website was simple, some unforeseen challenges made the web-scraping process much more difficult than I thought it would be. Before I begin, you can find the dashboard here.

Web-Scraping Difficulties

The plane crash data was web-scraped from here. However, a room of eight data-schoolers making thousands of requests each to a relatively small website is not a good idea. The site quickly crashed, leaving us to decide how best to proceed with our dashboards.

Yet not all was lost. The website was organised into yearly pages that showed summary data for each plane crash. To find more detailed information, we would have to request the page for each crash individually. The difference in the number of requests was huge – the summary data alone required about 400 requests, whereas the detailed data required 5000.

With some throttling it was possible to make these requests for detailed data, but it would take a few hours to get all the data. And we only had one day to build the dashboard!
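To give a feel for what throttling means here, below is a minimal Python sketch. It is not the workflow we actually used on the day; the function names, the two-second pause and the idea of passing in a `fetch` function are all assumptions made for illustration. It simply waits between requests and estimates how long the waiting alone would take.

```python
import time
from typing import Callable, Dict, Iterable


def throttled_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 2.0,  # assumed pause, not a value from the site
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in turn, pausing between consecutive requests."""
    pages: Dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Pause before every request except the first one
            sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages


def estimated_hours(n_requests: int, delay_seconds: float) -> float:
    """Time spent waiting between requests, ignoring download time."""
    return max(n_requests - 1, 0) * delay_seconds / 3600
```

With an assumed two-second pause, `estimated_hours(5000, 2.0)` comes to roughly 2.8 hours of waiting for the detailed pages, compared with about 13 minutes for the 400 summary requests. The `fetch` argument is a placeholder for whatever actually downloads a page, which keeps the pacing logic separate from the scraping itself.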

So I ended up taking a combined approach. I focussed on plane crashes during World War 2 – I had already identified this as a potential story by exploring the summary data. Then I downloaded detailed data from crashes that occurred between 1939 and 1945 and built my dashboard with this information.

The Dashboard

The story of plane crashes in World War 2 (based on this plane crash data) is perplexing. Plane crashes, particularly those involving military aircraft, spike dramatically in 1945.

This should be expected, right?

Well yes, except most of these crashes occurred in October and November 1945 – World War 2 ended in September!

The story becomes even more muddled when we look at the proportion of aircraft that were shot down. Firstly, it is a small proportion of crashes – a similar number of aircraft crashed into mountains. Secondly, more civilian aircraft were shot down than military aircraft!

Perhaps this is a problem with the data or data collection – I suspect a large number of aircraft that were shot down were not reported to the public (perhaps to keep up morale). Similarly, German and Japanese military aircraft are suspiciously missing from the data, while US aircraft comprise the vast majority of crashes. It is likely that this database has only been able to find data from Allied sources at the time.

But regardless of its shortcomings, this dashboard paints an interesting picture of plane crashes in World War 2 – crashes dramatically increased throughout the war. In my opinion, this is due to an increased number of military flights combined with the rapid deployment of new, possibly under-tested technologies.

Have an explore of the dashboard and see if you can uncover any stories that I have missed!

Author: The Data School