Day 3

For day 3 of dashboard week, we were tasked with scraping some crime data from a webpage and then designing a dashboard that explored insights from that data. We were also strongly encouraged to bring in data from other sources, as the table we scraped was quite bare, containing only four fields of information.


The first thing to do was scrape the data from the webpage. I attached the year number to the webpage's URL so that I could get the crime data for different years, downloaded the HTML from each URL with the download tool, and then used a chain of regex tools to extract the information contained in the table rows. After some basic cleaning, I repeated the process for another table on that webpage that I found interesting. This way I could combine the crime index data with other index data and try to draw some interesting insights from it (such as 'does a correlation exist between the cost of living and the crime index?').
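For readers who prefer code, a minimal Python sketch of the same idea (one URL per year, then regular expressions over the table rows) might look like the following. The base URL, the sample HTML and the column names are all made up for illustration, and the download step is left out so the example runs without a network connection.

```python
import re

# Hypothetical base URL; the real page address is not given in this post.
BASE_URL = "https://example.com/crime-rankings?title="

def build_urls(years):
    """Attach each year to the base URL, giving one URL per year."""
    return {year: f"{BASE_URL}{year}" for year in years}

# One pattern per step, mirroring a chain of regex tools.
ROW_RE = re.compile(r"<tr[^>]*>(.*?)</tr>", re.DOTALL | re.IGNORECASE)
CELL_RE = re.compile(r"<td[^>]*>(.*?)</td>", re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")

def parse_rows(html):
    """Return the text of each <td> cell, grouped by table row."""
    rows = []
    for row_html in ROW_RE.findall(html):
        # Strip nested tags such as links, then trim whitespace.
        cells = [TAG_RE.sub("", cell).strip() for cell in CELL_RE.findall(row_html)]
        if cells:  # header rows only contain <th> cells, so they are skipped
            rows.append(cells)
    return rows

# Made-up stand-in for the downloaded HTML of one year's page.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Country</th><th>Crime Index</th><th>Safety Index</th></tr>
  <tr><td>1</td><td><a href="#">Country A</a></td><td>70.5</td><td>29.5</td></tr>
  <tr><td>2</td><td>Country B</td><td>65.0</td><td>35.0</td></tr>
</table>
"""

urls = build_urls([2018, 2019, 2020])
rows = parse_rows(SAMPLE_HTML)
```

Regular expressions are fine for a small, regular table like this one, but they are fragile against layout changes, so a proper HTML parser is usually the safer choice for anything larger.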

I also decided to bring in population and GDP data from government sources. This way, I could tell a much more interesting story when it came to discussing factors associated with the crime index. For example, I could calculate GDP per capita and see how changes in GDP per capita relate to changes in the crime index.
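The GDP per capita calculation is just GDP divided by population for the same country and year. A small Python sketch, assuming both datasets have already been keyed by (country, year) and using made-up numbers, could be:

```python
def gdp_per_capita(gdp, population):
    """Divide GDP by population for every (country, year) found in both inputs."""
    result = {}
    for key, gdp_value in gdp.items():
        people = population.get(key)
        if people:  # skip keys with missing or zero population
            result[key] = gdp_value / people
    return result

# Made-up figures, for illustration only.
gdp = {("Country A", 2019): 1_000_000.0, ("Country B", 2019): 500_000.0}
population = {("Country A", 2019): 50, ("Country C", 2019): 10}

per_capita = gdp_per_capita(gdp, population)
```

Only the rows that appear in both sources survive, which is worth keeping in mind because mismatched country names between sources can silently shrink the dataset.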


The goal here was to find the factors that are most likely to contribute to the crime index. I did this by looking at the crime index for specific regions and plotting it against the other indexes to see if there was a correlation. For example, one insight I found was that GDP per capita may be a relatively strong predictor of movements in the crime index in Asia, the Americas, and Oceania, but not so strong in Europe and Africa. Once I had found some relatively strong correlations, I looked at the countries in which the crime index has changed the most over time, so that I could analyze those countries and see how their other indexes have also changed over time.
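As a rough illustration of the per-region comparison, the sketch below groups records by region and computes a Pearson correlation between GDP per capita and the crime index for each group. Pearson correlation is my choice for the example rather than a description of what the dashboard itself calculates, and the region names and numbers are invented.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)

def correlation_by_region(records):
    """records: iterable of (region, gdp_per_capita, crime_index) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for region, gdp_pc, crime in records:
        xs, ys = grouped[region]
        xs.append(gdp_pc)
        ys.append(crime)
    return {region: pearson(xs, ys) for region, (xs, ys) in grouped.items()}

# Invented records, for illustration only.
records = [
    ("Region A", 1.0, 30.0),
    ("Region A", 2.0, 20.0),
    ("Region A", 3.0, 10.0),
    ("Region B", 5.0, 40.0),
]

by_region = correlation_by_region(records)
```

A region with only one data point returns None here, which is the code-level version of the small sample problem: with very few countries in a region, any coefficient that does come out should be read with caution.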

Overall, I found this day to be the most challenging. It was very time-consuming, and it was also very hard to draw any insights because:

  • Some of the sample sizes were too small to find any patterns
  • What contributes to the crime index could be much more complex than a few indexes describing the overall state of a country based on survey data

You can check out the visualization through this link: