After a challenging data preparation day yesterday, I was ready for a clean data set so we could spend the day focusing on our dashboards. I thought wrong: today started with web scraping. We were given a link to crime index data for cities around the world. Our task was to scrape the table from this webpage and supplement it by scraping more data from a different table. I chose to supplement my data with the cost of living table and the traffic/emissions table. My focus was on finding any correlation between cost of living and crime, and I believe I found a part of the world that has a very bad combination of high cost of living and high crime.



My workflow was simple. It started with a create rows tool to make a row for every year I wanted to download. I followed that with the download tool and Regex’d out the data. Once that was complete I joined the tables together, then used a union to bring back any crime index rows that had no match in the other tables, as I didn’t want to lose any of the important information from the crime index table.
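For anyone who wants to try the same steps in code, here is a rough Python sketch of that flow: one row per year, a regex parse of the table, then a join plus a union of the unmatched crime rows. It is only an illustration under assumptions. The URL template, the two-column table markup, and the sample cities and numbers are all made up, and the download step itself is left out so the snippet runs without a network connection.

```python
import re

# Hypothetical URL template; the real site and its parameters are not shown in this post.
URL_TEMPLATE = "https://example.com/crime/rankings?year={year}"


def year_urls(first_year, last_year):
    """Build one (year, url) row per year, like the create rows step."""
    return [(y, URL_TEMPLATE.format(year=y)) for y in range(first_year, last_year + 1)]


# Assumes a very simple two-column table: <tr><td>City</td><td>Value</td></tr>
ROW_RE = re.compile(
    r"<tr>\s*<td>(?P<city>[^<]+)</td>\s*<td>(?P<value>[\d.]+)</td>\s*</tr>"
)


def parse_table(html):
    """Regex out {city: value} pairs from the downloaded markup."""
    return {m["city"].strip(): float(m["value"]) for m in ROW_RE.finditer(html)}


def join_keep_all_crime(crime, cost):
    """Join on city, then union back the crime rows that found no match."""
    joined = [(city, crime[city], cost[city]) for city in crime if city in cost]
    unmatched = [(city, crime[city], None) for city in crime if city not in cost]
    return joined + unmatched


# Made-up sample markup standing in for the downloaded pages.
crime_html = "<tr><td>City A</td><td>80.5</td></tr><tr><td>City B</td><td>40.0</td></tr>"
cost_html = "<tr><td>City A</td><td>55.2</td></tr>"

rows = join_keep_all_crime(parse_table(crime_html), parse_table(cost_html))
print(rows)
```

City B has no cost of living value in the sample, so it comes through the union with an empty cost column instead of being dropped.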


The Dashboard

I started by identifying the Sub-Regions that had both a high Cost of Living Index and a high Crime Index, as I labeled a combination of those two variables as ‘bad’. These Sub-Regions, coloured in red, were Latin America & the Caribbean, and Melanesia. The chart on the right then identifies the Intermediate-Regions within those Sub-Regions and compares their Cost of Living Index to their Crime Index. Any regions with a higher Crime Index than Cost of Living Index were investigated further, as I thought that the crime there was bad enough that even the high costs were not worth paying.
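The filtering logic behind those two charts could be sketched roughly like this in Python. The cutoff of 50 for what counts as "high" is an assumption for illustration (I did not record an exact threshold here), and the region names and index values are invented.

```python
def flag_regions(regions, high=50.0):
    """Return (bad, investigate) lists of region names.

    bad: both indexes are at or above the assumed `high` cutoff.
    investigate: bad regions whose crime index exceeds their cost of living index.
    """
    bad = [
        r for r in regions
        if r["cost_of_living"] >= high and r["crime"] >= high
    ]
    investigate = [r["name"] for r in bad if r["crime"] > r["cost_of_living"]]
    return [r["name"] for r in bad], investigate


# Invented sample values, not taken from the scraped data.
regions = [
    {"name": "Region A", "cost_of_living": 60.0, "crime": 70.0},
    {"name": "Region B", "cost_of_living": 75.0, "crime": 55.0},
    {"name": "Region C", "cost_of_living": 30.0, "crime": 65.0},
]

bad, investigate = flag_regions(regions)
print(bad)
print(investigate)
```

Region B counts as ‘bad’ but is not drilled into, because its crime index is still below its cost of living index.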


After identifying the Intermediate-Regions I wanted to drill down even further and look at the Countries and Cities that had a higher crime index than cost of living index. The countries are plotted on the chart on the left and the cities in those countries are plotted on the map on the right. It is easy to see that the cities with a high cost of living and a high crime index are mostly in South America, particularly Brazil.


Now that I had identified the cities with a high cost of living and crime index, I wanted to look into how they performed on other metrics such as quality of life and traffic. Some cities, such as Kingstown, didn’t seem to deserve to be in this ‘bad’ group, as their only really bad variables were grocery costs and rent; on all other variables they were around the middle or even among the better cities. Of the worst-performing cities, Sao Paulo and Caracas were consistently near the top for all metrics.


I used a very similar layout to complete my final section. The last section of my dashboard summarized everything above and listed the 5 worst cities in the world to live in according to the analysis of cost and crime. Brazil makes up 3 of the worst 5, with Sao Paulo being the worst city for the last 4 years running. The city with the worst overall average score was Caracas; however, there were a few years of missing data for it, so it may even have been number 1 in the years where we don’t have its data.
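A small Python sketch of how a "worst N by average score" ranking can cope with missing years is below. It assumes each city already has one combined score per year, with higher meaning worse, and averages only the years that are present. The exact scoring used on the dashboard is not reproduced here, and the cities and numbers are invented.

```python
from statistics import mean


def worst_cities(scores_by_city, n=5):
    """Rank cities by average yearly score (higher = worse), skipping missing years.

    Returns (city, average, years_with_data) tuples so gaps stay visible.
    """
    averaged = []
    for city, yearly in scores_by_city.items():
        present = [score for score in yearly.values() if score is not None]
        if present:  # ignore cities with no data at all
            averaged.append((city, mean(present), len(present)))
    averaged.sort(key=lambda row: row[1], reverse=True)
    return averaged[:n]


# Invented scores; None marks a year with no data.
scores = {
    "City A": {2016: 150.0, 2017: None, 2018: 160.0},
    "City B": {2016: 140.0, 2017: 145.0, 2018: 150.0},
}

top = worst_cities(scores)
print(top)
```

Carrying the count of years alongside the average makes it easier to spot a city whose ranking rests on only part of the period.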



This data provided some valuable insights into the world, and working on my web scraping to get it was a welcome task. While looking at the worst cities in the world may not be the happiest topic, it should all be taken with a grain of salt, as these metrics alone could never properly identify such a thing. I’m looking forward to seeing what tomorrow holds. Please feel free to check out the full dashboard at the link below:



Mikael Nuutinen