Are you trying to find your ideal dataset?
Finding a nice dataset is a bit like finding an in-vogue restaurant for a date. Here I’ll discuss some of the familiar places the other Data Schoolers and I use, as well as my thoughts on the newly available Google Dataset Search.
Google Dataset Search
I felt the need to put this first, not because it’s the best, but because it’s new (it moved out of beta on the 23rd this month), and, because it’s Google, it’s likely to get a lot better.
It boasts close to 25 million different publicly available datasets and has a clean interface. You can filter by usage rights, cost, date updated and the data format. But quantity doesn’t reflect quality.
It does link to a lot of different sources and makes quickly finding (some) data easy. Unfortunately, to actually get the data, you need to navigate the source’s own website. Every source is laid out differently, so that can be challenging, and the data you end up with is not necessarily of high quality.
When looking for a dataset, you’ve usually got a topic in mind. If you’re interested in Australian crime statistics, Google Dataset Search will link to data.gov.au, which is a good data source (see below for a review). However, if you’re interested in the number of cattle in Colombia (the country), you’ll be a bit more out of luck. That’s not because the data isn’t available (see here) but because languages other than English aren’t currently well supported. My point is that I could find the latest data source with relative ease using regular Google and Chrome’s built-in translation tool, but not with Google Dataset Search (even when searching in Spanish).
TL;DR – would I recommend it? For a quick initial search of a topic, sure, why not. For finding strange and obscure data sources? Use Google.
Which brings me to the obvious choice: regular Google. We all know how to Google, but one trick I’ve picked up is that searching for dataset providers rather than the dataset itself sometimes yields better results (for instance, searching for “Colombia data” rather than “cattle in Colombia dataset”).
Some Other Recommended Places
Data.gov.au
The official data source for Australia is excellent for finding data relating to, well, Australia. You can use Google to find the equivalent websites for other countries (New Zealand, UK, US). The data is reliable and of good quality.
World Bank
A fan favourite at the Data School, this has a large number of high-quality data sources, many of which can be blended to add an extra dimension to the data.
Kaggle
Kaggle isn’t just great for learning and competing; it’s also full of datasets, which can be filtered by licence.
Making your own
Despite being in a world full of data, sometimes we just can’t find what we’re looking for. Web scraping is a great technique, as is using APIs. Copying and pasting into Excel is less efficient. Lastly, if you need to test the feasibility of a Tableau Viz, quickly making up some data can be a good option.
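If you’d rather script the made-up data than type it out, here’s a minimal sketch in Python using only the standard library. To be clear, everything in it is my own placeholder: the column names (order_date, region, units, unit_price), the value ranges and the file name mock_sales.csv are assumptions for illustration, not anything from a real source, so swap in whatever your viz actually needs.

```python
import csv
import random
from datetime import date, timedelta


def make_mock_sales(n_rows, seed=0):
    """Return a list of made-up sales rows (dicts) for prototyping a viz."""
    rng = random.Random(seed)  # seeded, so the same "data" comes back every run
    regions = ["North", "South", "East", "West"]  # placeholder categories
    start = date(2020, 1, 1)  # arbitrary start date
    rows = []
    for i in range(n_rows):
        rows.append({
            "order_date": (start + timedelta(days=i)).isoformat(),  # one row per day
            "region": rng.choice(regions),
            "units": rng.randint(1, 50),  # inclusive on both ends
            "unit_price": round(rng.uniform(5.0, 100.0), 2),
        })
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical output file; point Tableau at it as a text file source.
    write_csv(make_mock_sales(100), "mock_sales.csv")
```

Seeding the generator means the numbers don’t change between runs, which is handy when you’re tweaking a chart and don’t want the data shifting underneath you.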
As always, thanks for taking the time to read my blog! Favourite data source missing? If you have any comments or suggestions, or just want to chat, feel free to connect with me on LinkedIn!
~ Ryan Edwards