Fake Data?

Fake data. It sounds like a bad thing, right? Well, not always. Of course, there are some unscrupulous people out there who use false information with ill intent, but there are also perfectly legitimate uses for fake data. Creating a mock dataset can be very useful. In fact, it might just save a project that would otherwise miss a critical deadline.

For example, suppose it’s Monday afternoon and a business-critical dashboard is due on Friday. You’ve just found out that the data you need won’t be available until Thursday morning, which is too late to start from scratch and still meet the deadline. Luckily, IT can provide you with the structure of the data tables. Creating a mock dataset now will allow you to build your dashboard and fully test it, meaning you can swap in the real data on Thursday and gather all your insights with time to spare!

Other use cases include proof-of-concept pieces, app testing, and just having some fun! I recently generated some mock data and used it to create a viz looking at an alien invasion of Australia, because why not? A little more on that project later…

 

Creating Mock Data

So, how does one go about creating mock data? A great place to start is https://www.mockaroo.com. This is a fantastic tool that allows you to create data in CSV, JSON, SQL, and Excel formats. A free account will allow you to generate up to 1,000 rows of data, and there is also a paid account option for those who need additional features.

When you create a new dataset, you can give each field a name and a type, along with a set of options specific to that type.

There are 167 data types available, covering a huge array of subject areas. These range from simple things, such as names, airports, and currencies, through to more complex items such as hashes and a variety of distribution types.

 

Figure 1: Dataset creation

 

So, if you’re in need of some fake company names (Rhombus Industries, anyone?), then Mockaroo has you covered. And while you’re at it, you can create a list of Product IDs, sales, and much, much more.

At the end of each set of options, there is a ∑ symbol. This brings up the formula window. Formulas allow you to use Ruby code to manipulate your data based on custom logic. With Ruby and Regex, the possibilities are virtually endless and should satisfy even the most fastidious amongst us.

‘Scenarios’ are also available. These allow the user to define different value distributions based on another field. As an example, you may want “Technology” orders to have a higher average order value relative to those in the “Office Supplies” category. The user can achieve this by creating a scenario and defining the distribution parameters. This scenario can then be used in dataset creation.

 

Figure 2: Creating a scenario
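
To make the idea behind scenarios concrete, here is a minimal Python sketch of the same technique: drawing a value from a distribution whose parameters depend on another field. It does not use Mockaroo itself, and the means, standard deviations, and function name are made-up assumptions chosen purely for illustration.

```python
import random

# Hypothetical (mean, standard deviation) of order value per category.
# These numbers are illustrative assumptions, not Mockaroo defaults.
ORDER_VALUE_PARAMS = {
    "Technology": (450.0, 120.0),
    "Office Supplies": (60.0, 20.0),
}


def mock_order(rng):
    """Return one fake order whose value distribution depends on its category."""
    category = rng.choice(list(ORDER_VALUE_PARAMS))
    mean, std_dev = ORDER_VALUE_PARAMS[category]
    # Clamp at a small positive floor so no order has a zero or negative value.
    value = max(1.0, rng.gauss(mean, std_dev))
    return {"category": category, "order_value": round(value, 2)}


rng = random.Random(42)  # fixed seed so the mock data is reproducible
orders = [mock_order(rng) for _ in range(1000)]
```

With enough rows, the average “Technology” order comes out well above the average “Office Supplies” order, which is the effect a scenario is meant to produce.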

 

This is all a bit alien to me

Creating fake data is new to me, but Mockaroo helped to streamline the process. Below is a screenshot of my first fictitious dataset, a list of 30 alien species.

Most of the data types I chose should be self-explanatory. For average alien heights and weights, I decided that these should follow a normal distribution based on above-average human dimensions. This gives us some very large aliens, along with some little guys.
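
For readers who want to try the same idea outside Mockaroo, the following Python sketch draws heights and weights from normal distributions. The means, standard deviations, and lower bounds are my own assumptions standing in for “above-average human dimensions”; they are not the values used for the dataset in this post.

```python
import random


def mock_alien_size(rng):
    """Return a (height_cm, weight_kg) pair drawn from normal distributions."""
    # Assumed parameters: slightly above typical human size, with a wide
    # spread so that both giants and little guys appear.
    height_cm = max(30.0, rng.gauss(190.0, 50.0))
    weight_kg = max(5.0, rng.gauss(95.0, 35.0))
    return round(height_cm, 1), round(weight_kg, 1)


rng = random.Random(7)  # fixed seed so the mock data is reproducible
alien_sizes = [mock_alien_size(rng) for _ in range(30)]  # one row per species
```

The `max(...)` calls simply stop the occasional extreme draw from producing a negative height or weight.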

For the alien planet name, I chose to use the data type “Animal scientific name” and then took just the first word. This may not fool a zoologist, but it saved me from coming up with a list of names that fitted my sci-fi theme! Animal common name was also available, but I didn’t think ‘Pelican’ would really cut it here…
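
The “first word” trick is easy to reproduce in code. This is a small Python sketch of the idea rather than the formula used in Mockaroo; the function name and the sample species are chosen only for illustration.

```python
def planet_name(scientific_name):
    """Keep only the first word (the genus) of a scientific name."""
    words = scientific_name.split()
    return words[0] if words else ""


examples = ["Pelecanus occidentalis", "Vulpes vulpes", "Macropus giganteus"]
planet_names = [planet_name(name) for name in examples]
print(planet_names)  # ['Pelecanus', 'Vulpes', 'Macropus']
```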

Figure 3: Having fun with Mockaroo by creating an alien dataset

 

So, next time you are surveying the landscape of opportunity and questioning what to do for your next data project, maybe consider the possibilities of using mock data. Mockaroo and fake data may even help ignite a new passion for experimentation!

 

Author: Jonathan Carter