
 

As Data Schoolers, towards the end of our 16-week intensive training, we all take on an ambitious challenge — creating one dashboard, one blog and one presentation on a fresh dataset every day for a week! Today marks the third day of this challenge. In this blog, I would like to share my approach to today’s challenge and showcase my finished dashboard.

You can also find all my dashboard week blogs here:

Day 1: Global Power Plants

Day 2: Wages vs. Inflation

Day 3: Star Wars

Day 4: 2021 Australian Census

Day 5: AFL Data

 

 

The Data

Today’s dataset is Star Wars data from SWAPI (The Star Wars API). The data and its documentation can be found here. It is a very interesting dataset, and I would highly recommend it to any Star Wars fans out there!

 

 

The Plan

Step 1: Data Understanding

It always pays off to start the analytical process by understanding our data. At this stage, it is often not necessary to develop a comprehensive knowledge of the data, but just enough so that you can start exploring relevant business questions and hypotheses.

For Star Wars data, I have noticed that:

  1. There are six main categories of data, namely People, Planets, Films, Species, Vehicles, and Starships. Given that we have less than a day to work on this project, it is very important to prioritize and find a focus. Since I’ve always been fascinated by Starships, I’ve decided to work with the Starships data.
  2. The Starships data contains fields such as the Name, Crew, Length, and various other characteristics of 36 star ships in the Star Wars franchise. However, it lacks information on the firepower of each star ship. What would Star Wars be without firepower?! Therefore, I decided to enrich the API data with extra information (such as star ship power rankings and detailed descriptions) found on the internet through web scraping.

 

Step 2: Business Understanding

In this part, we try to understand why we are performing our analytics project. This is also where we begin to pose hypotheses or goals. This project is mostly for fun, so my goal this time is to act as Lord Vader’s trusted advisor and help him decide on the best star ship to buy at the Imperial Military’s Annual Star Ship Purchasing Meeting! The decision will be based on a multitude of factors, including each star ship’s firepower, speed, and price.

 

Step 3: Planning

The majority of the challenge will come from data collection and data cleaning. Working with APIs and web scraping can be unpredictable, and sometimes very difficult, because the API or website structure is unfamiliar. Therefore, I have decided to set aside plenty of time (double the usual amount) for data collection and cleaning.

 

Step 4: Data Cleaning and Pre-processing

I’ve used Alteryx to clean the dataset. More specifically, I:

  1. Acquired the raw Star Wars Starships data from SWAPI using a combination of the Download and JSON Parse tools.
  2. Cleaned and prepared the raw data by transforming it from JSON into an appropriate tabular format.
  3. Enriched the data through web scraping.
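I did all of this in Alteryx, but for readers who prefer code, the first two steps could be sketched in Python roughly as follows. This is only an illustration and not the workflow I actually built: the endpoint URL, the function names, and the choice of fields are assumptions, and it relies on the SWAPI response being paginated with a "next" link and a "results" list.

```python
import json
from urllib.request import urlopen

# Assumed endpoint; the post only says the data comes from SWAPI.
STARSHIPS_URL = "https://swapi.dev/api/starships/"


def fetch_pages(url=STARSHIPS_URL):
    """Yield each page of the paginated response until there is no 'next' link."""
    while url:
        with urlopen(url) as resp:
            page = json.load(resp)
        yield page
        url = page.get("next")


def to_number(text):
    """Turn strings such as '1,600' into a float; return None for values
    that are not plain numbers (for example 'unknown' or a range)."""
    try:
        return float(text.replace(",", ""))
    except (AttributeError, ValueError):
        return None


def to_rows(pages, numeric_fields=("crew", "length", "cost_in_credits")):
    """Flatten the nested JSON pages into one flat dict (row) per star ship."""
    rows = []
    for page in pages:
        for ship in page["results"]:
            row = {"name": ship.get("name")}
            for field in numeric_fields:
                row[field] = to_number(ship.get(field))
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Requires network access; prints one row per star ship.
    for row in to_rows(fetch_pages()):
        print(row)
```

The third step, web scraping, depends entirely on the structure of the site being scraped, so it is left out of this sketch.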

 

 

The Dashboard

Below is a screenshot of my finished dashboard. The dashboard is made up of three main sections:

  1. Meets vs. Fails Selection Criteria: Lord Vader selects his preferred criteria for the star ships. Characteristics that fail to meet the criteria turn red, and those that meet the criteria are displayed in imperial black.
  2. Product Photo: Well, I think Lord Vader should at least see the photo of the star ship before he decides on which star ship to buy.
  3. Product Catalogue: Finally, we should consider our budget and the cost of each star ship. After all, we do have an empire to run!
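The meets vs. fails logic in the first section lives in Tableau, but the idea can be sketched in a few lines of Python. Everything here is hypothetical: the function name, the field names, and the thresholds are made up for illustration and are not taken from the dashboard itself.

```python
def colour_by_criteria(ship, criteria):
    """Return a colour per characteristic: 'black' if the star ship meets
    the criterion, 'red' if it fails (or the value is missing).

    criteria maps a field name to ("min" or "max", threshold).
    """
    colours = {}
    for field, (kind, threshold) in criteria.items():
        value = ship.get(field)
        if value is None:
            meets = False  # treat missing values as a fail
        elif kind == "min":
            meets = value >= threshold
        else:  # "max"
            meets = value <= threshold
        colours[field] = "black" if meets else "red"
    return colours


# Hypothetical star ship and criteria, for illustration only.
ship = {"speed": 1050, "cost": 150000}
criteria = {"speed": ("min", 1000), "cost": ("max", 100000)}
print(colour_by_criteria(ship, criteria))
```

Treating a missing value as a fail is a design choice in this sketch; a dashboard could just as well show it in a neutral colour.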

Of course, Tableau dashboards are meant to be interactive and should allow the user to explore their own questions and answers, so please follow this link to go to my Tableau Public and have fun with my dashboard there!

Author: Martin Ding

Martin earned his Honours degree in Economics at the University of Melbourne in 2011. He has more than 7 years of experience in product development, both as an entrepreneur and as a project manager in robotics at an AI unicorn. Martin is expecting to receive his Master’s degree in Data Science from CU Boulder at the end of 2022. Martin is excited about data and its power to transform organizations. He witnessed first-hand how instrumental data-driven decision making (DDDM) was in leading to more team buy-in and insightful decisions. Martin joined the Data School to systematically enhance his knowledge of the tools, methodologies and know-how of Data Analytics and DDDM. When not working, Martin enjoys reading, cooking, traveling and golf. He is also thoroughly interested in the practice of mindfulness and meditation.