
 

As Data Schoolers, towards the end of our 16-week intensive training, we all take on an ambitious challenge — creating one dashboard, one blog and one presentation on a fresh dataset every day for a week! Today marks the first day of this challenge. In this blog, I would like to share my approach to today’s challenge and showcase my finished dashboard.

You can also find all my dashboard week blogs here:

Day 1: Global Power Plants

Day 2: Wages vs. Inflation

Day 3: Star Wars

Day 4: 2021 Australian Census

Day 5: AFL Data

 

 

The Data

Today’s dataset is the Global Power Plant Database from the World Resources Institute. The data and its documentation can be found here.

 

 

The Plan

Step 1: Data Understanding

It always pays off to start the analytical process by understanding our data. At this stage, it is often not necessary to develop a comprehensive knowledge of the data, but just enough so that you can start exploring relevant business questions and hypotheses. 

For our Global Power Plants data, I have noticed that:

  1. It contains data on global power generation at a power plant level (each row is one power plant).
  2. It contains yearly data, but the years are stored in columns (this requires transposing, so that each row represents one power plant per year of observation).
  3. Data is sparse and there are a lot of missing values (up to 99% missing for some columns).
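As a rough illustration of how the third observation can be checked, here is a minimal pandas sketch. It is not the workflow used for this dashboard. The sample rows are made up, and the column names (such as `generation_gwh_2016`) are assumptions about how the file is laid out; `missing_share` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for the real file.
# Column names are assumptions, not confirmed against the database.
plants = pd.DataFrame({
    "name": ["Plant A", "Plant B", "Plant C", "Plant D"],
    "country": ["AUS", "AUS", "USA", "USA"],
    "primary_fuel": ["Coal", "Solar", "Gas", "Coal"],
    "generation_gwh_2016": [1200.0, np.nan, 800.0, np.nan],
    "generation_gwh_2017": [1100.0, np.nan, np.nan, np.nan],
})


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, worst first."""
    return df.isna().mean().mul(100).sort_values(ascending=False)


missing_pct = missing_share(plants)
print(missing_pct)
```

Sorting the result puts the sparsest columns at the top, which makes it easy to see which fields are too empty to rely on.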

 

Step 2: Business Understanding

In this part, we try to understand why we are performing our analytics project. This is also where we begin to pose hypotheses or goals. With the movement towards cleaner energy, my question or hypothesis was to explore whether countries are really moving towards cleaner alternatives (e.g. from coal to solar). Furthermore, are different countries evolving in different ways and if so, how?

 

Step 3: Planning

Based on our data understanding, we know that missing data will be a major problem for our analysis. Ideally, we should try to find additional data, patterns in the data or methods of imputation to fix the missing data problem. However, given the time limitations and the severity of the problem (50-70% of values are missing in the power generation and estimated power generation columns), no simple imputation or additional data would reasonably fix the issue without introducing excessive bias.

Therefore, I decided to drop the missing values altogether. When we make a decision like this, it is always important to understand its consequences and how it could affect our analysis. Because we are dropping a lot of rows, our dashboard and analysis should focus less on absolute numbers and more on broader trends. We should also focus less on cross-sectional comparisons (especially at a detailed level, because we lose the countries that have no data) and more on longitudinal aspects, such as how a particular country (that has data) has evolved over time. This kind of conscious planning, with explicit decisions and trade-offs, is essential in any data analytics project.

 

Step 4: Data Cleaning and Pre-processing

I’ve used Alteryx to clean and enrich the dataset. More specifically, I:

  1. Transposed the years (and their values).
  2. Cleaned up and filtered out the null values.
  3. Enriched the dataset with extra region and sub-region dimensions that relate to each country.
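The cleaning itself was done in Alteryx, but for readers who prefer code, the same three steps can be sketched in pandas. This is a sketch under assumptions, not a reproduction of my workflow: the sample rows, the `regions` lookup table, the column names and the `reshape_and_enrich` helper are all hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up sample in the wide layout: one row per plant, one column per year.
plants = pd.DataFrame({
    "name": ["Plant A", "Plant B", "Plant C"],
    "country": ["AUS", "USA", "USA"],
    "primary_fuel": ["Coal", "Solar", "Gas"],
    "generation_gwh_2016": [1200.0, np.nan, 800.0],
    "generation_gwh_2017": [1100.0, 50.0, np.nan],
})

# Hypothetical lookup table used to enrich each country with region fields.
regions = pd.DataFrame({
    "country": ["AUS", "USA"],
    "region": ["Oceania", "Americas"],
    "sub_region": ["Australia and New Zealand", "Northern America"],
})


def reshape_and_enrich(plants: pd.DataFrame, regions: pd.DataFrame) -> pd.DataFrame:
    prefix = "generation_gwh_"
    year_cols = [c for c in plants.columns if c.startswith(prefix)]
    id_cols = [c for c in plants.columns if c not in year_cols]

    # Step 1: transpose the year columns into rows (one row per plant per year).
    long = plants.melt(
        id_vars=id_cols,
        value_vars=year_cols,
        var_name="year",
        value_name="generation_gwh",
    )
    long["year"] = long["year"].str.replace(prefix, "", regex=False).astype(int)

    # Step 2: drop the plant-year rows that have no generation value.
    long = long.dropna(subset=["generation_gwh"])

    # Step 3: attach region and sub-region to each country.
    return long.merge(regions, on="country", how="left").reset_index(drop=True)


tidy = reshape_and_enrich(plants, regions)
print(tidy)
```

A left join is used in the last step so that plants from a country missing in the lookup table are kept (with empty region fields) rather than silently dropped.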

 

 

The Dashboard

Below is a screenshot of my finished dashboard. The dashboard seeks to answer three questions:

  1. Do different regions (such as continents, OECD vs. non-OECD and sub-regions) have different structures of electricity generation?
  2. How did the importance of the various sources of electricity generation change over time?
  3. How did a particular country’s (such as the US) electricity generation evolve over time at the power plant level?

Of course, Tableau dashboards are meant to be interactive and should allow the user to explore their own questions and answers, so please follow this link to go to my Tableau Public and have fun with my dashboard there!

 

 

The Insights

Some highlights from my dashboard include:

  1. OECD countries seem to have shifted away from coal, with coal’s share of power generation dropping from 33% to 23% in 6 years.
  2. OECD countries have also strengthened their investments in cleaner alternative power generation technologies such as gas and solar. In fact, solar has seen an 8-fold growth over the past 6 years (for example, from 0.25% to 1.8% as a share of power generation in the US). Solar started from a much smaller base than coal, but the growth has still been phenomenal.
  3. Non-OECD countries, on the other hand, seem to have become more reliant on traditional power sources such as coal.
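The "share of power generation" figures quoted above are each fuel's generation divided by the total generation of its group in a given year. A minimal pandas sketch of that calculation follows. The numbers are invented purely to show the arithmetic and are not taken from the dashboard; the `oecd` column and the `fuel_share` helper are hypothetical.

```python
import pandas as pd

# Invented numbers, long layout: one row per group, year and fuel.
tidy = pd.DataFrame({
    "oecd": ["OECD"] * 4 + ["Non-OECD"] * 4,
    "year": [2013, 2013, 2019, 2019] * 2,
    "primary_fuel": ["Coal", "Solar"] * 4,
    "generation_gwh": [300.0, 100.0, 200.0, 200.0, 100.0, 100.0, 300.0, 100.0],
})


def fuel_share(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Compute each fuel's percentage share of generation per group and year."""
    totals = df.groupby(
        [group_col, "year", "primary_fuel"], as_index=False
    )["generation_gwh"].sum()
    # Total generation of each group-year, repeated on every fuel row.
    group_total = totals.groupby([group_col, "year"])["generation_gwh"].transform("sum")
    totals["share_pct"] = totals["generation_gwh"] / group_total * 100
    return totals


shares = fuel_share(tidy, "oecd")
print(shares)
```

Because shares are relative within each group and year, they are less sensitive than absolute totals to the rows that were dropped for missing data, although they are still only as representative as the plants that remain.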

 

Author: Martin Ding

Martin earned his Honours degree in Economics at the University of Melbourne in 2011. He has more than 7 years of experience in product development, both as an entrepreneur and as a project manager in robotics at an AI unicorn. Martin is expecting to receive his Master’s degree in Data Science from CU Boulder at the end of 2022. Martin is excited about data and its power to transform organizations. He has witnessed first-hand how instrumental data-driven decision making (DDDM) was in leading to more team buy-in and more insightful decisions. Martin joined the Data School to systematically enhance his knowledge of the tools, methodologies and know-how of Data Analytics and DDDM. When not working, Martin enjoys reading, cooking, traveling and golf. He is also thoroughly interested in the practice of mindfulness and meditation.