The Data

The dataset for the first challenge of Dashboard Week comes from the World Resources Institute’s (WRI) Global Power Plants Database.

For this challenge, I will be exploring how the various types of power plants around the world contribute to overall energy production. In particular, I want to look at the number of power plants operating on renewable or non-renewable energy sources, as well as the proportion of energy production each category represents. I will also be looking at the top-performing power plants of various countries, to get a sense of the type of energy production they rely on most.

Data Preparation

The dataset is assembled from open-access sources, either official entities or sources deemed reliable by the WRI. While the data preparation was fairly straightforward, there were a few notable issues I had to address at this stage.

As stated in the report accompanying the dataset, officially reported energy production only constitutes 24% of the data collected for this dataset. To fill in some of the gaps, the WRI utilised a machine learning model to generate production estimates. As the distinction between reported and estimated data was not of interest to me, all I had to do was to combine the two sets of data.
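
The combining step could be sketched in Python roughly as follows. This is only an illustration, not the Alteryx workflow itself: it assumes the reported figure is preferred whenever it exists and the estimate is used as a fallback, and the plant names and values are made up.

```python
from typing import Optional


def combine_generation(
    reported: Optional[float], estimated: Optional[float]
) -> Optional[float]:
    """Return the reported value if present, otherwise the estimate.

    Assumption: reported data takes priority over the modelled estimate.
    The result stays None when neither value is available.
    """
    return reported if reported is not None else estimated


# Hypothetical rows: (plant name, reported GWh, estimated GWh)
rows = [
    ("Plant A", 1200.0, 1150.0),
    ("Plant B", None, 300.0),
    ("Plant C", None, None),
]

combined = {name: combine_generation(rep, est) for name, rep, est in rows}
```

Plants with neither a reported nor an estimated value stay empty, which is where the remaining gaps in the data come from.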

The remaining gaps in the data turned out to be quite problematic when attempting to calculate the average energy generated by each plant. When applying the Average function in Alteryx, null values are treated as 0 in the calculation. Instead of averaging power generation across only the years with recorded data, the function also counts years without data in the denominator, which underestimates the energy generated. To work around this issue, I calculated the sum of values and the number of years with valid (non-null) data separately, before calculating the average with a Formula tool.
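
To make the difference concrete, here is a small Python sketch of the two calculations. It is not the Alteryx workflow, only the same arithmetic; the function names and the yearly figures are hypothetical.

```python
from typing import Optional, Sequence


def average_nulls_as_zero(values: Sequence[Optional[float]]) -> float:
    """Mimic an average that counts missing years as 0 in the calculation."""
    return sum(v if v is not None else 0.0 for v in values) / len(values)


def average_of_valid_years(values: Sequence[Optional[float]]) -> Optional[float]:
    """Sum and count only the non-null years, then divide.

    Returns None when no year has data, rather than dividing by zero.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Hypothetical generation (GWh) for one plant over five years, two missing
yearly_gwh = [None, 100.0, None, 200.0, 300.0]

naive = average_nulls_as_zero(yearly_gwh)       # 600 / 5
corrected = average_of_valid_years(yearly_gwh)  # 600 / 3
```

With two of five years missing, the first approach gives 120 GWh while the second gives 200 GWh, which shows how much the null-as-zero behaviour can understate a plant's output.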

The dataset detailed the type of fuel used in each power plant (e.g. Coal, Oil, Solar), but did not classify them into renewable and non-renewable sources. In the process of manually organising the categories, I found that a number of power plants use both renewable and non-renewable resources in their power generation. I also decided to create a separate category for nuclear power plants: while nuclear is technically considered a non-renewable resource, its costs and benefits are generally evaluated differently from those of conventional non-renewable energy sources.
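
A classification like this could be expressed as a simple lookup, sketched below in Python. The fuel lists, the category labels ("Mixed", "Unclassified") and the rule for combining fuels are assumptions made for illustration; they are not necessarily the exact grouping used for the dashboard.

```python
from typing import Iterable

# Assumed groupings; the actual manual classification may differ.
RENEWABLE = {"Hydro", "Solar", "Wind", "Geothermal", "Biomass", "Wave and Tidal"}
NON_RENEWABLE = {"Coal", "Gas", "Oil", "Petcoke"}


def classify_plant(fuels: Iterable[str]) -> str:
    """Assign a plant to one category based on every fuel it uses."""
    kinds = set()
    for fuel in fuels:
        if fuel == "Nuclear":
            kinds.add("Nuclear")  # kept apart from other non-renewables
        elif fuel in RENEWABLE:
            kinds.add("Renewable")
        elif fuel in NON_RENEWABLE:
            kinds.add("Non-renewable")
    if not kinds:
        return "Unclassified"  # fuel not covered by the assumed groupings
    if len(kinds) == 1:
        return next(iter(kinds))
    return "Mixed"  # plant draws on more than one category


examples = {
    "coal_only": classify_plant(["Coal"]),
    "hydro_with_oil_backup": classify_plant(["Hydro", "Oil"]),
    "nuclear": classify_plant(["Nuclear"]),
}
```

Treating any plant that spans more than one category as "Mixed" keeps those plants from inflating either the renewable or the non-renewable totals.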

The Dashboard

At a glance, it is clear that the world is still overwhelmingly reliant on non-renewable energy sources: although they make up only a quarter of all the power plants in the world, they generate almost two-thirds of all the power. And while 4 of the 5 power plants with the largest outputs use a renewable energy source, they are hydroelectric dams, which come with their own suite of ecological problems.

The reliance on non-renewable energy is even greater in Australia, where it accounts for over 82% of the power generated in the country.

Conclusion

While my intuitions about power generation coming into the challenge were generally congruent with my findings, visualising the data brought me a lot of clarity on how different countries generate their power, and how much we are actually relying on non-renewable energy sources.

Author: The Data School