• What is One-Hot Encoding and why should we use it?

One-Hot Encoding converts a categorical variable into a set of binary (0/1) columns, one per category, so that the data can be fed to machine learning and deep learning algorithms. Many of these algorithms expect numeric input, and encoding categories this way can improve a model's prediction and classification accuracy.

The image below shows what we want to achieve by implementing One-Hot Encoding.
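As a rough illustration of the target shape, here is a minimal Python sketch using pandas. The column names (`ID`, `Category`) and the values are made up for this example and are not the data used in this blog.

```python
import pandas as pd

# Hypothetical input: column names and values are invented for illustration.
df = pd.DataFrame({"ID": [1, 2, 3, 4], "Category": ["A", "B", "C", "A"]})

# get_dummies creates one new 0/1 column per distinct value in "Category".
encoded = pd.get_dummies(df, columns=["Category"], dtype=int)
print(encoded)
```

Each row ends up with a 1 in the column that matches its original category and a 0 everywhere else.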

  • How to implement One-Hot Encoding in Alteryx

There are two ways to do One-Hot Encoding in Alteryx: one uses the basic tools, and the other uses the Intelligence Suite.

In this blog, I will walk you through how to do One-Hot Encoding using the basic tools. Below is our input data.

Step 1: Duplicate the category column (with the Formula tool) or self-join it (with the Join tool);

Step 2: Use the Cross Tab tool, with the category column as the new column headers;

Step 3: Use the Multi-Row Formula tool to replace the NULLs with 0s.
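For readers who think in code, the same three steps can be sketched in Python with pandas. This is only an analogy to the Alteryx workflow, not a part of it: the sample data and the `Category_copy` column name are hypothetical, and counting the duplicated values in the cross tab is an assumption about the aggregation method.

```python
import pandas as pd

# Hypothetical input: column names and values are invented for illustration.
df = pd.DataFrame({"ID": [1, 2, 3, 4], "Category": ["A", "B", "C", "A"]})

# Step 1: duplicate the category column, so one copy can become the
# headers and the other can supply the values.
df["Category_copy"] = df["Category"]

# Step 2: cross tab. Group by ID, spread Category into column headers,
# and count the duplicated values (assumed aggregation method).
# Combinations that never occur come out as NaN (NULL).
wide = df.groupby(["ID", "Category"])["Category_copy"].count().unstack()

# Step 3: replace the NULLs with 0s and bring ID back as a normal column.
encoded = wide.fillna(0).astype(int).reset_index()
print(encoded)
```

The intermediate `wide` table is the interesting part: it holds a 1 where a row had that category and a NULL everywhere else, which is why the last step is needed.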

By doing this, we have transformed one categorical variable into a set of binary (0/1) columns, one for each category. The complete workflow and the result are shown below.

In short, One-Hot Encoding is a vital step in data preparation for predictive analytics and will allow you to use much more of your data as predictor variables, hopefully increasing accuracy along the way.

 

Author: Zoe Lu

Zoe is a graduate of the University of Queensland, where she majored in professional accounting and applied finance. She always strives to learn and enjoys applying her skills to manage any challenge. She moved from Brisbane to Melbourne three years ago, and she believes that all change is good change. When Zoe is not creating dashboards, you will see her at Pilates or enjoying the good restaurants around Melbourne with her family and friends. If you ever get a chance to eat hot pot with her, she will share the recipe for her delicious secret dipping sauce.