-
What is One-Hot Encoding and why should we use it?
One-Hot Encoding is the process of converting a categorical variable into a set of binary (0/1) columns, one per category, so that it can be provided to machine learning and deep learning algorithms, most of which expect numeric input. This can in turn improve the predictions and classification accuracy of a model.
The image below shows what we want to achieve by implementing One-Hot Encoding.
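The same idea can be sketched in a few lines of plain Python. This is only an illustration, not part of the Alteryx workflow, and the `colors` list is hypothetical sample data: each distinct category becomes its own column, and every row gets a 1 in the column matching its value and a 0 everywhere else.

```python
# Hypothetical sample data: one categorical value per row.
colors = ["red", "green", "blue", "green"]

# One output column per distinct category, in a stable (sorted) order.
categories = sorted(set(colors))

# For each row, mark the matching category with 1 and the others with 0.
encoded = [{c: int(value == c) for c in categories} for value in colors]

for value, row in zip(colors, encoded):
    print(value, row)
```

Each encoded row contains exactly one 1, which is where the name "one-hot" comes from.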
-
How to implement One-Hot Encoding in Alteryx
There are two ways to do One-Hot Encoding in Alteryx: one uses the basic tools and the other uses the Intelligence Suite.
In this blog, I will walk you through how to do One-Hot Encoding using the basic tools. Below is our input data.
Step 1: Duplicate the category column (with the Formula tool) or self-join it (with the Join tool);
Step 2: Use the Cross Tab tool so that the values of the category column become the new column headers;
Step 3: Use the Multi-Row Formula tool to replace the NULLs with 0s.
By doing this, we successfully transformed one categorical variable into a set of binary (0/1) columns, one per category. The complete workflow and the result are as below.
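For readers who think in code, the three steps can be mirrored roughly in pandas. This is a sketch for comparison only, not what Alteryx runs internally. The `id` and `category` columns and their values are hypothetical, and it assumes the cross tab counts the duplicated values, so each non-empty cell is a 1.

```python
import pandas as pd

# Hypothetical input data: a row identifier and one categorical column.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "category": ["red", "green", "blue", "green"],
})

# Step 1: duplicate the category column so one copy can become the
# column headers and the other can supply the cell values.
df["value"] = df["category"]

# Step 2: cross tab. Each distinct category becomes a column; cells with
# no matching row are left as NaN (the NULLs in the Alteryx output).
wide = df.pivot_table(index="id", columns="category",
                      values="value", aggfunc="count")

# Step 3: replace the NULLs with 0s and store the result as integers.
encoded = wide.fillna(0).astype(int)

print(encoded)
```

pandas also ships `pd.get_dummies`, which does all of this in a single call; the longer version is shown here only because it lines up with the workflow steps one to one.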
In short, One-Hot Encoding is a vital step in data preparation for predictive analytics and will allow you to use much more of your data as predictor variables, hopefully increasing accuracy along the way.