Can’t believe it’s already the second week of my Data School life. It feels like I was just preparing for my interview a few days ago!

The first week in the Data School was all about Alteryx (and meeting new people, of course). There are hundreds of tools in Alteryx, and by only learning a few most commonly used ones, I already found it an extremly powerful tool for data cleaning. A lot of things I had done in Excel manually before can actually be done in Alteryx in a much efficient way. Here I’d like to look for an example of cleaning multiple null values by Formula Tool and Multi-Row Formula Tool.

 

Input Data vs. Target

The data I’m using today has a hierarchal structure in Excel, something similar to this.

(Data source: Australian Bureau of Statistics)

When I input it in Alteryx, it will look like this:

But I want it to look like this, with no null values and all codes and group names in different columns.

 

How Does It Work?

To achieve my target, first I can use a Formula Tool to create multiple new columns, and followed by a Select Tool to only retain those needed ones.

The result should look like this now:

Then I need to fill out those null values. Here’s when I use the Multi-Row Formula Tool! With multi-row formula tool or multi-field formula tool, we can only use one expression (or formula) per tool. To clean this dataset, I use multi-row formula tool for five times, but all of them are based on the same logic: using Conditional Functions together with [Row-1: Field Name] and [Row+1: Field Name] variables to fill the null values or the values I’d like to change with the value in its previous row of the following row of the same field. For example, in the first field (or column), I’d like to fill all the null values with “1”, which is the value on the first row. The expression I use in the multi-row formula tool is:

After running the workflow, the result should look like this:

Now it looks closer to my target! The expressions I use in the following four multi-row formula tools are:

  1. if length([Major Groups])=2 then [Row-1:Major Groups] else [Major Groups] endif
  2. if isnull([Major Groups]) then [Row-1:Major Groups] else [Major Groups] endif
  3. if isnull([Minor Groups Code]) then [Row-1:Minor Groups Code] elseif length([Minor Groups Code])=2 then [Minor Groups Code] else [Row+1:Minor Groups Code] endif
  4. if isnull ([Minor Groups]) then [Row+1:Minor Groups] elseif length([Minor Groups])=4 then [Row-1:Minor Groups] else [Minor Groups] endif

After these steps, my result is now looking like this, which is pretty close to my target!

The final steps are using Filter Tool to filter out null rows in Countries field and Data Cleansing Tool to remove unwanted whitespace. Then I manage to alter the initial data to what I’d like it to be, which is much easier for join with other data for further analysis.

This is a simple example of using a few Alteryx tools to clean my data. Can’t wait to learn more in the not-too-distant future!

 

Ming-Hsuan Lee
Author: Ming-Hsuan Lee