Data comes in many shapes, sizes, and forms, and one of the most common occurrences of its’ presentation is when it’s provided for analysis in a multitude of different tables. This is once it has already been cleaned and transformed in a way that makes sense and is ready to be understood. In Alteryx, some of the most common types of data transformation to combine two (or more) different tables are Unions, Appends, and Joins. We’ll go through the intricacies and differences between the three.

Appends:

 

Appending fields in Alteryx is when two tables are added together, even if they don’t have similar fields. Every row of the source input table is added to the target input table, resulting in what is known as a ‘Cartesian Join’. As this means that each row of one table is matched to every other row of another table, you need to be careful which is your source and target tables as the tables get larger. The two datasets above are not too large, so below are examples where both are the target and source, with the left having the consumer segment table as the source and the state population table as the target, and vice versa on the right.

NOTE: The default option for the tool is to error out on appends that are more than 16 records. You can configure this to allow you to append larger tables without an error, or change to an option that provides you with a warning when it exceeds 16 records, but still allows you to carry on with the append.

Unions:

The Union tool in Alteryx is very useful for when you want to add two tables with similar fields into one big table, thereby basically placing them on top of each other so their rows stack together. The main difference between such Unions and Joins is that a Union tool can add 2 or more of these fields together, while a Join places rows from 2 (sometimes more but this will be discussed later) next to each other in the same row. The input tables for Unions can be accepted by either field name or record position in the table, with the second table, the workflow, and the option to choose by field name or position pictured below.

 

 

If the second data table has an extra field that’s not in the other table, you can choose to either omit and remove the column, or show a null for the rows where there is no data. Below are pictures of the configuration options, and examples of the second table, workflow, and both output options.

Joins:

There are a number of different types of Joins, especially in Structured Query Language, or SQL. Alteryx doesn’t allow for the full range of different joins to be utilized, but it’s still useful knowing exactly what the different types are and how they affect the output of your table. Below is a picture of the Alteryx configuration for the Join tool, allowing you to choose to join by the record position of the data (matches the rows based on their positions within the two tables), or the more typical option of joining by specific fields, which requires the two data inputs to have a common field or column.

The first type of join is an Inner Join, whereby all records that have matching values in both tables are returned. This may be the most common type of join, as a typical thing to be looking for amongst two tables you are joining are the values that are the same amongst the data rows.

Another type of join is an Outer Join, also known as a ‘Full Outer Join’ or just a ‘Full Join’. This is where all records from the data in both tables are returned, as long as there is a matching field or column in either the left or the right table.

Finally, there are also the Left Join and Right Join types, or ‘Left Outer Join’ and ‘Right Outer Join’. These return all records from either the left or the right table depending on the name, as well as only the matched records from the opposing table. One thing to keep in mind is that both Inner Joins and Left/Right Joins create a row each time the join criteria is satisfied, and can thus result in the creation of duplicate rows.

In the configuration section pictured above, we can see there is no Full Join option only the other three. The Join tool allows you to join by either a single field or multiple fields, and it allows you to automatically pick which Join output you want to continue your workflow with, without needing to specify like if you were writing SQL code. In the example below, we can see the two input data tables and the results when outputting the Inner Join option.

 

 

 

 

 

 

 

 

 

 

 

 

 

You can also join multiple different tables at the same time, as long as there is a matching field. Any rows that don’t contain data for a column in one of the other tables will return a null value. Below is an example of a Join tool being used to combine 3 different data inputs.

 

 

 

 

Nicholas Seah
Author: Nicholas Seah