Data analysts are likely to encounter multiple tables that contain generally different information but provide more value when put together. There are a few ways to do this in Alteryx, so it is important to consider which way is most appropriate for the situation.

 

Join

The join tool is used when there are two tables that both share a common column. For example, there may be a table of customer details and another with transaction history, while both of which share a common Customer ID column. We may want to combine the rows together to provide more complex insights, like reporting on the customer demographics for various product categories.

Join takes two tables (left and right) as input and a specific common field in each of the tables to establish a link between them. Join will look at each row in the left table, identify its specific field’s value, and then find the row in the right table with the same value in its own specific field. Finally, join will add the add the entire left row to the entire right row. The output is a join table with the number of rows equal to the number of matches that exist between the left and right input tables and the number of columns equal to the sum of the left and right input tables’ number of columns. Additionally, the remaining rows without a match from the left input table are produced in the left output table, and that of the right input table in the right output table, so no data is lost.

Because the specific field in each table contains the same information, it is good practice to remove one of the columns, removing unnecessary duplication.

 

Union

The union tool is used when there are two tables that share most of their columns. For example, there may be two tables of sales data from two different regions that contain mainly the same columns. Combining the tables would be helpful as it would enable analysis of and reporting on all regions at once.

Union takes multiple tables as input and stacks them one on top of another. By default, union matches columns by name. The output is a table with the number of rows equal to the sum of all the input tables’ rows and the number of columns equal to the sum of all the input tables’ distinct columns. Thus, if each of the input tables share the same columns then the output table will just have these same columns.

Alternatively, union can match columns by position. This means that the output table would instead have the number of columns equal to the maximum of all the input tables’ columns. It is important to be careful matching columns by position as if the columns do not perfectly align it will lead to inaccuracies. To avoid this, union can also match columns by manual configuration.

 

Append

The append tool is used when there are two tables that contain different rows and columns. For example, there may be a table with a store’s sales for the day and another with a single row of the store’s ID, address, region and the date. Adding this data onto the end of the first table will set the tables up nicely to union them.

Append takes two tables (target and source) as input and adds each row of the source table to an instance of each row of the target table. The output is a table with the number of rows equal to the product of the target and source tables’ rows and the number of columns equal to the sum of the target and source tables’ columns. Thus, if the source table is a single row, the output table will have the same number of rows as the target table, and with the source row’s data added onto the end of each target table’s row.

 

Understanding which tool is appropriate for each situation is crucial in designing an efficient workflow that combines data as needed. Happy developing!

Cover image by Alexa from Pixabay

Hunter Iceton
Author: Hunter Iceton

Hunter Iceton is an enthusiastic and positive individual. He graduated from Sydney Uni in 2017 with a Bachelor of Commerce (Liberal Studies) majoring in Finance, Marketing and Quantitative Business Analytics. For the next few years, Hunter spent his time creating and releasing music, while tutoring primary and high school students in Mathematics and Business Studies. Hunter is now excited to be joining The Data School, looking forward to approaching analytics with a creative perspective. In his spare time, Hunter enjoys continuing to create music, reading philosophy and cooking plant-based dishes. Otherwise, he can usually be found at a restaurant, a bar or an art gallery.