I’m a murder mystery nut. Unsurprisingly, Agatha Christie (the undisputed ‘Queen of Crime’) tops my list of favourite authors. Fresh from a viewing of The Mousetrap at the Comedy Theatre in Melbourne, I wondered if I could use the world of Agatha Christie to understand the different types of joins in Alteryx (which I’m still getting my head around.)

First, let’s establish the facts:

A join combines two or more tables or data sources into a single table based on a common field or set of fields, allowing us to merge data and perform an analysis on the combined dataset.

We often encounter four types of joins in Alteryx: inner join, left join, right join, and outer join.

For this post, we’ll be using characters from Agatha Christie’s novels as examples. Imagine we’re trying to solve a murder and we have different clues from two characters – Hercule Poirot and Miss Marple, of course. To solve the case, we need to join the information we have from them into one table.

An inner join only returns rows where there is a match in both tables. So, if we only want the information where both Poirot and Marple agree, we’ll use an inner join. This join will exclude any information where there is no agreement between the two, keeping just the information where they agree.

Next, we have left and right joins. A left join will return all the rows from the left table (Poirot) and matching rows from the right table (Marple). If we use a left join, we’ll get all the information from Poirot, and the matching information from Marple. Likewise, a right join will return all the rows from the right table (Marple) and matching rows from the left table (Poirot).

An outer join returns everything. In our example, if we use an outer join, we’ll get all the information from both Poirot and Marple, regardless of whether they agree or not.

Alteryx is on the case!

Say we wanted to use joins in Alteryx to solve a murder. We’ve interviewed our suspects, now it’s time to put the pieces of the puzzle together.

It might look something like this:

We have three datasets. One contains information about the murder weapon and who had access to it on the night. Another contains information about our suspects – such as their names, jobs and last known addresses. A third contains the alibis of our suspects.

First, we’d need to use an inner join to find suspects who have access to the murder weapon on the night. We’d join our suspects and weapon dataset using an inner join on the suspects name to find only those who did have access to it on the night – so only those in both our databases.

Next, we’ll join our newly shorted list of suspects with our alibis dataset. We’ll use a left outer join and use the suspect’s name as the common field. We’ll use a left outer join because we only want to keep the records that don’t match in both datasets – in this case, suspects who don’t have an alibi.

Now, we should have only one name – and using Alteryx joins, we’ve discovered it’s someone who didn’t have an alibi but did have access to the weapon on the night. Dun dun DUUNNNN!

 

 

 

 

 

 

 

 

 

 

 

 

 

 

The Data School
Author: The Data School