Flight path data is typically presented as a table with some columns giving the details of the flight’s origin and other columns describing the flight’s destination. One record represents a single flight leg between two locations/airports, so it has an origin and destination.

We know that many people travel directly to their holiday destinations, but many others have a stopover on the way. For those indirect flights, in our data, we would be looking at two records rather than just one.

So how do we visualise flight paths in Tableau? Well, there’s the easy way of simply drawing lines between all origin and destination locations in each row:

Above, I’ve visualised all flight paths going from London, Brisbane, and Los Angeles. This doesn’t really filter on only flight paths that connect those cities though, and it doesn’t indicate whether flights are direct or indirect.

What if we want to visualise all flight paths between two cities – including the possibility of a single stopover? The easy solution would be to join the flight data table to itself and filter on those combinations where all origin/destination combinations satisfy our search.

The problem is joining flight data is expensive, results in duplication, and could turn into a nightmare if we are working with live data. So, let’s solve this problem using a much more elegant solution: Sets.

STEP 1 – Setting Origin and Destination

I’ll create two parameters for the user to select an origin and destination, then add these to my sheet. For simplicity, I will be using origin and destination cities rather than individual airports.

STEP 2 – Finding all connected flights

The first thing I want to do is determine which rows have a flight leg that is connected to either my origin or destination. I’ll drag these onto Rows to visualise my data.

As you can see, every single origin city is listed with every destination city that you can fly to from that origin city.

To identify connected flight (legs), I’ll create two calculated measures.

Flight – Connected to Origin:

Flight – Connected to Destination:

What these measures give us is an indication of whether a row has a flight leg that involves either our origin or destination city.

Note how this gives us the name of the city if it’s connected to our origin (Brisbane, if you remember Brisbane -> Singapore was set with my parameters). If will also show Brisbane as being connected to our destination, Singapore, but Null for all other locations because those are not our destination.

STEP 3 – Creating our sets

Remember how I said we want to avoid joins? The question then becomes “how are we going to compare values from one row to values in another row?”. The answer is: we’re going to create sets that include all those cities listed in our new columns.

I right-click on my ‘Flight – Connected to Origin’ measure and select Create -> Set.

What I want is all values except the Nulls.

I then repeat the same process to create ‘Flight – Connected to Destination Set’ from my ‘Flight – Connected to Destination’ measure.

To illustrate what’s happening, I’ll drag these sets to my Rows as well:

These sets indicate whether our Flight Origin or Flight Destination are in the set of cities that satisfy our criteria of being connected to the origin or destination. The 5th row indicates a direct flight from Brisbane to Singapore, and all other rows show flights from Brisbane to somewhere else…so let’s take a look at the rows in our data around one of these ‘other’ locations, Sydney:

Note that the last column is indicating that Sydney has a flight connection to Singapore. This is something we want to remember when looking at all possible indirect flights from Brisbane to Singapore. I’ll create a new calculated measure for this.

Flight – Connected to Both:

This is a crucial part. I’m basically saying “hey, if you’re a destination from Brisbane AND you appear in that set I made of all cities that have flights going to Singapore, THEN I want to include you”.

And now my data looks like this:

I’ve identified that Seoul, Sydney, and Taipei are all stopover locations. Note that the Singapore row itself is null – this is OK because we’re only identifying stopover locations in this column. Identifying direct flights is absolutely no problem because that’s what the original table already tells us.

STEP 4 – Putting it all together

Finally, what we want to do is identify all flights that we’re interested in, direct and indirect, and display them with some emphasis on the direct flight path.

We’re looking for all rows that satisfy one of the following conditions, given a flight path of Brisbane -> X -> Singapore:

  • Brisbane -> Singapore
  • Brisbane -> X
  • X -> Singapore

I’ll create a new measure to account for these 3 cases:

Flight – Legs:

I’ll also make this a dimension and drag it to rows:

Now I’ve got a count of flights, but all I really needed to know if whether the value is above 0.

Finally, I’ll add a flag to indicate whether the flight is direct or now (you can do this in the previous step’s calculation as well, but I’m separating this step out here to make things clearer).

Flight – Is Direct?:

Now, let’s recreate the flight path map by adding ‘Flight – Legs’ to the filter, only including those rows with values above 0:

And I’ll also drag ‘Flight – Is Direct?’ onto Colour so that I can highlight the direct flight path.

Great! I can now easily visualise flight paths between two locations, including (single) stopovers, and avoiding any unnecessary AND expensive joins in my data!

How about Singapore to London?

Or Los Angeles to New York?

It is indeed showing me all relevant flight paths. Notice how some flights are going out of the US then coming back? That’s because we haven’t placed any restrictions on distances – so it’s showing us that you could fly from Los Angeles to somewhere in Europe or Asia, then fly back, direct, to New York..if you wanted to.

There are a number of things we could do to flesh this out into a nice Trip Planner dashboard – we could place restrictions on the distance of the stopover from our destination, add conditions on the times of connecting flights, and separate out individual flights from individual airports in the same city. For now, we have sets doing what we want – identifying which rows/flight paths with an origin and destination have a transitive relationships and are connected by a common stopover city.

Sets really can open up a whole lot of possibility with data. The same ideas here could be applied to a number of other data sets, like employee tables where each row denotes an employee and supervisor. The transitive relationship would be to identify an employee’s boss’ boss, or all the subordinates two levels down to a manager.


The Data School
Author: The Data School