Why do we need LOD Expressions
When working in Tableau, sometimes we need to answer questions that involve aggregating our data to different levels of detail (LOD). For example, maybe you are interested in finding out which state has the highest average sales across different cities within that given state. Or you would like to perform a cohort analysis and see if longer tenured customers contribute more for each year’s sales. Questions like these are easy to ask, but can be very tricky or even impossible to answer without knowledge of LOD Expressions.
LOD Expressions are a powerful and efficient way to solve problems involving multiple levels of aggregations in one single visualization. However, because of the nature of LOD Expressions, it can be especially challenging to get right. In fact, it is probably one of the most confusing concepts to master if you are just starting your Tableau journey. Based on my personal experience, the reason why many people get confused about LOD Expressions is they haven’t understood the fundamental knowledge behind LOD. To be specific, the level of detail of the data and of the view. In this blog, I will try to clarify these concepts to help you get a stronger handle on LOD Expressions.
Aggregation and granularity of Data
Whenever we begin to work on a specific data set, it’s always a good practice to first understand what the level of grain is for this data. Do you know what each row in the Tableau Sample Superstore data set represents? Does it represent a unique customer, a product, or something else? The closest answer, ignoring some potential duplicates, is actually a unique product within a specific order. The following screenshot gives you an idea how I arrived at this conclusion. You can replicate this view using the aforementioned data set in Tableau version 2020.2.
As shown in the image, the data set contains 9994 records. By concatenating Order ID and Customer Name, I was able to create a nearly unique key that identifies each row in the whole data set (9986 records).
Understanding row-level detail is fundamental since all of our analysis will be performed either at the same level of this or at a higher level, i.e., aggregating. A simple example would be analysing total sales for each region. In this case, region is at a much higher level of grain, so sales values need to be aggregated. Other forms of aggregation include averaging the discount rate, counting the number of orders, etc., as shown below.
Viz Level of Detail
The second important thing to keep in mind is the level of detail on your current viz, and you should also be aware what actions can alter this level of detail. Here is a screenshot taken from Tableau LOD Expressions Whitepaper that clearly explains it.
Basically, your viz level of detail is determined by whatever you put on the Canvas, Row/Column shelf, or any of the card on Marks except for Tooltip. To test your knowledge of this, imagine I have a filled map for every State in the US, what do you think will happen if I put City onto the Detail Marks card? Will I still be able to preserve the filled map view?
The answer is no, since the viz level of detail will have been altered. Specifically, the map will show a more granular level of information. In this case, the filled state map will turn into circles representing different cities, as shown below.
For the same reason, if you create a sales bar chart for each State, and then drag City onto your view, you will realize your view suddenly becomes fragmented. In other words, each bar has been broken down into smaller bars, each of which represents a City within that State. I used to get very confused about why this happens and became very annoyed when I couldn’t preserve the unbroken bar. I believe many of you who are reading this article also had similar experiences. This all highlights the importance of understanding viz level of detail for eliminating lots of confusion while working in Tableau.
Types of LOD Expressions
Once you have understood the grain of your data and what affects viz level of detail, you will be well poised to wield the power of LOD Expressions. In general, there are three types of LODs: Fixed, Include, and Exclude. And their syntax in Tableau follow the same pattern. You start by specifying the LOD keyword, followed by dimension(s) and a colon, and then specify how you would like to aggregate the measures. And the entire expression has to be enclosed in curly brackets. Here is an example calculating the total sales fixed for each region:
Include, as the name suggests, allows you to create calculations using additional dimensions than what’s already in you view, but without actually putting those dimensions on the view. In other words, it looks at information from a more granular level, and then aggregates that information to be visualized in your view. The following diagram, also from the Whitepaper, compares the hierarchy of granularity between the two levels.
Exclude, on the other hand, allows you to calculate at a higher level of detail than your current view, by disregarding some of the dimensions already on the viz. Because the information from a higher level is more aggregated, it will be replicated for every combination of dimensions on your current view. For example, in the below viz, I am showing total sales for each City within a State, along with total sales for that State by using the Exclude LOD Expression.
For each State, the sales figure has been replicated because it is from a higher level of view. By the way, whether it is level of view, level of aggregation or granularity, they basically mean the same thing so I have used them interchangeably.
One very important concept to note is both Include and Exclude expressions are dependent on the dimensions already on your current view. If you Include a dimension that’s already on the view or Exclude a dimension that’s not on the view, then you will get the same results as if you haven’t used the LOD Expressions. Also, results from Include/Exclude LODs can only be used as measures.
But the most powerful and most frequently used LOD Expression is actually Fixed, with the main reason being it is independent of the dimensions already in the view. Thus, you can calculate information for any combination of dimensions and then visualize that information in the viz. Fixed LOD allows you to easily move up and down the granularity hierarchy. In addition, results from Fixed LOD can be used both as a measure or a dimension.
Another major distinction between Fixed and Include/Exclude is where they sit on Tableau’s Order of Operations, as shown in the image below. This matters significantly because many of our actions while creating a viz involve applying filters and these filters get executed according to Tableau’s Order of Operations. Fixed LOD sits on top of Dimension filters, whereas Include/Exclude LODs apply after Dimension filters. Thus, if you need to use dimensions as filters but don’t want your results being affected, you will have to use Fixed LOD.
Because of these advantages, whenever I use LOD Expressions, it’s almost always going to be Fixed. Also given its flexibility, Fixed LOD has opened up tremendous interesting use cases, such as performing cohort analysis, comparing sales information with a specific dimension, analyzing customer order frequency, etc. In my next blog, we will jump into Tableau and go through some hands-on exercises to reinforce our understanding of LOD Expressions.