During Teaching Week at Data School AU, we were tasked with preparing for and conducting a teaching session. One of the exercises was based on week 35’s Workout Wednesday Challenge by Tableau Zen Master, Rosario Guana. Whilst developing the solution, I discovered a way to “duplicate” data in Tableau without actually duplicating the underlying data. This technique can be useful in avoiding duplicating large data sets. I have attached below a screenshot of my Viz. If you are familiar with the Tableau Superstore Data set, you would notice that the category and sub-category is referenced in the same column. Read more below on how I managed to “trick” Tableau into doing this!
In this blog, I will provide a brief overview of the “duplication” technique and ways to check for potential problems when “duplicating” data. This is best used for working out values at a coarser level of detail than the available data set. In other words, the data set must already be more granular than the intended results or calculation.
Splitting the Data Set
To start, I introduce a new calculation to separate the data into two streams. One easy way to do this is to split the data based on time or date.
This calculation returns the remainder of day (in the month) divided by 2 and then labels it odd or even accordingly. Now that we have split the data, we can find interesting use cases for this. For example, using the Superstore Tableau data set, we can combine the “Category” and “Sub-Category” Fields into one column:
The red highlighted area shows the combined field. Note that this is usually impossible without duplicating the data. This is because each order in the Superstore Tableau data set belong to both a Category and Sub-Category. There are other ways to split the data depending on dashboard presentation requirements. Although, it is important to always check that the split data can work correctly with the LOD calculations I mention later in this blog post.
Checking for Errors after Splitting the Data Set
Depending on the data set, there could be dates where a certain field value is missing. To check that we have AT LEAST ONE odd and even day for each sub category AND for each year, we can use a simple cross tab in Tableau:
The cross tab tells us that for Accessories (Sub-Category) in the Technology Category, there is at least one odd and one even row in the data set for each year. This step is important to ensure that both even or odd days are referencing the correct Sub-Categories. In other words, we want Accessories as a field value in both even and odd days. The reasoning for this will be apparent in the next section.
Level of Detail Calculation
With this new combined field, I have created the following chart that shows Sales and Profit in the Technology Category and the corresponding Sub-Categories. I have also included an example of the FIXED LOD calculation.
I used both FIXED LOD and the earlier calculation for odd and even dates. With these, I am able to display profits and sales for Technology orders and its Sub-Categories. The technique outlined here is how I managed to complete the Workout Wednesday challenge without duplicating the data set.
Thanks for reading this blog post! I hope that you manage to find great use cases for this.
Also, Check out Rosario Guana’s blog and her original solution to Workout Wednesday week 35, which was where I first learnt of this technique.