During last week’s client project, I had the chance to work with Alteryx to solve some survey data problems. One thing that I learned was the summarize tool is very useful to check results, especially making sure joins are correctly performed.
Taking the superstore dataset as an example (from Tableau), below workflow seems to work fine. However, it is not the case. After the join (on Order ID), there are 3226 matched records and 9194 unmatched records. Experienced data analyst would recognize the problem straight away, that is ,3226+ 9194 does not equal to 9994 (from primary data source records). The problem might be because for one order ID, there are multiple returns in the Return dataset.
Let’s use a summarize tool configured below to confirm my guess.
Looking at the result from the summarize tool, we can tell that indeed, there are multiple records for one order ID from the inner-joined data.
One way to fixe this problem is to add a summarize tool after the return dataset.
Below configuration allows me to find out the total returned items by counting the records.
Keep everything else the same, we now have only 800 records coming out of the inner join. If we look at the records, they also confirm the join was performed properly, 800 (inner join) + 9194 (outer right join) = 9994 (total number of primary records before join).
The conclusion is that we should always be extra careful when use a join tool, checking the number of records coming out of each side and using summarize tool is a good way to make sure data are correctly joined.
The summarize tool is more than just aggregate value, it also can help with counting records, it can also perform calculations such as NPV, Standard Deviation and Variation. For more what Summarize tool can perform, you can click here.