In this blog I will be writing about a feature of Tableau Prep Builder that I really like using: “Grouping”. I’ve grown to appreciate how it is done in Prep Builder after getting frustrated with other methods. To clarify, grouping means collecting values of a dimension field into a broader category. This can be hierarchical, like grouping all States within a Country, or it can simply make a field less cluttered.

I stumbled across this helpful feature while joining two data sets containing characteristic information on DC and Marvel comic book characters. The data sets contained a list of all characters, their number of appearances, gender, hair color, eye color and a few other attributes. I noticed that some of these fields had several spellings of the same category, or categories with very few records. For example, there were characters with auburn hair, orange hair, red hair, Red hair, strawberry blonde, Reddish Blond…

Presenting hair color as-is, as a dimension in my visualization, would not look good. My default solution to this kind of problem would be a long chain of CASE or IF statements. These are time-consuming to write, require checking back and forth and re-reading, and can grow very complicated very quickly.

In Prep Builder, I noticed that double-clicking on any value (or any number of values) in the view pane lets you rename it (or them). This view also shows the number of records in each group, which helps you choose more appropriate groupings. I grouped many of the hair colors into an ‘Other’ category, as there were only a few entries for each one.
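Prep Builder does this without any code, but the underlying idea is just a lookup from raw values to groups, plus a rule that lumps rare groups into ‘Other’. The Python sketch below shows that idea, assuming a hypothetical list of hair values (illustrative, not taken from the real data set), a hypothetical `GROUPS` mapping, and a hypothetical `min_count` threshold of 2. It is not how Prep implements grouping internally.

```python
from collections import Counter

# Hypothetical sample of raw hair values (illustrative only, not the real data set)
raw_hair = [
    "Red Hair", "red hair", "Auburn Hair", "Orange Hair",
    "Strawberry Blond Hair", "Reddish Blond Hair",
    "Black Hair", "Black Hair", "Black Hair",
    "Blond Hair", "Blond Hair", "Purple Hair",
]

# Hypothetical manual groupings, keyed on lowercased values,
# similar in spirit to renaming several values at once in Prep
GROUPS = {
    "red hair": "Red",
    "auburn hair": "Red",
    "orange hair": "Red",
    "strawberry blond hair": "Red",
    "reddish blond hair": "Red",
    "black hair": "Black",
    "blond hair": "Blond",
}


def group_values(values, groups, min_count=2, other="Other"):
    """Map each raw value to its group, then relabel any group
    with fewer than min_count records as `other`."""
    # Unmapped values keep their (stripped) original text
    mapped = [groups.get(v.strip().lower(), v.strip()) for v in values]
    counts = Counter(mapped)
    return [m if counts[m] >= min_count else other for m in mapped]


grouped = group_values(raw_hair, GROUPS)
print(Counter(grouped))
# Counter({'Red': 6, 'Black': 3, 'Blond': 2, 'Other': 1})
```

A single dictionary replaces what would otherwise be a long CASE or IF chain, and, like Prep's record counts, counting after mapping makes it easy to see which groups are too small to keep.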

Another great thing about grouping in Prep is the ability to compare before and after: each step in the workflow shows what has been changed, meaning I (or someone else) can go back and edit these groupings. It also makes the groupings reusable and shareable with others working on similar data sets.

There is also the option to have Prep Builder make the groups for you: it can detect similarities by Common Characters, Pronunciation and Spelling. I didn’t find these tools very useful for this particular data set, but I recommend opening Prep and testing them out.
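To get a feel for what automatic grouping can and cannot catch, here is a rough Python analogue. It is an assumption-laden sketch, not Prep’s algorithm. It groups values that share a normalized “fingerprint” (lowercased, punctuation removed, words sorted), then clusters near-identical spellings using the standard library’s `difflib` similarity ratio with a hypothetical cutoff of 0.85. The sample values are illustrative.

```python
import difflib
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Normalize a value: lowercase, strip punctuation, sort unique words."""
    cleaned = value.strip().lower().translate(
        str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def group_by_fingerprint(values):
    """Collect raw values that share the same fingerprint."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return dict(groups)


def group_by_spelling(keys, cutoff=0.85):
    """Greedily cluster keys whose similarity ratio to a cluster's
    first key is at least `cutoff` (a hypothetical threshold)."""
    clusters = []
    for key in keys:
        for cluster in clusters:
            if difflib.SequenceMatcher(None, cluster[0], key).ratio() >= cutoff:
                cluster.append(key)
                break
        else:
            clusters.append([key])
    return clusters


# Illustrative sample values, not taken from the real data set
sample = ["red hair", "Red hair", "Hair, Red",
          "strawberry blonde", "Strawberry Blond", "Auburn Hair"]

by_key = group_by_fingerprint(sample)
print(fingerprint("Hair, Red"))  # hair red
print(group_by_spelling(list(by_key)))
# [['hair red'], ['blonde strawberry', 'blond strawberry'], ['auburn hair']]
```

Capitalization, punctuation and small spelling variations collapse together, but nothing in the text tells an algorithm that auburn is a shade of red. That kind of judgment still needs manual grouping.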

I encourage everyone to have a go at cleaning their dimensions this way in Prep Builder. See if you like it!

Visit my viz on Tableau Public, which compares the diversity of Marvel and DC comic book characters across my new groupings. (Hot topic, I know.)

Data Set sourced from:

https://www.kaggle.com/fivethirtyeight/fivethirtyeight-comic-characters-dataset