During my recent investigation into the usage of my shared Netflix account, I’ve become quite intrigued by how often I have three or more users streaming Netflix content at the same time, as it could help me determine the most cost-efficient subscription plan that suits our usage habits.

In the process, I’ve had to tango with this monstrosity of a data type – datetime data, a bona fide nightmare that refuses to play nice in the data sandbox. While I have figured out how to plot time data throughout a day, and have even jury-rigged a dashboard that enables me to perform manual visual inspections for overlaps, I remain determined to get Tableau to flag these overlapping time spans automatically, if for nothing else, because I’ve already spent too much time on this to give up.

Approaching the end of my wits and pushing the limits of my abilities, I’ve finally hammered out a crude and rudimentary solution to my problem – as you might expect, elegance and datetime rarely share the same sentence. Crude in the sense that Tableau alone could not accomplish what I wanted, so I had to call on my trusty sidekick Alteryx to help me data-wrangle this beast.

Though Process

The logic of my process is simple. Plotting duration in Tableau is a cinch – simply drop in the [Start Time] and then extend the Gantt bar by a duration measure. However, the visualisation of duration created this way merely extends the start time, a facade with no actual timestamps behind it for further manipulation.

I had to go back to the source: enrich the dataset with more rows, bridging the gap between start time and end time. Now, we have data that, metaphorically speaking, is tangible and occupies the same ‘physical’ space along the time axis. By, well, ‘splitting the timelines’, so to speak, we now have discrete chunks of each user’s streaming session. These chunks can then be compared to see whether they fall in the same bucket of time.

Breaking time into 15 minute chunks

 

Alteryx Workflow

 

TL;DR: Bin your datetime; compare the bins

The workflow follows the general logic I described, with additional cleaning. I will go into a bit more details about the thought process behind time aggregation.

Turns out the best way to work with time data is to convert it into decimal numbers.

Breaking the steps down, I start by dissecting the time into hours and minutes (1). Next, I rounded the minutes up to the nearest 15-minute increment, expressed as a decimal fraction of an hour (e.g., 15 minutes becomes 0.25, 30 minutes becomes 0.5, and so forth) (2). Finally, I added these adjusted minutes to their corresponding Start and End hours, resulting in what you see in [StartHourAdjusted] and [EndHourAdjusted] (3).

Steps (2) and (3) can be simplified into the following formula.

 

The step after aggregation creates the [HoursActive] field – 0.25h increments between the adjusted Start Time and End Time. I used the Generate Rows tool for this process. Here you can see the outcome of this enrichment process. We’ve effectively doubled our data size with these additional increments. Most importantly, it equips us with a neatly binned data set, facilitating comparison of time spans between profiles.

 

Tableau

The dataset I will be feeding into Tableau looks like this. Start Time and End Time have been retained for tooltips and duration calculation, the latter of which you can read more about here.

 

We’ll be harnessing the power of the Gantt Bar again for this visualization. Kick things off by dropping our beefed-up data, [HoursActive], into Columns. Although I’ve pre-calculated the Duration using Alteryx, you can just punch in ‘0.25’ in the Size mark – remember, we’re rolling with 0.25h intervals.

As for my analysis, I’m trying to flag stream time overlaps between profiles on each days. Thus, I’ll be dropping my Date dimension and Name dimension into Rows.

 

Next on the agenda is to add flags to identify overlaps.

First, we want to be able to count the number of profiles using the account on each date at each time interval. Fix the LOD on Date and Hours.

 

Next up, we can create a flag for intervals with 3 or more concurrent users to see if it’s viable to downgrade to a Standard Plan (Netflix Standard Plan keeps a tight leash with only 2 devices allowed to stream simultaneously).

Drop that into the Colour mark, and voila!

And the crowd goes mild… Seems like our data subset isn’t exactly rife with overlaps.

Still, after a bit of sprucing up and some handy parameter actions, we’re now equipped to delve into various levels of time overlap in our data.

Overlap detection with a lower threshold

Final Thoughts

It’s probably glaringly obvious, but this method does involve some sacrifice in the precision department. Since we’re dealing with binned data, any rollover into the next interval will result in a new bin. This can sometimes play tricks on us, creating false positives in overlap detection.

11 Seconds to Bars

 

Sure, I could revisit the data and give the seconds field the boot, or double down on this process and craft even more precise increments. But I think I’ll drop the mic on this one and retreat back into the cozy confines of base-10 data.

You can access a copy of my workbook here.

 

The Data School
Author: The Data School