It is already Day 2 of Dashboard Week. Day 1 of dashboard week (yesterday) wasn’t too bad. We started the day by presenting our results of yesterday’s work from day 1. Check out my dashboard from Day 1 here if you haven’t already. It was exciting to see what the cohort had come up with in the short amount of time we had yesterday using the same data. I learnt a few things about anime on the way.
Data
After our presentations, we were provided with a new data set to work on. The data was on the top trending videos on YouTube from 11 countries over the last year and a half. The data set was split up into 11 files (one for each country). The combined data was quite large at over 1 million rows. There should be enough data to tell a good story, but we were instructed to add more data if we wanted.
Limitations
Some of the limitations we faced included that the data included titles and descriptions in different languages which limited our analysis. I would’ve liked to translate these titles into English. However, it was infeasible given the time we had and the constraints of the Google Translate limits on API calls. The data was also very large which led to time wasted waiting for data to process.
Data Exploration
Finding a good story took much longer than expected today. It took a while to process the data as there was a large volume of it. I spent half the day trying to understand the data and looking for a good story to tell in my dashboard. The story I wanted to tell was around how the categories of top tending YouTube videos change over time. The dashboard also allows the user to explore some trending categories and channels and find videos they may be interested in.
Alteryx
My Alteryx workflow was quite simple today. There are also a few things I calculated but didn’t use in my final visualisation but it was good to practice some formulas and regex anyway.
My workflow consists of using a wildcard input to union all the countries into one workflow. The country code was then extracted from the file name. The YouTube URL for each video was constructed using the ID. Next, the dates were then converted to dates and the time it took for the video to trend was calculated. There were a few calculations I added such as the number of capital letters used in the title (could be used to detect clickbait titles) as well as the like and dislike ratio were calculated. The category name was then joined using the category id, followed by filtering out any duplicate rows.
Excel has a limit of the number of rows which is a little over 1 million, while my final output was around 1.2 million. My workaround was to use the sample tool to split the output into 2 excel files which would be unioned in Tableau.
Alteryx Workflow
Tableau
My dashboard is themed around the red branding of the YouTube logo. On the top row, we have the average time it takes for a video to trend and the average number of views, likes and comments overall.
We then have a map that can show these averages for each country coloured in a shade of red. The darker the red, the quicker a video trends in that country or the higher number of average views/likes/comments. This map also acts as a filter to filter the rest of the views.
The second visualisation is a bump chart showing the ranking of the most popular category for each month. Clicking on a line will filter the following views with videos for that specific category.
On the bottom left, the third visualisation is a word cloud, which is something new I learnt today. The larger and more red a channel appears, the more videos they have trending. Clicking on a channel will filter the final view which lists each of their trending videos.
By clicking on a video title, a hidden Web Page object which loads the URL of the video. Clicking “Show Video” will enable you to watch the video.
Tableau Dashboard
Embedded YouTube Video
The result of day 2 of Dashboard Week can be found on Tableau Public here.
Reflections
As I mentioned in my previous blog post, Dashboard Week is a great way to revisit some things you have forgotten about during training. In Day 2, I revised how to do a bump chart, refreshed myself with set actions and learnt how to do a word cloud. I would’ve liked to manage my time better today since the data was quite large and it would’ve been beneficial to focus on a narrower time period or one country to speed things up.
Edit: Check out my blog post for Dashboard Week, Day 3!