We have survived Day 1 of the dashboard week. For Day 2 we were given the data on the videos that were trending on YouTube.
There were lots of files in CSV and JSON formats. I used Alteryx to combine all the CSV files with a wildcard input first, then did the same with the JSON files to get the category details, and joined the two together. Next, I cleaned and transformed the data to restructure it into the format I needed (see the workflow below).
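For readers without Alteryx, the same combine-and-join step can be sketched in Python with pandas. This is only a rough equivalent, not the workflow I built: it assumes one CSV and one JSON file per country, a `category_id` column in the CSVs, and JSON files shaped like `{"items": [{"id": ..., "snippet": {"title": ...}}]}`. The function names and the `country` column are made up for illustration.

```python
import glob
import json
import os

import pandas as pd


def load_videos(folder):
    """Stack every CSV in `folder` into one frame, like a wildcard input."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        frame = pd.read_csv(path)
        # Assumption: the file name starts with a two-letter country code.
        frame["country"] = os.path.basename(path)[:2]
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


def load_categories(folder):
    """Flatten the category JSON files into an id -> name lookup table."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.json"))):
        with open(path, encoding="utf-8") as handle:
            payload = json.load(handle)
        # Assumption: each file holds a list of categories under "items".
        for item in payload["items"]:
            rows.append(
                {"category_id": int(item["id"]),
                 "category": item["snippet"]["title"]}
            )
    # The same category id appears in every country file, so keep one copy.
    return pd.DataFrame(rows).drop_duplicates(subset="category_id")


def combine(folder):
    """Attach the category name to every trending row."""
    videos = load_videos(folder)
    categories = load_categories(folder)
    # A left join keeps videos whose category id has no match in the JSON.
    return videos.merge(categories, on="category_id", how="left")
```

Calling `combine("path/to/data")` on a folder laid out this way would return one frame with a `category` name on every row.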
The trickiest part today was understanding the data. A video could have been trending on different days and in multiple countries, which means there were multiple rows for one unique video. It was important to make sure I was using only one row per unique video to avoid counting the same video more than once. I used the summarise tool to aggregate the data I needed and output it in hyper file format.
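The one-row-per-video idea looks roughly like this in pandas. It is a minimal sketch, not the exact summarise configuration I used: the column names are assumptions, and taking the maximum assumes the counts are cumulative, so the largest value is the most recent one. The sample rows are made up.

```python
import pandas as pd


def one_row_per_video(trending):
    """Collapse repeated trending rows into a single row per video."""
    return trending.groupby("video_id", as_index=False).agg(
        category=("category", "first"),
        # Assumption: counts only grow, so max is the latest snapshot.
        views=("views", "max"),
        likes=("likes", "max"),
        dislikes=("dislikes", "max"),
        countries=("country", "nunique"),
        days_trending=("trending_date", "nunique"),
    )


# Made-up example: video "a" trends in two countries on two days.
trending = pd.DataFrame(
    {
        "video_id": ["a", "a", "b"],
        "country": ["US", "GB", "US"],
        "trending_date": ["2018-01-01", "2018-01-02", "2018-01-01"],
        "category": ["Music", "Music", "Gaming"],
        "views": [100, 150, 40],
        "likes": [10, 12, 5],
        "dislikes": [1, 2, 1],
    }
)
unique_videos = one_row_per_video(trending)
print(unique_videos)
```

Summing views across those rows instead would count video "a" twice, which is exactly the duplicate problem described above.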
I compared how videos in different categories trended on YouTube, using the total views per video.
The category with the most views was Entertainment, followed by Music. The Gaming category had the highest like-to-dislike ratio.
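For anyone who wants to reproduce this kind of comparison outside Tableau, here is a small pandas sketch. It assumes the data has already been reduced to one row per video, and it defines the ratio as total likes divided by total dislikes within a category, which is only one possible definition. The numbers below are made up to show the calculation and are not the real results.

```python
import pandas as pd


def category_summary(videos):
    """Total views and like-to-dislike ratio per category."""
    summary = videos.groupby("category").agg(
        videos=("video_id", "count"),
        total_views=("views", "sum"),
        likes=("likes", "sum"),
        dislikes=("dislikes", "sum"),
    )
    # Categories with zero dislikes get NaN instead of a division error.
    safe_dislikes = summary["dislikes"].where(summary["dislikes"] > 0)
    summary["like_dislike_ratio"] = summary["likes"] / safe_dislikes
    return summary.sort_values("total_views", ascending=False)


# Made-up rows, one per unique video.
videos = pd.DataFrame(
    {
        "video_id": ["a", "b", "c", "d"],
        "category": ["Entertainment", "Entertainment", "Music", "Gaming"],
        "views": [300, 200, 400, 100],
        "likes": [50, 30, 60, 90],
        "dislikes": [10, 10, 12, 3],
    }
)
summary = category_summary(videos)
print(summary)
```

Averaging a per-video ratio instead would weight small videos as heavily as large ones, so the ranking could come out differently.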