Using Alteryx To Decide Whether I Should Blog or Procrastinate…

As the second week of my data training rushes by, it is high time that I use some of my newfound skills in Alteryx and Tableau to contribute to the Data School’s blog. But alas! I can feel myself beginning to procrastinate. Perhaps in times like these I should turn to my predecessors and, in true Data School fashion, use data to analyse the blogging patterns of past cohorts.

Luckily my one week of Alteryx training has prepared me for this task.

Web Scraping in Alteryx

To gather all the blogs I needed from The Data School’s website, I used Alteryx to generate URLs that iterate over each of the Data School’s blog menus and downloaded the raw HTML. Then I split that HTML into rows and kept only the lines that could contain information I was interested in, making it easier to parse later. I used regex to extract the author name and publication date of each blog post. I joined that data with another dataset I created containing cohort numbers. Finally, I used the DateTime, Summarize and Formula tools to calculate the number of days between each post and its cohort’s earliest blog post. This way the blog dates could be compared across cohorts.
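For readers who do not use Alteryx, the same steps can be sketched roughly in Python. This is only an illustration, not the workflow I actually built: the URL pattern, the HTML markup in the regex, the author names and the cohort lookup are all made up for the example, and the download step is left out.

```python
import re
from datetime import datetime

# Hypothetical markup: the real structure of the blog menu pages is not
# shown in this post, so this pattern is an assumption for illustration.
POST_PATTERN = re.compile(
    r'<span class="author">(?P<author>[^<]+)</span>\s*'
    r'<span class="date">(?P<date>\d{4}-\d{2}-\d{2})</span>'
)


def page_urls(base_url, n_pages):
    """Generate one URL per blog menu page (placeholder URL scheme)."""
    return [f"{base_url}?page={i}" for i in range(1, n_pages + 1)]


def parse_posts(html):
    """Split HTML into rows and keep only rows holding an author and a date."""
    posts = []
    for line in html.splitlines():
        match = POST_PATTERN.search(line)
        if match:
            written = datetime.strptime(match.group("date"), "%Y-%m-%d").date()
            posts.append((match.group("author").strip(), written))
    return posts


def days_since_cohort_start(posts, cohort_of):
    """Join posts to cohorts, then count days since each cohort's first post."""
    joined = [(cohort_of[a], a, d) for a, d in posts if a in cohort_of]
    earliest = {}
    for cohort, _, written in joined:
        if cohort not in earliest or written < earliest[cohort]:
            earliest[cohort] = written
    return [(c, a, (d - earliest[c]).days) for c, a, d in joined]


# Made-up sample data standing in for downloaded HTML and the cohort dataset.
SAMPLE_HTML = """\
<div><span class="author">Author A</span> <span class="date">2018-01-08</span></div>
<div><span class="author">Author B</span> <span class="date">2018-01-15</span></div>
<nav>menu</nav>
<div><span class="author">Author C</span> <span class="date">2018-06-04</span></div>
<div><span class="author">Author C</span> <span class="date">2018-06-14</span></div>
"""
COHORT_OF = {"Author A": 1, "Author B": 1, "Author C": 2}

RESULT = days_since_cohort_start(parse_posts(SAMPLE_HTML), COHORT_OF)
for row in RESULT:
    print(row)
```

Measuring each post in days since its own cohort's first post is what puts every cohort on the same timeline, whatever calendar date they started on.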

The Results!

Above you can see the stark results of my research: blogging is reasonably frequent in the first week, perhaps motivated by newbie enthusiasm, but post numbers trail off going into the middle weeks as data schoolers become busier. In week 1, the average data schooler writes 0.7 blogs per week. But by week 7 they write around 0.4 blogs per week! Late in the course there is a sharp peak around weeks 13 and 14, with an average of 2 blogs per person per week.

I attribute this late-course spike to two factors. The first is the panic of having to make up for the blogging slump of weeks 3-12 as training draws to an end. The second, and perhaps more important, is Dashboard Week, in which data schoolers must create a dashboard per day, and most use this as a prime opportunity for blogging content. Thus Dashboard Week, which occurs towards the end of training, dramatically inflates the blogging frequency.

For me, then, the question of whether I should blog or procrastinate has a simple answer: blog now, for I will have less time to blog later. Luckily, I have just the topic to write about…

The Data School
Author: The Data School