For this blog post I decided to use the information available on the UK and Australian Data School websites. I wondered how employers and partners of the Data School view and select people to work for them, so I decided to build a viz that makes it easy to see everyone in the Data School and that updates dynamically as new people are added. First I had to get the website links:

Download and Parse

Next, using the Download tool, I pulled in the embedded website data. I originally tried to parse it with a comma delimiter, but I ran into major issues with newlines, and my regex didn't work reliably. Looking for something else I could split on, I noticed the < character and decided to try it. To my delight, every piece of information was now parsed, and the regex worked correctly on it.
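As a rough illustration outside Alteryx, here is a Python sketch of splitting page source into rows on `<`, so that each row starts with one tag. The HTML string and the `example.com` URL are made up for illustration; they are not copied from the Data School sites.

```python
# Hypothetical snippet of downloaded page source (not the real site markup).
page = '<ul><li><a href="https://example.com/team/jane-doe/">Jane Doe</a></li></ul>'


def split_on_tags(source: str) -> list:
    """Split raw HTML into rows at every '<', dropping empty rows.

    Every tag starts with '<', so each resulting row begins with one tag
    (without its leading '<') followed by any text that came after it.
    """
    return [row for row in source.split("<") if row]


for row in split_on_tags(page):
    print(row)
```

Splitting on `<` lines the data up tag by tag, so a later regex only has to look at one tag (plus its trailing text) per row, and commas inside names or sentences no longer matter.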

The regex to parse out the Australian Data Schoolers' names looked like this:

\Qa href="\E.+\Q/">\E(.+)

Everything between \Q and \E is treated as literal text rather than as regex syntax. It’s a helpful workaround when the data contains slashes, dots, and other characters that Alteryx might otherwise interpret as regex. This collected everyone’s names, and provided the page format doesn’t change in the future, it should work for newcomers too.
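Python’s `re` module has no `\Q...\E` syntax, so a rough equivalent of the expression above builds the literal parts with `re.escape()` instead. This is a sketch, not the Alteryx workflow itself; the sample rows and the `extract_name` helper are hypothetical.

```python
import re
from typing import Optional

# re.escape() plays the role of \Q...\E: it escapes any regex
# metacharacters in the literal pieces of the pattern.
PATTERN = re.compile(re.escape('a href="') + r".+" + re.escape('/">') + r"(.+)")


def extract_name(row: str) -> Optional[str]:
    """Return the link text that follows an a href="..."> tag, or None."""
    match = PATTERN.search(row)
    return match.group(1) if match else None


rows = [
    'a href="https://example.com/team/jane-doe/">Jane Doe',  # hypothetical row
    '/a>',                                                   # closing tag, no match
]
names = [name for name in (extract_name(r) for r in rows) if name is not None]
print(names)  # ['Jane Doe']
```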

The British Data Schoolers’ names were trickier to obtain, because the same method didn’t work correctly. For example, for O’Brien it returned something bizarre: O&#8217;Brien, where the apostrophe appears as an HTML character reference, and many other people’s names had similar problems. So I took a different approach and used the names embedded in the pictures instead. I couldn’t use this picture method for the Australian Data School, as I didn’t see any names embedded in its pictures. Perhaps this is because clicking one of our (Australian) Data School pictures under the Team tab, and then clicking again, takes you to that person’s blog, whereas the British pictures don’t link through in this way.
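For reference, `&#8217;` is an HTML numeric character reference for code point 8217 (U+2019, the right single quotation mark), which many blogging platforms substitute for a typed apostrophe. Outside Alteryx, Python’s standard `html.unescape` decodes such references. This sketch is shown only as an alternative, not the approach used in this workflow; the `decode_name` helper is hypothetical.

```python
import html


def decode_name(raw: str) -> str:
    """Decode HTML character references, then normalise curly apostrophes."""
    text = html.unescape(raw)            # "&#8217;" -> "\u2019"
    return text.replace("\u2019", "'")   # right single quote -> straight apostrophe


print(decode_name("O&#8217;Brien"))  # O'Brien
```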

However, some of you will notice the pipe character below, which means OR in regex. This is because some pictures were JPEGs and others were PNGs, so I had to account for both cases.

\Q.\Ejpg\Q" alt="\E.+\Q" \Etitle="(.+)\Q" />\E|\Q.\Epng\Q" alt="\E.+\Q" \Etitle="(.+)\Q" />\E
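A Python sketch of the same alternation, again using `re.escape()` in place of `\Q...\E`. It shows why the expression yields two capture groups, only one of which is filled for any given image. The sample `img` rows and the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple


def image_title_pattern(ext: str) -> str:
    # One branch: '.<ext>" alt="', any alt text, then capture the title attribute.
    # re.escape() stands in for Alteryx's literal quoting.
    return (
        re.escape(".") + ext
        + re.escape('" alt="') + ".+"
        + re.escape('" ') + 'title="(.+)' + re.escape('" />')
    )


# The '|' means OR: a row matches either the jpg branch or the png branch.
IMAGE_TITLE = re.compile(image_title_pattern("jpg") + "|" + image_title_pattern("png"))


def parse_titles(row: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (jpg_title, png_title); at most one is populated per row."""
    match = IMAGE_TITLE.search(row)
    if match is None:
        return (None, None)
    return (match.group(1), match.group(2))


jpg_row = 'img src="https://example.com/jane.jpg" alt="photo" title="Jane Doe" />'
png_row = 'img src="https://example.com/sam.png" alt="photo" title="Sam Lee" />'
print(parse_titles(jpg_row))  # ('Jane Doe', None)
print(parse_titles(png_row))  # (None, 'Sam Lee')
```

In Python, writing the extension as `(?:jpg|png)` inside a single pattern would give one capture group instead of two; the two-branch form above mirrors the expression as written.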

Because the expression has two capture groups, I ended up with two output fields, only one of which was populated for any given picture. I used a Multi-Field Formula tool to make sure each name appeared in both columns.
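The Multi-Field Formula step amounts to filling each empty column from the other one. A minimal Python sketch, assuming each row arrives as a hypothetical `(jpg_title, png_title)` pair where one side may be `None`:

```python
from typing import Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def fill_both(pair: Pair) -> Pair:
    """Copy whichever field is populated into the empty one."""
    first, second = pair
    first = first if first is not None else second
    second = second if second is not None else first
    return (first, second)


print(fill_both(("Jane Doe", None)))  # ('Jane Doe', 'Jane Doe')
print(fill_both((None, "Sam Lee")))   # ('Sam Lee', 'Sam Lee')
```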


Then, in both Alteryx streams, I joined the names and the cohort number (obtained through the regex (DS\d+)). But this raised another problem: how do you match each name to its cohort number? My line of attack was to use a Multi-Row Formula tool with the formula:


if isnull([Row+1:DS#]) then [Row-1:DS#] else [Row+1:DS#] endif

This carried each cohort number down the rows until the next cohort value was reached, at which point that value was carried down instead. I then used a Unique tool to get the unique combinations of name and cohort. After that, I used a Filter tool to remove the null values left over from rows where the name was null but the cohort number was still unique. Lastly, I joined both data streams together and output the result to Tableau.
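A Python sketch of the end result this step aims for. It uses a conventional fill-down (carry the last non-null cohort forward) rather than a line-by-line translation of the Multi-Row formula above, and the input is hypothetical: `(name, cohort)` pairs where a cohort label appears only on the row that introduces it.

```python
import re
from typing import List, Optional, Tuple

# Cohort labels such as "DS12", matching the (DS\d+) expression.
COHORT_RE = re.compile(r"DS\d+")


def extract_cohort(text: str) -> Optional[str]:
    """Return the first DS<number> label in text, or None."""
    match = COHORT_RE.search(text)
    return match.group(0) if match else None


def fill_down(values: List[Optional[str]]) -> List[Optional[str]]:
    """Carry each non-null value down until the next non-null value."""
    filled: List[Optional[str]] = []
    current: Optional[str] = None
    for value in values:
        if value is not None:
            current = value
        filled.append(current)
    return filled


def name_cohort_pairs(rows: List[Tuple[Optional[str], Optional[str]]]) -> List[Tuple[str, str]]:
    """Assign cohorts by fill-down, then keep unique (name, cohort) pairs that have a name."""
    cohorts = fill_down([cohort for _, cohort in rows])
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for (name, _), cohort in zip(rows, cohorts):
        if name is None or cohort is None:
            continue  # like the Filter step: drop rows with no name
        if (name, cohort) not in seen:  # like the Unique step
            seen.add((name, cohort))
            pairs.append((name, cohort))
    return pairs


# Hypothetical input: a cohort header row followed by the names in that cohort.
rows = [
    (None, extract_cohort("Data School DS12")),
    ("Jane Doe", None),
    ("John Smith", None),
    (None, extract_cohort("DS13")),
    ("Sam Lee", None),
    ("Sam Lee", None),
]
print(name_cohort_pairs(rows))
# [('Jane Doe', 'DS12'), ('John Smith', 'DS12'), ('Sam Lee', 'DS13')]
```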

The final product can be seen here.

UPDATE: Certain people’s names weren’t displaying when clicked on in Tableau, so I hard-coded a formula into both data streams that removes all apostrophes, as well as brackets and any last names that weren’t in the URL.
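A minimal Python sketch of the generic part of that clean-up, assuming "brackets" means the bracket characters themselves (round and square) rather than the text inside them. The per-person last-name removal was hard-coded, so it is not reproduced here; `clean_name` is a hypothetical helper.

```python
import re

# Straight and curly apostrophes, plus round and square bracket characters.
STRIP_CHARS = re.compile(r"['\u2019()\[\]]")


def clean_name(name: str) -> str:
    """Remove apostrophes and bracket characters, then collapse extra spaces."""
    cleaned = STRIP_CHARS.sub("", name)
    return " ".join(cleaned.split())


print(clean_name("Jane O'Brien"))      # Jane OBrien
print(clean_name("Sam (Samuel) Lee"))  # Sam Samuel Lee
```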

The yxdb can be seen below.

Names of Data Schoolers