For day two of dashboard week we were given a task to pull from NSW transport API. Specifically we were to pull timetable information and real time delay data for Sydney trains. The API was relatively simple to get going. Just create an account. Add the API endpoints that we needed and use their web application to build a query. This then could be transferred over to the Alteryx download tool and would pull the real time data about the Sydney trains. However once we had it in Alteryx there was a big problem with it. The output from their API was not in an easily read format. It was similar to JSON format but not quiet close enough for it to be easily passed by the parser tool.

Cleaning in Alteryx

Instead I had to break apart the structure myself. This was quiet the challenge, as it had a nested data structure similar to JSON. A trick i found quiet useful for separating out the different block of data was to using the split to rows tool and selecting ignore delimiters inside parentheses. This allowed me to get each data block on it own row making it much easier to deal with. From there it was repeating that step. Breaking the nested data blocks apart. Once all the data was extracted, it was all joined back together to get the delay information about each stop.


Cleaning the data took most of the time for this task. For the time I had left I mad the above dashboard which can be found here. It allows you to select a route, a line then a stop. From there it produces a timetable for that stop. Showing any delays.