Hey everyone, for day two of dashboard week, the team and I are using data from the National UFO Reporting Center (NUFORC). However, unlike on day one, the data isn't supplied in a tidy format. To use NUFORC's data, the first thing I did was web scraping. In total, I ended up with three different Alteryx workflows, and I will give a brief explanation of two of them. And of course, the output of these workflows was used in my dashboard.
The first workflow extracts the state information from NUFORC. To start off, I insert the URL into a Text Input tool, which looks like this.
The URL feeds into the Download tool, which pulls down the source of the webpage, though at this point it is in an unusable state. This can be solved by using the RegEx tool to parse out specific segments.
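Alteryx tools aren't code, but the Download-then-RegEx step can be sketched in plain Python. This is just an illustration of the idea: the sample HTML snippet and the `ndxl...html` link pattern are my assumptions about how the NUFORC index page is structured, not the exact expression used in the workflow.

```python
import re

# A stand-in for the page source the Download tool would return.
# (Assumed structure: links to per-state index pages like ndxlAL.html.)
sample_html = """
<a href="ndxlAL.html">ALABAMA</a>
<a href="ndxlAK.html">ALASKA</a>
<a href="ndxlAZ.html">ARIZONA</a>
"""

# The RegEx tool's "Parse" mode works like findall with a capture group:
# grab each per-state page name from the raw HTML.
state_pages = re.findall(r'href="(ndxl\w+\.html)"', sample_html)

print(state_pages)  # e.g. ['ndxlAL.html', 'ndxlAK.html', 'ndxlAZ.html']
```

Each captured value then becomes one row in a new column, much like RegExOut1 below.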
After parsing with the RegEx tools, the new columns appear nicely.
The main column I am interested in is RegExOut1. It provides page names that can be appended to the original link 'https://nuforc.org/webreports/' so the result can be web scraped again. For the first row, we append 'ndxlAL.html' to the end of 'https://nuforc.org/webreports/', and part 3 shows the result: https://nuforc.org/webreports/ndxlAL.html. This newly formed URL is then fed into a similar workflow.
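In Alteryx this concatenation would be a Formula tool; as a rough Python equivalent, the URL-building step looks like this (the helper name `build_state_urls` is mine, and the suffix list is just the example values from above):

```python
BASE_URL = "https://nuforc.org/webreports/"

def build_state_urls(page_names):
    """Append each RegExOut1 page name to the base NUFORC reports URL."""
    return [BASE_URL + name for name in page_names]

urls = build_state_urls(["ndxlAL.html", "ndxlAK.html"])
print(urls[0])  # https://nuforc.org/webreports/ndxlAL.html
```

Each of these URLs would then go back through a Download tool in the second workflow.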
The second workflow is essentially a repeat of workflow one, but it extracts row-level information. In step 3, I simply converted any string fields that could be dates into the date data type. Step 4 cleans up the columns that will be output and used in Tableau. And finally, in step 5, we output an Excel sheet again.
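The date-conversion in step 3 can be sketched with Python's standard library. Note the format string is an assumption about how NUFORC records timestamps; the Alteryx DateTime tool would be configured with whatever format the scraped fields actually use, and anything that doesn't parse is left as null rather than breaking the workflow:

```python
from datetime import datetime

def try_parse_date(value, fmt="%m/%d/%y %H:%M"):
    """Convert a string field to a datetime if it matches the
    assumed NUFORC format; return None (null) otherwise."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None

print(try_parse_date("12/19/22 20:00"))  # 2022-12-19 20:00:00
print(try_parse_date("unknown"))         # None
```

Steps 4 and 5 would then correspond to dropping or renaming columns and writing the cleaned rows out to Excel for Tableau.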
That was my workflow for day 2 of dashboard week. You can find my dashboard here. Thanks for reading, and have a good day.