While working on a project over the last fortnight, I was reminded that the best solution is not always the most complicated one.
I have developed a fondness for web scraping and API connectors in Alteryx. While these are technically impressive solutions, they can be hard to refactor later on.
Which brings me to the topic of this blog: the Download tool. This may not work in every situation, but when the site you are viewing allows for it, it is a really quick solution.
When will this work?
This will work when there is a download link on the web page you are looking at. If the developer has already planned for this, most of your work has been done for you! How great! The download link may look different from website to website, but it is usually a button or link labelled something like "Download" or "Download Results".
The website developer has also already decided what your downloaded data will look like, so keep in mind that some data massaging may be needed for your own purposes.
Now let’s hop into Alteryx and see how this is achieved.
The first step is to obtain the download URL. You can do this by right-clicking the "Download" button (or, in this case, the "Download Results" button) and selecting "Copy link address" from the menu.
If the website lets you apply filters to the data before downloading it, I suggest opening two tabs with different filters applied, so you can compare the two download URLs and see exactly which parts change.
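Comparing two long URLs by eye is error-prone, so here is a minimal Python sketch (outside Alteryx) that lists which query parameters differ between them. The URLs and parameter names (`region`, `year`, `format`) are hypothetical placeholders, not from any real site.

```python
from urllib.parse import urlsplit, parse_qs


def diff_query_params(url_a: str, url_b: str) -> dict:
    """Return {param: (values_in_a, values_in_b)} for every query parameter that differs.

    parse_qs returns a list of values per parameter; a parameter missing
    from one URL shows up as None on that side.
    """
    params_a = parse_qs(urlsplit(url_a).query)
    params_b = parse_qs(urlsplit(url_b).query)
    differences = {}
    for key in sorted(set(params_a) | set(params_b)):
        if params_a.get(key) != params_b.get(key):
            differences[key] = (params_a.get(key), params_b.get(key))
    return differences


if __name__ == "__main__":
    # Hypothetical URLs copied from two tabs with different filters applied.
    tab_1 = "https://example.com/results/download?region=north&year=2020&format=csv"
    tab_2 = "https://example.com/results/download?region=south&year=2020&format=csv"
    print(diff_query_params(tab_1, tab_2))
```

The parameters that show up in the output are the ones worth turning into fields in the Text Input tool; everything else can stay fixed in the base URL.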
If you have filters or query strings, place them in a Text Input tool, as in the screenshot below. All values have been replaced with placeholder text.
NOTE: Whack a Select tool on after the Text Input tool and change the field size to something larger than the Text Input tool's default, so that longer values are not truncated later on. This will save you some hair-pulling.
Next, reconstruct the download URL, including all the queries and filters. An example is provided below:
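To illustrate what "reconstructing" the URL means, here is a minimal Python sketch that appends each record's filter values to a base URL as a query string. It assumes the site accepts ordinary `key=value` query parameters; `BASE_URL`, `build_download_url`, and the field names are hypothetical stand-ins for your own copied link and Text Input fields.

```python
from urllib.parse import urlencode

# Hypothetical base URL: replace with the download link you copied from the site.
BASE_URL = "https://example.com/results/download"


def build_download_url(base_url: str, filters: dict) -> str:
    """Append the filter values to the base URL as a percent-encoded query string."""
    return f"{base_url}?{urlencode(filters)}"


# Each dict plays the role of one row in the Text Input tool (hypothetical fields).
rows = [
    {"region": "north", "year": "2020", "format": "csv"},
    {"region": "south west", "year": "2021", "format": "csv"},
]
urls = [build_download_url(BASE_URL, row) for row in rows]

if __name__ == "__main__":
    for url in urls:
        print(url)
```

By default `urlencode` encodes spaces as `+`; if the site expects `%20` instead, passing `quote_via=urllib.parse.quote` to `urlencode` changes that.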
For this download method, you will also need to create a file path for each record. I have chosen to use relative paths, so the files will be saved in the same project folder as the workflow itself.
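As a rough Python equivalent of this step, the sketch below builds one relative file path per record. The `downloads` folder and the `region_year.format` naming scheme are hypothetical choices. Note one difference: Python resolves relative paths against the current working directory, whereas the author's Alteryx paths are relative to the folder the workflow is saved in.

```python
from pathlib import Path


def build_output_path(row: dict, folder: str = "downloads") -> Path:
    """Build a relative path such as downloads/north_2020.csv for one record.

    Hypothetical naming scheme: one file per region/year combination,
    with spaces replaced by underscores to keep file names tidy.
    """
    safe_region = row["region"].replace(" ", "_")
    filename = f"{safe_region}_{row['year']}.{row['format']}"
    return Path(folder) / filename


if __name__ == "__main__":
    print(build_output_path({"region": "south west", "year": "2021", "format": "csv"}))
```

Building the name from the same fields used in the URL keeps each file traceable back to the filters that produced it.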
Now on to the Download tool. In its configuration window, go to the "To a File" section at the bottom, select "Filename from a Field", and choose the file path field you created.
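Conceptually, "Filename from a Field" means each record's URL is fetched and its response is written to that record's file path. The following minimal Python sketch does the same thing with the standard library. `download_to_files` is a hypothetical helper, not part of Alteryx, and it assumes the download links need no login or cookies.

```python
import urllib.request
from pathlib import Path


def download_to_files(records: list[tuple[str, Path]]) -> list[Path]:
    """Fetch each URL and write the response body to its paired file path."""
    saved = []
    for url, path in records:
        # Create the target folder if it does not exist yet.
        path.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as response:
            path.write_bytes(response.read())
        saved.append(path)
    return saved


if __name__ == "__main__":
    # Hypothetical record: a URL paired with the relative path to save it to.
    records = [
        ("https://example.com/results/download?region=north&year=2020&format=csv",
         Path("downloads") / "north_2020.csv"),
    ]
    for saved_path in download_to_files(records):
        print("saved", saved_path)
```

Writing raw bytes rather than text avoids guessing the file's encoding, so CSVs, Excel files, and zips are all saved unchanged.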
Finally, run the workflow. Once it has finished, navigate to the folder where the workflow is saved, and your downloaded files will be sitting right there for you to view!
For my project, this method was infinitely easier than web scraping the data from scratch. The website did not have sufficient API documentation, and scraping the pages with the Download tool alone was not easy either.
And there we have it! A simple yet time-efficient solution!