It's official: DSAU19 have completed 25% of their training, and what better way to celebrate the occasion than learning about regex, macros and APIs in Alteryx. This week was a big week. On Monday (week starting 27/2) we covered Alteryx reporting and analytic apps. On Tuesday we covered macros, including standard macros, batch macros, and iterative macros. Wednesday was regex, and on Thursday we covered APIs. And of course Friday was our challenge day, which we will get into a bit later. As you can see, we did a lot, and we were exhausted by the end of it. So, let's dig a bit deeper and see what macros, APIs and regex are all about.

Macros

Macros were probably the hardest concept to get my head around, partly because of the different flavours they come in: standard macros, batch macros, and iterative macros. A macro is essentially a tool you create yourself, built from existing tools, that can then be reused and shared. So, if you wanted to web scrape data from a website with a specific format, you could build a macro that parses out the necessary data using the tools already available. An example of this is the Data Cleansing tool, which is actually a macro: a user created a workflow from pre-existing tools that automates the task of removing nulls, trimming whitespace, and fixing capitalisation.
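To make that idea concrete outside of Alteryx, here is a minimal Python sketch of the same concept: a reusable routine that bundles a few common cleaning steps, much like the Data Cleansing macro does on the canvas. The function name and logic are purely illustrative, not the macro's actual implementation.

# Analogy only: an Alteryx macro is built visually from existing tools,
# but conceptually it is like wrapping common steps into a reusable routine.
def data_cleanse(values):
    """Stand-in for the Data Cleansing macro: drop nulls, trim whitespace,
    and fix capitalisation."""
    cleaned = []
    for value in values:
        if value is None or value.strip() == "":
            continue                            # remove nulls and blanks
        cleaned.append(value.strip().title())   # trim and capitalise
    return cleaned

print(data_cleanse(["  alice smith ", None, "BOB JONES", ""]))
# -> ['Alice Smith', 'Bob Jones']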

As mentioned before, there are three types of macros. A standard macro, like any macro, is a reusable tool that performs a specific function; it has well-defined inputs, outputs, and error handling, and it can be shared on the Alteryx Gallery or via shared network folders. Batch macros, on the other hand, allow you to process multiple inputs as batches, which is very useful for tasks that require the same operations on many inputs, such as handling paginated data from APIs. Lastly, iterative macros let the user apply a set of operations repeatedly until a condition is met, with the result of one iteration feeding into the next. Overall, macros are very useful tools; they may be difficult to understand at first, but they become very powerful.
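As a loose analogy (not Alteryx code), the difference between batch and iterative processing can be sketched in a few lines of Python: a batch process runs the same logic once per input, while an iterative process feeds each result back in until a stopping condition holds.

# Analogy only: batch-style versus iterative-style processing in plain Python.

# "Batch": apply the same operation to each input batch in turn.
batches = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
batch_totals = [sum(batch) for batch in batches]    # one result per batch

# "Iterative": the output of one pass feeds the next until a condition is met.
value = 1000
passes = 0
while value > 1:          # stopping condition
    value //= 2           # this pass's output becomes the next pass's input
    passes += 1

print(batch_totals)       # -> [6, 9, 30]
print(passes)             # -> 9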

Regex

It's all well and good that data can be downloaded from various websites and databases; it's a completely different story when it comes to manipulating the data to look the way you want. This is where regex comes in. Regex, short for regular expression, is a sequence of characters that forms a search pattern. It's a very powerful and flexible tool for pattern matching and text processing. The origins of regex can be traced back to the 1950s and 1960s as a formal language for describing regular sets of strings. Computer scientist Ken Thompson implemented regular expressions in the QED text editor in the 1960s, and they were developed further in the 1970s, influenced by work such as Donald Knuth's "The Art of Computer Programming", before later being standardised under POSIX. In the 1980s and 1990s, regular expressions became increasingly popular in programming and text processing.

Now, let's go through an example of regex and what it actually does. Suppose I have the following string:

timble.michael1997@thedataschool.com.au

Now, I want to grab the first name and the last name (timble michael). I can use regex to extract exactly what I want. The regex in this instance would look something like this:

^([a-z]+)\.([a-z]+)

It might look like nonsense, but if every email in a given dataset had the same format as above, the pattern above would extract the first name and the last name. Let's break down the code. The ^ (caret symbol) anchors the match to the start of the string. The round brackets () indicate a capture group, meaning whatever matches inside them is kept; in this instance that's 'timble'. The square brackets [] form a character class, which works like an OR across the characters inside them, here the letters a to z. So: match an a or b or c (and so on), one or more times, which is what the + symbol denotes. That is followed by a backslash-escaped period \. which matches a literal dot, and then the same pattern again, giving us timble and michael.
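If you want to try the same pattern outside Alteryx, here is a quick Python sketch using the standard re module; the pattern is exactly the one shown above.

import re

email = "timble.michael1997@thedataschool.com.au"

# ^          anchor the match to the start of the string
# ([a-z]+)   first capture group: one or more lowercase letters ("timble")
# \.         a literal dot (the backslash escapes the period)
# ([a-z]+)   second capture group: more lowercase letters ("michael",
#            stopping before the digits)
pattern = r"^([a-z]+)\.([a-z]+)"

match = re.match(pattern, email)
if match:
    print(match.group(1), match.group(2))   # -> timble michael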

As you can see just some basic regex can save a lot of time when it comes to data transformation.

API

The final topic of the week was APIs. An API, or Application Programming Interface, is a set of protocols, routines, and tools that allows different software applications to communicate with each other. In its simplest form, an API is used to fetch data. Let's illustrate this with a restaurant analogy. You, the customer, want food (data), but you need to contact the kitchen (the backend database), and the only way you can do that is through the waiter (the API). You tell the waiter what you want, and they contact the kitchen and fetch it for you. The API is the middleman between you and the backend server where the data is stored, and you can define what data you want and what you want to do with it. For example, you can use a GET request to retrieve data, a POST request to send data, and a DELETE request to delete it. Any social media app, or pretty much anything you interact with on the web or your phone, is predicated on a functioning API. So, let's put all this together for our Friday challenge.
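To show the shape of a typical API call, here is a small sketch using Python's requests library; the URL and parameters are made up purely for the restaurant analogy.

import requests

# Hypothetical endpoint for illustration only.
url = "https://api.example.com/menu"

# GET: ask the "waiter" (the API) to fetch data from the "kitchen" (the backend).
response = requests.get(url, params={"category": "mains"}, timeout=10)
response.raise_for_status()     # raise an error if the request failed
dishes = response.json()        # most web APIs return JSON
print(dishes)

# The other verbs follow the same shape:
# requests.post(url, json={"dish": "pasta"})   # send/create data
# requests.delete(url + "/42")                 # delete a resource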

Friday has Arrived

As you are all aware by now, Friday is always a big day for us, and this Friday was no different. Our challenge: either web scrape data or get information from an API, clean the data, and then present it in a dashboard. For this challenge I used the US Census data API to find data on population change between census dates. Because the data was split across 12 API endpoint calls, I created a batch macro to grab the data from all 12 endpoints and then used regex to clean it. This is the batch macro that was used:

Figure 1: Batch macro for the API call.

To create the batch macro, I had already built the text input with the 12 API URLs that were needed. The Download tool was then configured, and a control parameter was used to change the year in each call, so that each batch fetches the data for the corresponding year. Once the macro is configured, it can be used in the original workflow.
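As a rough Python equivalent of what the batch macro is doing, the list of years below plays the role of the text input and the loop plays the role of the control parameter feeding each value into the download step. The URL template is hypothetical, not the actual Census endpoint used in the challenge.

import requests

base_url = "https://api.example.gov/population"
years = range(2010, 2022)        # 12 calls, one per year

results = []
for year in years:
    response = requests.get(base_url, params={"year": year}, timeout=10)
    response.raise_for_status()
    results.append(response.json())   # collect each batch's output

print(f"Fetched {len(results)} batches of data")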

That was it. Officially, we are 25% of the way through our 4-month training. It is unbelievable how quickly it went, and I believe it will only get faster.

Author: The Data School