The text to columns tool in Alteryx can sometimes be a little clunky. It’s a quick fix when you need each element of ‘Element1/Element2/Element3’ to be in it’s own column, but what if the string was ‘Element1/ Element2_ Element3 ’? This requires multiple text to columns tools as well as the Data Cleansing tool.
In an effort to practice regex, I’ve recently begun using the Regex_replace function via the Formula tool whenever there’s a need to extract multiple parts from a string. It’s not easier nor faster than Text to Columns, but it sure is fun.
The Regex_replace function
The Regex_replace function requires 3 parameters: Regex_replace([A string field], ‘the regex pattern you want to match’, ‘the string you want to replace it with’). Keep in mind when using this function that all instances of the match are replaced, not just the first.
I’ll show you how I use this function to split text to columns with a worked example. I’ve used a text input containing the names of the only sporting team I know. It also contains the year they joined the team (not canon) and their position:
The formatting here is messy and the elements I’m interested in (Name, Date Joined and Position) are separated by different characters. Using a text to columns tool to clean this up would look like so:
But there’s a cool way we can do this with the Regex_replace function in 2 tools instead of 6:
Here I matched the pattern of my string field that contains the wanted elements grouped by parentheses: ‘(\w+)_\s*(\d+).*/(\w*)’. Grouping like this is important because I want each of these elements in a new column. The replacement parameter of the Regex_replace function can be a plain string or, more usefully, can point to a group in the matching pattern. To point to a specific group simply prefix the group number with a dollar sign. The group number refers to the order that each group appears in the matching pattern.
The full function syntax to extract the player names into a new column then becomes:
Regex_replace([Text Input], ‘(\w+)_\s*(\d+).*/(\w*)’, ‘$1’)
To get the date joined and position in their own columns, a new column for each is added in the formula tool. The same code is used however ‘$1’ becomes ‘$2’ or ‘$3’. I’ve also used Regex_replace instead of the date time tool to clean up the Date Joined field. A quick select tool at the end fixes up the column headers.