Alteryx works in mysterious ways, and sometimes when you get not the results you expected

You can get totally lost on how and why this happened. These are the five first things to check across your workflow that can help you to find the issue.

In short

  1. Check/set datatypes
  2. Field Names
  3. Nulls
  4. Duplicates
  5. Joins and Filters

 

And in detail

1. Check/set datatypes

Your workflow may stop working because data types are not set upstream and change unexpectedly. While most tools maintain the datatypes, it is not always the case. For instance, depending on your data source type, your data ingestion tool can try to infer the data types for the columns depending on their respective content, and if the data is dynamic, the inference may change. Some excessively smart macros also can change expected data types. So check that every tool gets expected data types as input. Also, it is a good practice to place a Select tool at the very beginning of your workflow and define your field properties explicitly. 

2. Check field names

Again, Alteryx field names might look not as expected after some transformations due to Alteryx trying to do “smart” things or just due to limitations in the software. I.e. full stops are substituted with underscores etc. It is usually easy to catch this kind of error, but it is not always the case. Things to look at: 1) letter cases (if you make a typo in a Select tool and use a capital letter instead of lower case, you will create a new field and mess up your calculations, joins and unions), 2) spaces in field names – trailing spaces in field names are especially frustrating, so you might use a dynamic rename tool at the start of your workflow just to make sure you never experience that. Again, this change might happen when you do not expect it, for example, if you switch data sources (i.e. from dev to the testing environment), so be aware. 

3. Nulls

Having null cells in your data may massively throw off your data processing and analysis. Using fields with null values in joins is a road to disaster. While it is not required to get rid of all nulls (they might be informative and relevant for your analysis), you must be aware if the column of your interest contains nulls and how it affects your downstream processing. Null might be generated as a by-product of your data processing steps, so if there are no nulls in your data source, it is not guaranteed that they will not appear later. 

While you can get stats on nulls from a Browse tool or the specialised data exploration tools, I prefer to drop a summary tool where I need it the most (i.e. before a Join tool), and use “count null”, “count not null” options – you can also use it to check the number of unique records (see below) and do other useful checks while debugging. 

4. Duplicates

Duplicate records can (again) screw up your joins and mess up your aggregates. Again, while records with the same values can be genuine and relevant, very often they are a result of a glitch in your workflow or artifact of the data collection method you use (often seen in data scrapping, for instance). If you have duplicate records (drop a Unique tool into your flow and check its D output – D for ‘duplicates’), it is at the very least worth investigating. 

5. Joins and filters

As I said before, artifacts in your data processing can throw your joins (and filtering) well off. You can lose records or duplicate them, introducing errors in your analysis. The best starting point to understand what is going on and where your data goes is to check additional outputs of your join (L/R) and filter (F) tools – you will see your wrongly filtered out records there and be able to understand how to fix it. 

That’s all, folks, but you are welcome to add your go-to Alteryx troubleshooting method in the comments below!

Eugene Kutilov
Author: Eugene Kutilov

Eugene has a background in science with a master’s degree in quantum physics and decades-long experience in technology marketing. Eugene’s technology toolbox includes Alteryx, Tableau, PowerBI, SQL, Python, R, GCS. He is a Tableau Certified Data Analyst, Microsoft Certified PowerBI Data Analyst (Associate), and Certified Alteryx Designer Advanced Specialist. He is also a Certified Data Scientist by Datacamp.