Alteryx is a powerful data prep and analysis tool that can process large amounts of data very quickly. However, with very large data sets it can be frustrating when a workflow does not perform as well as it could. Here are 7 simple tips for optimizing the performance of your workflow.
1. Change the file input type
Different file types take different lengths of time to read. You can speed up your workflow by converting the input to a more efficient file type, such as an Alteryx database file (.yxdb). The Alteryx database format is the most efficient file type for reading and writing in Alteryx because it has no size limit, is compressed for maximum speed, and includes additional metadata that references the source of the data and how the data was created. See the Alteryx documentation on file types for more information.
You can convert a data source using a simple workflow that connects an Input Data tool to an Output Data tool.
2. Change the workflow runtime settings
Changing the runtime settings in the workflow configuration can help improve performance. For example, while you are still in the testing phase, limiting the records for all inputs or disabling tools that write output can increase the processing speed.
3. Reduce the number of fields with the Select tool
The less data Alteryx needs to process, the faster it will run. It is a good idea to remove fields you won’t need as early as possible by using the Select tool.
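Outside Alteryx, the same idea can be sketched in a few lines of Python. This is only an illustration of why dropping fields early helps, not how the Select tool works internally; the column names and the `keep_fields` helper are made up for the example.

```python
import csv
import io


def keep_fields(rows, wanted):
    """Yield each row reduced to the wanted fields only."""
    for row in rows:
        yield {name: row[name] for name in wanted}


# Hypothetical input with four columns, of which only two are needed.
raw = io.StringIO(
    "id,name,comment,amount\n"
    "1,Ann,first order,10.5\n"
    "2,Bob,repeat customer,7.25\n"
)

# Drop the unneeded columns right after reading, before any further processing.
slim_rows = list(keep_fields(csv.DictReader(raw), ["id", "amount"]))
print(slim_rows)
```

Every later step now handles two values per row instead of four, which is the saving the Select tool gives you when it is placed early in the workflow.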
4. Take a sample of the data
Another way to reduce the amount of data Alteryx needs to process is to use the Sample tool. Sampling only a selection of rows while testing your workflow can reduce the processing time. For example, you can sample the first or last N rows of the data set. However, it is usually better practice to take a random sample, which gives a better representation of the data.
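To see why a random sample tends to be more representative than the first N rows, here is a small Python sketch. It is an illustration only and does not reflect how Alteryx samples internally; the 1,000-record list, the helper names and the fixed seed are assumptions made for the example.

```python
import random

rows = list(range(1, 1001))  # stand-in for 1,000 records


def first_n(records, n):
    """Take the first n records, like a 'first N rows' sample."""
    return records[:n]


def random_n(records, n, seed=42):
    """Take n records at random without replacement (seeded so runs repeat)."""
    rng = random.Random(seed)
    return rng.sample(records, n)


head = first_n(rows, 10)   # always records 1 to 10
rand = random_n(rows, 10)  # drawn from anywhere in the data set
print(head)
print(rand)
```

If the data is sorted, for instance by date or customer, the first 10 rows all come from one end of the range, while the random draw can pick records from anywhere in it.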
5. Use the Auto Field tool to find the best data type
The Auto Field tool can help you quickly find the most efficient field type when importing data. Reducing the number of bytes for each field can potentially improve the performance of your workflow. However, every time it runs, the tool needs to read a certain amount of data before it can calculate the appropriate field type. It can therefore be beneficial to remove the Auto Field tool once you have set the field types.
This can be done by inserting a Select tool after the Auto Field tool and saving the field type configuration as an Alteryx Field Type file (.yxft). You can then right-click the Auto Field tool and choose “Delete and connect around”, go back to the Select tool and choose “Load Field Names & Types”, and voila! All your fields will have the specified type.
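The idea of picking the narrowest type that still fits every value can be sketched in Python. This is a rough illustration, not the Auto Field tool's actual algorithm: the `smallest_type` function is hypothetical, the type labels only loosely mirror Alteryx's names, and it assumes a non-empty list of string values.

```python
def smallest_type(values):
    """Return a rough label for the narrowest type that fits every value."""
    try:
        numbers = [int(v) for v in values]
    except ValueError:
        numbers = None  # at least one value is not a whole number

    if numbers is not None:
        lo, hi = min(numbers), max(numbers)
        if lo >= 0 and hi <= 255:
            return "Byte"
        if -2**15 <= lo and hi < 2**15:
            return "Int16"
        if -2**31 <= lo and hi < 2**31:
            return "Int32"
        return "Int64"  # sketch only: does not check the 64-bit limit

    try:
        for v in values:
            float(v)
        return "Double"
    except ValueError:
        # Not numeric at all: size the string to the longest value seen.
        longest = max(len(v) for v in values)
        return f"String({longest})"


print(smallest_type(["1", "200"]))      # Byte
print(smallest_type(["1.5", "2"]))      # Double
print(smallest_type(["abc", "de"]))     # String(3)
```

Note that the function has to look at every value before it can answer, which is the same reason the Auto Field tool adds run time and is worth removing once the types are fixed.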
6. Disable tool containers
Reducing the number of tools that run each time will speed up your run time. Tool Containers can be used to group tools together and to document the process of your workflow. These groups can then be disabled from running by choosing “Disable container” in the upper left corner of the Tool Container.
7. Save a cache of parts of your workflow
Finally, another way to keep parts of the workflow from running each time is to select the ‘Cache and Run Workflow’ option on a specific tool. This saves a temporary copy of the results, so that tool and all the tools before it do not need to be reprocessed every time you click Run.
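The principle behind caching can be shown with a short Python sketch: compute an expensive step once, save the result to a temporary file, and reuse it on later runs. This illustrates the general technique only and says nothing about how Alteryx stores its cache; `expensive_step`, `cached` and the file name are hypothetical.

```python
import os
import pickle
import tempfile

calls = {"count": 0}  # tracks how often the expensive step really runs


def expensive_step():
    """Stand-in for the slow upstream part of a workflow."""
    calls["count"] += 1
    return [n * n for n in range(5)]


def cached(path, compute):
    """Return the saved result if one exists, otherwise compute and save it."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result


cache_path = os.path.join(tempfile.mkdtemp(), "step.cache")
first = cached(cache_path, expensive_step)   # computes and writes the cache
second = cached(cache_path, expensive_step)  # reads the cache instead
print(first == second, calls["count"])
```

The second call returns the same data without rerunning the step. The trade-off is the same as in Alteryx: if the upstream data or logic changes, the cache is stale and has to be rebuilt.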
Keeping these simple tips in mind can go a long way in helping you keep your sanity when working with large data sets!