Alteryx provides a streamlined and organised visual system for data processing. It is easy to present to non-software users, runs fast, and has clear tools that, when combined, can execute powerful data manipulation pipelines. Python is a versatile language that can perform almost any task: if you have a problem, there is a 99% chance someone has written a library to solve it. And if they haven’t, you can write it yourself! But often, when deciding on the best tool for a given job, the answer is both. The ease of access and production-ready design of Alteryx is often best combined with the flexibility of Python code. For these tasks there is the Alteryx Python tool. This blog will give you 5 tips that I wish I had known when I started running Python in Alteryx.

This bad boy can do a lot of damage!

Tip 1: Learn the tools

While this tip may seem obvious (particularly to Python devotees), knowing how to code in standalone Python is not sufficient for running Python in Alteryx. Alteryx relies on two specific tools that you also need to know: pandas and Jupyter Notebook. Anyone who has worked with data in Python has probably worked with pandas, although Alteryx developers trying out Python are likely to be less familiar with it. Pandas is not a fluffy animal, but rather a Python library that deals with column-row structured data much like Alteryx does. Any data passed from your Alteryx workflow into the Python tool takes the form of a pandas DataFrame, so you have to be at least familiar with pandas to use the Alteryx Python tool. If you have never heard of pandas, there are great resources to learn it available online – here is just one example.
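To make the comparison with Alteryx concrete, here is a minimal pandas sketch. The table and its column names (region, units, unit_price) are made up for illustration, and the mapping to Alteryx tools in the comments is only a rough analogy, not an exact equivalence.

```python
import pandas as pd

# Hypothetical sample data standing in for an Alteryx input stream.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East"],
        "units": [10, 3, 7, 5],
        "unit_price": [2.5, 4.0, 2.5, 3.0],
    }
)

# Roughly what a Formula tool does: add a calculated column.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Roughly what a Filter tool does: keep rows that match a condition.
large_orders = sales[sales["units"] >= 5]

# Roughly what a Summarize tool does: group and aggregate.
revenue_by_region = large_orders.groupby("region", as_index=False)["revenue"].sum()

print(revenue_by_region)
```

If these three steps feel familiar from building workflows, you already have the mental model you need; the rest is learning the pandas syntax.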

Jupyter Notebook is also a required tool. The Python tool in Alteryx opens an instance of Jupyter Notebook. This is an interface that allows you to code and write comments as you process your data, showing outputs at various steps along your coding process (much like an Alteryx workflow!). It is intuitive and easy to learn, but if you are unfamiliar with Jupyter Notebook it might be worth running an instance in your browser to try it out before taking it to a productionized workflow.

Tip 2: Run your workflow first

This tip is the result of a bit of a quirk in how the Python tool runs. As I mentioned, the tool opens an instance of Jupyter Notebook. However, when you start coding, Jupyter can’t read your inputs the way other Alteryx tools can. Instead, you first have to run your workflow, which gives the Python tool a profile of its inputs. After that, you can code as you normally would in standalone Jupyter Notebook, outputting results and running your code intermittently. Once your Python script is done, run your workflow again to output that Python-processed data to your next Alteryx tool. While this step isn’t the be-all and end-all of Python-Alteryx programming, it is a tip that might save you a bit of confusion as to why your Notebook isn’t outputting the way you think it should.
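As a rough sketch of the read, process, write pattern this tip describes: inside the Python tool, data is read with Alteryx.read and sent on with Alteryx.write from the ayx package that ships with Alteryx. The fallback branch, the helper names read_input and write_output, and the sample columns are my own assumptions, added only so the snippet also runs outside Alteryx.

```python
import pandas as pd

try:
    # The ayx package is only importable inside the Alteryx Python tool.
    from ayx import Alteryx
except ImportError:
    Alteryx = None  # running outside Alteryx, e.g. in a plain notebook


def read_input(connection="#1"):
    """Read an incoming connection, or return stand-in data outside Alteryx."""
    if Alteryx is None:
        # Hypothetical sample rows used only for this illustration.
        return pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})
    # Inside Alteryx this relies on the workflow having been run at least once.
    return Alteryx.read(connection)


def write_output(df, anchor=1):
    """Send a DataFrame to an output anchor, or print it outside Alteryx."""
    if Alteryx is None:
        print(df)
        return
    Alteryx.write(df, anchor)


df = read_input("#1")
df["value_doubled"] = df["value"] * 2  # assumes the input has a 'value' column
write_output(df, 1)
```

If the read step fails inside the Python tool, the first thing to check is whether the workflow has been run since you connected the input.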

Tip 3: Save your workflow AND your notebook

This tip is another quirk that results from creating a Jupyter Notebook instance in Alteryx. The Python tool runs a Jupyter Notebook file that is saved separately to your Alteryx workflow. That means you must save both your Jupyter Notebook and your Alteryx workflow. If you press Ctrl-S (or Cmd-S) while your Notebook is selected, your notebook will save. This is an important detail because when you run your workflow the instance of Jupyter Notebook will refresh – this means any details you haven’t saved will be lost! So make sure, before you run your workflow, to save both your Alteryx workflow and your Jupyter Notebook.

Tip 4: Use production mode

Perhaps another obvious tip, but this one also helped me when I was starting to integrate Python in Alteryx. One of the downsides of the Python tool is that it runs relatively slowly and it can cause Alteryx to run slower than usual. For this, there is the production mode within the Python tool. You can find it hiding in the top-right corner of your configuration pane.

Running an instance of Jupyter Notebook in Alteryx can put strain on your machine. Running the tool in production mode will switch the Python tool from an instance of Jupyter Notebook to a plain Python script. So, when you have finished editing in Notebook, switch from Interactive to Production mode. This will speed up your workflow just a little bit.

Tip 5: Use macros

While you and I love to use code to build our workflows, not everyone has this same enthusiasm. One of Alteryx’s strengths is its ability to simplify a production process through containers and tools. So sometimes a big chunk of Python code is not the clearest way to structure your workflow. In these instances, Alteryx macros can be your best friend. Putting a Python tool inside a macro can allow you to explain the jobs the macro is doing without delving into the code. Just establish the necessary parameters in the macro interface so the user has control over what that macro is doing. In this way code-illiterate users can still run and understand your workflow without an in-depth knowledge of your Python code.
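One possible way to wire macro parameters into the code, sketched here under my own assumptions, is to pass the user's choices into the Python tool as a small one-row table alongside the data. The function name apply_threshold and the column names are hypothetical, and how the parameter table gets populated in your macro is up to you.

```python
import pandas as pd


def apply_threshold(data: pd.DataFrame, params: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose chosen column meets a user-supplied threshold.

    `params` is a hypothetical one-row table with the columns
    'column' and 'threshold', standing in for values set in the macro interface.
    """
    column = str(params.loc[0, "column"])
    threshold = float(params.loc[0, "threshold"])
    return data[data[column] >= threshold].reset_index(drop=True)


# Stand-ins for the data stream and the parameter stream.
data = pd.DataFrame({"customer": ["a", "b", "c"], "spend": [120.0, 45.0, 300.0]})
params = pd.DataFrame({"column": ["spend"], "threshold": [100]})

result = apply_threshold(data, params)
print(result)
```

Keeping the logic in one function with explicit parameters means the macro user only ever sees two settings, while the code stays in one place for whoever maintains it.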

Author: The Data School