In this blog, we are going to explore the mysteries of the fabled Alteryx AMP engine. I say fabled because, on the surface, the Alteryx AMP engine is potentially just a myth. And I say mysteries because I actually have no idea what it is or how it works, other than the fact that it’s a setting you can turn on that may or may not boost your Alteryx performance.
What did I think Alteryx AMP engine did?
If you have ever watched any of the Fast and the Furious movies, you will be intimately familiar with a product called Nos. Whenever these speed racers are racing, in the middle of the race, when they need some extra speed they flip up a button cover, the camera pans to a blue gas tank in the passenger seat full of ‘nitrous oxide’, they press the button and bippity boppity the car goes slightly faster than normal.
In my head, I built it up like it was the Nos of Alteryx. But when I ran it on my laptop hoping for that extra boost, the results were kind of…
The real question is: why were the results so underwhelming? As it turns out, it’s mainly because I had no idea what it was meant to do. After reading some of the info released by Alteryx on the AMP engine, it turns out I was just using it wrong.
Alteryx AMP Engine: The Simple Version
Before the AMP engine came out, Alteryx was running on the Original Engine. This OG engine was essentially built around single-thread performance, whereas the AMP engine was built around multi-thread performance. Now what does that mean for you?
Think of a thread as a pipe, and the incoming data as water flowing through that pipe. If you want that water to go from one point to another, you essentially have to funnel it all through that pipe. In general, most laptops/computers used by ordinary people have a total of 4 pipes (Intel i5 and below). More serious water enthusiasts (like data analysts who use their laptops for more than checking emails) would likely have a total of 6-8 pipes (Intel i7). Now the more pipes you have, the more water you can move at a time. And to bring the analogy back home, the more threads on your computer, the more data you can process at a time.
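If you are curious how many pipes your own machine has, here is a quick way to check. This is just a minimal sketch in Python using the standard library, nothing to do with Alteryx itself. Note that `os.cpu_count()` reports logical threads rather than physical cores, and it can return `None` if the count can't be determined.

```python
import os

# Number of logical CPUs (threads) the operating system reports.
# This counts threads, not physical cores, and may be None on some platforms.
logical_threads = os.cpu_count()

if logical_threads is None:
    print("Could not determine the number of threads on this machine")
else:
    print(f"This machine has {logical_threads} pipes (logical threads)")
```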
However, even though the multi-thread technology was around, Alteryx itself was never great at using multiple threads. Instead of getting each pipe to process a different stream of water separately, Alteryx was still only using one pipe at a time. Essentially, data could only be processed record-by-record, and records could only be processed sequentially. To further explain, only one record could move between tools at any given time.
So how does AMP work?
AMP, or Alteryx Multi-threaded Processing, aims to make workflows run faster by allowing multiple threads to work in parallel. From what I can tell, the major improvement in speed comes from breaking tools down into smaller tasks and then allocating those tasks to various threads that work concurrently. So instead of finishing all of one process before starting the next, you can do parts of different processes at the same time. Alteryx organises these in a more efficient way, and the workflow subsequently runs faster than usual.
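To make the idea a bit more concrete, here is a toy sketch in Python of the difference between pushing every record down one pipe and splitting the work into chunks that several threads pick up. To be clear, this is NOT how Alteryx implements AMP internally; the `clean` function, the chunk size and the worker count are all made up for illustration. Also, because of how Python handles threads, this particular toy won't actually run faster for CPU-heavy work. It only shows the shape of the idea.

```python
from concurrent.futures import ThreadPoolExecutor


def clean(record):
    # Hypothetical stand-in for the work a single tool does on one record.
    return record.strip().upper()


def run_sequential(records):
    # One pipe: every record is processed one after another.
    return [clean(r) for r in records]


def run_chunked(records, workers=4, chunk_size=2):
    # Many pipes: split the records into chunks and give each chunk to a worker thread.
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map hands back results in the same order as the input chunks.
        processed = list(pool.map(run_sequential, chunks))
    # Stitch the processed chunks back into a single list of records.
    return [r for chunk in processed for r in chunk]


records = [" alice ", "bob", " carol", "dave ", "erin"]

# Both approaches should produce the same output, just organised differently.
assert run_chunked(records) == run_sequential(records)
```

The important part is that the chunked version gives the same answer as the sequential one. The only thing that changes is how the work is organised.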
What’s interesting is that whipping out this new technology wasn’t as simple as just bringing out a new software update. A lot of the tools actually needed to be rebuilt to take advantage of this new architecture. Currently, all of the tools that have been converted or partially converted are in a list here: Tool Use with AMP | Alteryx Help
Theoretically, and I stress the word theoretically, AMP should improve performance if you ever use any of these tools.
Let’s get Benchmarking
When you first use it, it can be hard to tell whether AMP is making any impact on your workflow at all. Part of that is because you’re not working with large enough datasets to actually make use of AMP. And the other part is that your computer is probably just not good enough to leverage AMP effectively.
We are going to be doing some testing with my personal PC, which has a Ryzen 7 3700X @ 3.6GHz (8 cores, 16 threads) and 16GB of RAM (3200MHz).
After clearly not doing ENOUGH research, we have the results of our first benchmark test.
PREDICTIVE TEST: 11 New_Donor Sample Dataset –
This benchmark essentially runs some predictive investigation tools and eventually creates a predictive model of sorts. It’s just a sample workflow that ships with Alteryx, which I used because it was less work.
Personal PC No AMP – 158s, 155s, 155s = Average 156s
And WITH AMP we have…. 153s, 154s, and 154s = Average 153.7s
For a grand time saving of… 2.3s
SPATIAL TEST: Joining Multiple Spatial Files with Find Nearest
All this benchmark does is take some spatial files, and try to match them to other spatial files using Find Nearest, but with ridiculously high search parameters to increase the workload.
Personal PC: 764s (12m 44s), 750s (12m 30s), 751s (12m 31s) = Average 755s
Personal PC WITH AMP: 234s (3m 54s), 235s (3m 55s), 217s (3m 37s) = Average 228.7s
Finally! Some positive results! A 69.7% time saving!
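If you want to check my maths, here is the arithmetic behind that number as a small Python sketch. The run times are the ones listed above; the variable names are just mine.

```python
from statistics import mean

# Spatial benchmark run times in seconds, taken from the results above.
no_amp = [764, 750, 751]
with_amp = [234, 235, 217]

avg_no_amp = mean(no_amp)        # 755
avg_with_amp = mean(with_amp)    # roughly 228.7

# Time saving as a percentage of the non-AMP average.
saving = (avg_no_amp - avg_with_amp) / avg_no_amp * 100

# The same result expressed as "how many times faster".
speedup = avg_no_amp / avg_with_amp

print(f"{saving:.1f}% time saving, {speedup:.1f}x faster")
# prints "69.7% time saving, 3.3x faster"
```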
One thing you’ll notice is that the progress percentages with AMP are present on all of the tools at the same time, unlike the non-AMP version.
I only really had time to do two tests: a predictive and a spatial benchmark. As it turns out, for the predictive workflow I tested, AMP had next to no impact on workflow speed. Spatial, on the other hand, greatly benefitted from the AMP Engine. I suspect that if I had attempted some data prep/blending with the AMP engine it would have seen an improvement too, but I haven’t tested that. For now, here is a graph I found while waiting for my benchmarks at AMPlify your Workflows – Alteryx Community (below)
So AMP actually works, but not for everything.
One of the key things to note is that my computer has A LOT of threads (16 to be precise), and most laptops don’t have anywhere near as many. It’s entirely likely that you may not see much difference when using your laptop. But for computers with lots of cores and lots of threads (workstation computers), AMP might just be the solution to your time problems.