In this blog I will go over Alteryx Weekly Challenge 200, solve a sudoku puzzle from the Wikipedia article. That I also expanded to solving a 1000 different sudoku grids using a Kaggle data set. This type of problem is not one that would typically be solved using Alteryx, it would be easier to use a programming language and apply recursive techniques. Saying that though applying these same principles into Alteryx pushes one’s skills and may allow you to view a problem in a different way in the future.
In this blog I will go over the following:
- Computer Science Background
- Preparing the data
- Applying the algorithm
- Putting the workflow to the test
Computer Science Background
For those that do not have any background in programming or computer science I will go over some of the concepts that I used in completing this challenge. I will preface this with that I have just taught myself these concepts and do not have any formal training in this area and will do my best to explain it correctly.
So, I will take a step back, an algorithm is a set of rules/steps that a program applies to solve a problem. In this challenge I tried to adapt a backtracking algorithm to solve the sudoku grid. This is a type of brute force method; it will go to the first empty cell in the grid and pick a value for it. If it is valid then it will move onto the next and repeat the first step until it finds a cell that can take no possible value. At which it will backtrack to previous cells and try different values. It will continue to do this until it solves the grid (if it’s a valid sudoku grid). It’s a simpler algorithm to apply but its draw back is that it is not the most efficient and grids can be designed to hard for it to solve.
The next thing concept that I have used in solving this is the idea of the stack. In computer science this is a collection objects that could be thought of as a stack of books, you put new books on top of the stack and if you want to retrieve one you must take what is on the top first.
Preparing data into a usable format
The input for this challenge was an 81-digit number that represented all the numbers for the grid where zeroes were where blank cells would be. The first part of the challenge was to get this into a more readable grid that represents the sudoku grid. I took this further because I wanted the data to do be tall with grid references and the box that it fell into. That is I wanted the data to be resented in the follow way but 1 row of data per grid cell.
This will make it easier for the macro to compare a number to any that fall in the same row, column or box. I have also added an attempt column which represents the stack or which grid should be checked next in this case because there is only this initial grid, I have set it to 1.
Applying the Backtracking algorithm
In my workflow I am treating all rows from a single sudoku grid as a single object. That is each 81 rows of data that has a value for each cell in the grid I will talk of as a single object and in the workflow, this would be denoted with the attempt column. The basic flow of macro is as follows.
So to break this down into the steps:
- Take the next grid (81 rows of data) in the stack
- Then take the next cell in the grid that is missing a value
- For this cell I will generate 9 new grids representing all possible combinations (ie 1-9)
- Test each of these new grids to see if they meet all rules (ie compare the row, columns, boxes)
- Any that pass will be added to the top of the stack (Could be 0-9 new grids added)
- Stack is then sent back to iterative input to start again
The Alteryx Macro
Get next time in stack
This is simply just taking the max of attempt to get the next grid in the stack
Try new values for cells
This is where the bulk of the work is done. First I get all the cells in the grid that have a value of 0 resenting blank spots and take the first one in the list. I then use generate rows to make a new row for each possible value (1-9). Each row is checked against the corresponding row, column, box to make sure that it does not violate any of the rules (this is through the three joins, only letting row that don’t join at every stage through)
Output Stack to next iteration
Then for each new row that passed the tests create a grid with this new value inside the cell, incrementing the step number so that it will be larger than all previous. I then union this back to the current stack of grids and this gets sent to the iterative output
Test to see if solved
So you might wonder how does the macro know once its has been solved. When it checks for missing values, if it does not find any 0s in the current grid it will force the iterative output to have 0 records going through it and then allow the current grid to go to the output of the macro
Putting it to the test
So with all my parsing and solving macro I input the wiki puzzle, waited 30s and got the result. Great but I wondered with it being a brute force method did I just get lucky with the puzzle that was used and it happened to be easy for my macro to solve. I went off and found a data set that contained 1 million sudoku grids and solutions (that happened to be in exact format I needed) and ran it through increasing number of grids.
I first ran it through 10, all correct. Next through 100 grids, 1 failed because it exceeded the number of allowed iterations, but all other successful in about 2minutes. I decided to ramp it up and go for 1000. All succeded in 22 minutes. This result matches the literature for brute force method where the complexity doesn’t make much difference to how long it takes to solve.
If you would like to have a look at my workflows find the packaged workflow attached