What is Computer Vision?
Computer Vision is a field of AI (Artificial Intelligence) which trains computers to interpret and understand the visual world using training data in the form of images.
In this blog post, I will train two Convolutional Neural Networks (CNNs), one using Alteryx’s Computer Vision tools and one using Python, on MNIST, a well-known image classification dataset. I will also compare their performance and discuss the pros and cons of each approach.
The MNIST data set contains handwritten digits (0-9) in white on a black background. It is commonly used as a benchmark for image classification algorithms and contains 60,000 training images and 10,000 testing images.
A sample of the handwritten digits from the MNIST data set.
The MNIST data set in JPG format can be found here. The file structure for the train/test images is shown below:
The label or target for each image is given by the folder it is located in.
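As a rough illustration of that folder-as-label convention, here is a minimal Python sketch. The path used is hypothetical (the real folder names depend on where the download is unpacked), and it assumes each image sits directly inside a folder named after its digit.

```python
from pathlib import PurePosixPath


def label_from_path(image_path: str) -> int:
    """Return the digit label encoded in the name of the image's parent folder."""
    return int(PurePosixPath(image_path).parent.name)


# Hypothetical example path: an image stored in the folder for the digit 7.
print(label_from_path("training/7/img_00042.jpg"))
```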
Alteryx Computer Vision Workflow
The MNIST dataset contains grayscale images, while the Image Recognition tool in Alteryx requires RGB images. To convert the 70,000 images to RGB, I used the tool “ImageMagick” to convert them in batches. For those interested, these are the commands I used to convert the images (using macOS Terminal):
# Change directory to the folder containing the grayscale images
cd [PATH OF FOLDER CONTAINING GRAYSCALE IMAGES]
# Convert every JPG in this directory and write the converted copies to a new folder
mogrify -path [PATH OF FOLDER TO EXPORT TO] -format jpg -type TrueColor *.jpg
This was repeated for each folder (0-9) in both Training and Testing folders.
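To make it clearer what the conversion does to the pixel data, here is a small NumPy sketch. It is not what ImageMagick runs internally, just an illustration under the assumption that a grayscale image becomes RGB by copying the single channel three times. The 28x28 array is a made-up stand-in for an MNIST digit.

```python
import numpy as np


def gray_to_rgb(gray: np.ndarray) -> np.ndarray:
    """Copy a (H, W) grayscale image into three identical channels: (H, W, 3)."""
    if gray.ndim != 2:
        raise ValueError("expected a 2-D grayscale array")
    return np.stack([gray, gray, gray], axis=-1)


# Stand-in for one 28x28 MNIST digit: a white vertical stroke on black.
digit = np.zeros((28, 28), dtype=np.uint8)
digit[10:18, 13:15] = 255

rgb = gray_to_rgb(digit)
print(rgb.shape)
```

The image looks the same afterwards; the only difference is that it now has the three channels the Image Recognition tool expects.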
Once these images were in the proper format, I used Alteryx to create the following Image Recognition workflow:
Alteryx Computer Vision Workflow
- The first step involves importing the directories of the training images using the Directory tool. Remember to check “Include Subdirectories”.
- Next, select the first 1,000 images from each directory, which results in 10,000 training images (1,000 from each of the 10 classes). Remember to Group By “Directory”.
- Then the Image Input tool is used to read in the images from the directory.
- The Text to Columns tool is used to extract the label or target for each image from the directory.
- The images are then resized using the Image Processing tool.
- Steps 1-5 are repeated for the Testing/Validation data
- Then, feed the Training and Validation images and labels into the Image Recognition tool to train the model. The hyperparameters used for this model are 10 epochs, the ResNet50V2 pre-trained model, and a batch size of 32.
- Finally, export the model as an Alteryx Database object (.yxdb) to use the model for future predictions.
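For readers more comfortable with code, the "first N images per directory" sampling step above can be sketched in Python roughly as follows. This is only an analogy to the Alteryx step, not how Alteryx implements it. The function name, the file paths, and the choice to sort paths before taking the first N are all my own assumptions.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def first_n_per_directory(paths, n):
    """Group image paths by parent directory and keep the first n in each group."""
    grouped = defaultdict(list)
    for path in sorted(paths):  # assumption: "first" means first in sorted order
        folder = str(PurePosixPath(path).parent)
        if len(grouped[folder]) < n:
            grouped[folder].append(path)
    return dict(grouped)


# Hypothetical listing: 5 images in each of the 10 digit folders.
paths = [f"training/{d}/img_{i:05d}.jpg" for d in range(10) for i in range(5)]

sample = first_n_per_directory(paths, n=3)
total = sum(len(files) for files in sample.values())
print(len(sample), total)
```

With n set to 1,000 on the real folders, this would give the 10,000 training images used in the workflow.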
Python CNN in Google Colab
Google Colab is a great place to write and execute Python code in the cloud. It has the added benefit of being able to access GPUs and TPUs in the cloud which helps to accelerate machine learning tasks. This Colab notebook contains the Python code used to create a CNN to classify the MNIST data.
To create a CNN in Python, we first need to import a few libraries, including Keras and TensorFlow, which are popular machine learning libraries. Then, we can import the MNIST data directly from Keras and select a random subset of 5,000 training images and 1,000 testing and validation images.
Next, set the batch size, number of classes, and number of epochs. The data is then prepared by converting the pixel values to floats and normalising them so they range from 0-1 instead of 0-255. The labels are then one-hot encoded using the “to_categorical” function from Keras.
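The preparation step can be sketched with plain NumPy so that it runs without TensorFlow installed. This is not the notebook's code: the random arrays stand in for MNIST images, and the `np.eye` indexing is my stand-in for what “to_categorical” produces.

```python
import numpy as np


def prepare(images, labels, num_classes=10):
    """Scale pixels from 0-255 to 0-1 and one-hot encode integer labels."""
    x = images.astype("float32") / 255.0
    # Row k of the identity matrix is the one-hot vector for class k.
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y


# Stand-in data: four random 28x28 "images" with made-up labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28)).astype(np.uint8)
labels = np.array([3, 0, 9, 3])

x, y = prepare(images, labels)
print(x.dtype, y.shape)
print(y[0])
```

Each label becomes a vector of ten values with a single 1 at the position of the digit, which is the format a ten-class output layer is usually trained against.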
Finally, create the model by adding multiple convolutional layers, each followed by an activation function. The weights in these layers are adjusted during training so the network learns patterns in the images. The model is then trained on the training images and validated on the validation images.
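To give a feel for what a single convolutional layer followed by an activation function computes, here is a small NumPy sketch. It is an illustration only, not the Keras model from the notebook. The kernel is hand-picked to detect vertical edges, whereas in a real CNN the kernel values are the weights learned during training.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and a stride of 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """ReLU activation: negative responses are set to zero."""
    return np.maximum(x, 0.0)


# Toy 28x28 image: left half black, right half white.
image = np.zeros((28, 28), dtype=np.float32)
image[:, 14:] = 1.0

# Hand-picked vertical edge detector (a trained layer would learn its own values).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map.shape)
```

The resulting feature map is non-zero only around the black-to-white boundary. Stacking several such layers lets the network combine simple patterns like edges into shapes that distinguish one digit from another.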
Alteryx: After 10 epochs of ResNet50V2, the model had a training accuracy of 87.3% and a validation accuracy of 84.1%, which is impressive.
Python: After only 5 epochs, the model had an even more impressive training accuracy of 98.6% and a validation accuracy of 96.6%.
I created a very small test set (10 images, one for each digit 0-9) to see how the models perform on new, unseen images. The results are shown below:
The Alteryx model predicted with 90% accuracy, misclassifying the handwritten 5 as a 2. The Python model predicted with 100% accuracy. However, this was a very small sample of neat test images. These accuracies are close to the validation accuracies, so the models don’t seem to be overfitting.
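The accuracy figures for this small test follow directly from counting correct predictions. Here is a short sketch using the results described above. The prediction lists are reconstructed from the text (every digit correct except the 5 read as a 2 by the Alteryx model) rather than exported from either tool.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


true_digits = list(range(10))

# Reconstructed from the write-up: the Alteryx model read the 5 as a 2.
alteryx_preds = [0, 1, 2, 3, 4, 2, 6, 7, 8, 9]
python_preds = list(range(10))

alteryx_acc = accuracy(true_digits, alteryx_preds)
python_acc = accuracy(true_digits, python_preds)
print(alteryx_acc, python_acc)
```

With only ten images, a single mistake moves the accuracy by ten percentage points, which is why these numbers should be read as a sanity check rather than a proper evaluation.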
Overall, Alteryx’s Computer Vision tools are very limited. Alteryx only provides a small number of options (epochs, pre-trained model, batch size). Many options that are available in Python are missing. These include adjusting the learning rate, adding dropout, using a GPU (including free but limited use of a GPU in Google Colab), and image augmentation, to name a few. However, Alteryx has the advantage that no coding knowledge is required and it is a lot quicker to get started. I would only recommend Alteryx’s Computer Vision tools for a specific use case for someone not proficient in Python. For those who do know Python, I would recommend using Python for Computer Vision.