Last week, I achieved a machine learning milestone I have been working towards for the last six months – I came in the top 10 (6th place) of Kaggle’s April Tabular Playground Series competition. This competition pits ML practitioners against one another on simulated tabular data, each trying to build the best-performing machine learning model.
For me, this was the culmination of around six months of learning about machine learning. When I began, I had some background in statistics but knew very little about machine learning itself. This blog will recount the steps I took to learn, some advice for other beginners, and my plans for learning more in the future.
Step 1: Learn the Fundamentals
I use this title with some hesitation. Machine learning is such a vast and ever-evolving field that one person might never need, let alone learn, all of the fundamentals. But there are some elements that must be learnt in order to begin practicing.
Even in this digital era, I am certain that a good textbook is the best way to learn technical concepts. At the end of this blog I have included a short reading list of the books I used to learn some of the technical concepts of machine learning.
Learning these fundamentals is the most important step. In the year 2022, AutoML and code-free ML applications are common enough that almost anyone can create a well-performing model for a variety of different situations. However, I believe an understanding of how these models work is essential, because knowing how a model works informs other decisions. It informs how you should engineer your features and what kind of model is applicable to different problems, and, most importantly, it tells you how your predictions are made in the first place. An intuitive understanding (and probably a bit of maths) is essential for understanding what goes on under the hood in your ML algorithms.
Step 2: Practice
There are so many ways to practice implementing machine learning models.
I already mentioned one – Kaggle competitions. These contests pit you against other data scientists to optimize your model performance on a given dataset. However, they are far from the only way to practice ML, and perhaps not the best one.
Firstly, I like to download toy datasets and play with various algorithms. Creating a sandpit for myself to play with algorithms and test their results gave me an intuitive understanding of what my models were doing – and thus how to use them. You can read here about one such experiment, where I play with the different decision boundaries of popular ML algorithms.
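For readers who want to try this kind of sandpit themselves, here is a minimal sketch. It assumes scikit-learn and its make_moons toy dataset; the dataset, the three models and their settings are illustrative choices, not necessarily the ones used in the experiment mentioned above. Each model is fitted on the same data and then asked to predict on a grid of points, which traces out its decision boundary.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy two-class dataset with a curved class boundary (illustrative choice).
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# A grid covering the data; predicting on it traces each model's decision boundary.
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 100),
    np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 100),
)
grid = np.column_stack([xx.ravel(), yy.ravel()])

# Three popular algorithms with very different boundary shapes.
models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=15),
}

boundaries = {}
accuracies = {}
for name, model in models.items():
    model.fit(X, y)
    # Shape (100, 100): the predicted class at every grid point.
    boundaries[name] = model.predict(grid).reshape(xx.shape)
    accuracies[name] = model.score(X, y)
    print(f"{name}: training accuracy = {accuracies[name]:.2f}")
```

Each entry of boundaries can be passed to a plotting call such as matplotlib's contourf(xx, yy, boundaries[name]) to see the shapes side by side: a straight line for logistic regression, axis-aligned steps for the tree, and a smoother curve for nearest neighbours.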
Secondly, even though packages like XGBoost, PyTorch and TensorFlow exist to power your ML models, I spent a significant amount of time implementing models from scratch to gain an understanding of what they are doing under the hood. This proved I understood the maths behind how they worked, and gave me insight into what exactly is going on when my model predicts a 1 or a 0. Here is a toy neural network I built in Alteryx in order to understand how the maths behind the model works.
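To give a flavour of what "from scratch" can look like, below is a small sketch of a one-hidden-layer neural network written with nothing but NumPy. It is not the Alteryx workflow mentioned above; the XOR data, the layer sizes, the tanh and sigmoid activations, the learning rate and the number of epochs are all assumptions picked for illustration. The forward pass turns inputs into a probability of class 1, and the backward pass applies the chain rule by hand to update the weights.

```python
import numpy as np


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# XOR: a classic toy problem that a single linear layer cannot solve.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 8))  # input -> hidden weights
b1 = np.zeros((1, 8))
W2 = rng.normal(size=(8, 1))  # hidden -> output weights
b2 = np.zeros((1, 1))

learning_rate = 0.5
losses = []
for epoch in range(5000):
    # Forward pass: hidden activations, then probability of class 1.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Binary cross-entropy loss (clipped to avoid log(0)).
    p_safe = np.clip(p, 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
    losses.append(float(loss))

    # Backward pass: chain rule, written out by hand.
    d_out = (p - y) / len(X)                  # gradient at the output pre-activation
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0, keepdims=True)
    d_hidden = (d_out @ W2.T) * (1 - h ** 2)  # derivative of tanh is 1 - tanh^2
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0, keepdims=True)

    # Gradient descent step.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

# Probabilities after training; values above 0.5 are read as a predicted 1.
probs = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
print(np.round(probs, 3))
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Writing the gradients out like this is slower than calling a library, but every number the model produces can be traced back to a line of arithmetic.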
I highly recommend these kinds of practice. Rather than just implementing algorithms that perform well in Kaggle, taking the time to understand how these models work has made a huge difference in the depth of my learning.
Step 3: Keep Learning
Although I have achieved my goal of placing highly in a Kaggle competition, there is so much learning left to do.
I have three next steps for my ML learning journey.
Firstly – keep competing. I have discovered an addiction to Kaggle competitions. Incrementally improving my leaderboard score gamifies the ML process in such a way that I cannot pry myself from my laptop. It is lucky these competitions always offer new techniques to implement and peers and experts to learn from…
Secondly – keep reading. Machine learning is such an exciting and developing field that new state-of-the-art research papers come out every day. I am endeavouring to continue to learn the newest and most interesting models (while picking up the basics that I missed along the way).
Thirdly – solve a real problem. This is perhaps my ultimate goal – to build a model that uses big data and ML to solve a real problem. Whether it be at work or in my personal life, actually implementing and productionising a model to solve a real problem is a task with completely different challenges to a Kaggle competition. With some luck, I hope to be able to make this one of my next steps on this journey.
That’s it for this blog! I hope I’ve provided some insight or learning with my experience. I’ve had so much fun learning ML so far – and I will continue to do so!
Beginner Machine Learning Resources I Recommend
Aurélien Géron – Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
Eli Stevens, Luca Antiga, Thomas Viehmann – Deep Learning with PyTorch
Christoph Molnar – Interpretable Machine Learning
Jay Alammar – The Illustrated Transformer