This is Part One of a N-Part Series on SQL and its importance as a Data Analyst

Exec. Summary

In the first blog post of my series on SQL, I discuss my experience learning and using SQL as a data analyst. I first learned SQL while I developed a website for my photography business using WordPress; which runs on a MySQL database. I later used Python and SQL to build a project during lockdown called "checkm9" that analyzed my chess games. In my work as a data analyst at The Data School, I have used SQL to speed up the analysis of large tables by aggregating data in-database. I believe that learning SQL is crucial for data analysts, as it allows us to work with large datasets and perform tasks that may be difficult or impractical with other tools.

Preamble

Next week marks the start of Week 12 for the members of the illustrious Cohort 17 of The Data School Down Under. Just yesterday we presented the outcome of our project week after what has been our most challenging week to date; a collection of machine learning models for a well known B2B manufacturer. I can tell you on behalf of all of us in Cohort 17 that we are thrilled to not have a project week next week.

What we do have though is our first SQL training sessions to round out the year. I’m particularly excited about it, in fact I’ve been looking forward to it since we began. But I have a little secret—although it’s very well known within my Cohort—SQL was one of the first programming languages I’d ever learned, and I like using it as much as I can.

Learning SQL to Solve Business Problems

When I ran Prop & Pose, my photography and photo booth business, I tried to do as much as I could to grow the business, and the most important part was our website. When Prop & Pose moved to WordPress I learned a great deal about how websites worked and this took me down a rabbit hole I still follow to this day.

WordPress is simply a PHP application running on a webserver interacting with a MySQL database. If you get under the hood and connect to that database you’ll see all of the posts and pages of your website in its tables.

Image.png
phpMyAdmin is the Online Database Manager for My Website

The image above shows you that database behind my personal blog danlsn.com.au, which is also run on WordPress. The table wp_posts contains all the posts, images and pages from the website, and the table wp_options unsurprisingly contains all the options like the name, description, and URL of the website.

From what I can recall I vaguely new what SQL was at the time and although I had a graphical interface to the database I wanted to learn how it worked and how I could use it to make my website better. So, I took an online course (Codecademy IIRC) and learned the basics of SELECT, FROM, and WHERE.

Taking SQL Further in Lockdown

Fast forward to my 2021 when my desire to become a data analyst started to crystallise and suddenly I wanted to know more than the basics. I wanted to learn how to setup databases, make more complicated queries, and how to store and analyse the data that I had lying around.

I built a silly little project called ‘checkm9' which you can check out on GitHub (https://github.com/danlsn/checkm9), that downloaded my chess games as JSON, transformed them using Python, and loaded them into a PostgreSQL database for me to analyse using an open-source visualisation software called Metabase.

Finally, now that I’m a fortunate member of The Data School I get to put these skills to work, and happily I’ve had a couple of opportunities to flex my SQL knowledge.

Round-One: ~74 Million Rows in SAP HANA

The first time was working with one of Victoria’s largest electricity distributors using their data warehouse in SAP HANA.

The table we were working with had ~74 million rows and the analysis we wanted to do required running multi-row aggregations on the entire dataset; unfortunately neither Alteryx or Tableau could do this at the speed we required (although with a bit more knowledge in Alteryx’ In-DB tools maybe we could have).

The idea I had was to write a collection of SQL scripts that did the aggregations in-database negating the need to download multi-gigabyte tables, instead just returning the rows we needed for visualisation. This cut a task that literally took 3-hours down to mere seconds and saved our project.

Round-Two: An Oracle Database for a Leading Australian University

The second time was for another project week for a leading Australian University although what I provided was a proof of concept to use SQL in Alteryx or Tableau natively instead of the very manual CSV-based process they were using.

This POC involved creating a mock Oracle database from the data we’d been given to mimic the one they used in production for the project we were working on. I won’t bore you with the technical stuff (although maybe I will soon) but I used a Docker container running an Oracle Database instance that I loaded sample data to to show how it could work.

Closing Remarks

In summary, I’m excited to be learning SQL (again) this week because there’s always more to learn. I’m not sure it will be covered in training this time around but I’d really like to be familiar with TRIGGERS, VIEWS, and PROCEDURES in SQL, and get more experience creating databases.

I’m a Data Analytics Consultant today, but as I progress through The Data School I’d relish the opportunity to become a Data Engineer, and knowing the ins and outs of a database will be crucial—I’m so excited.

Until next time.

Love,

Dan Lawson

This blog post is the first in a series I'll be producing about SQL and Data Analytics. Stay tuned for more coming right here to The Data School Blog.

Daniel Lawson
Author: Daniel Lawson

Right off the bat I can tell you that I’m not your average data analyst. I’ve spent most of my career running my own business as a photographer and videographer, with a sprinkling of Web Development and SEO work as well. My approach to life and work is very T-shaped, in that I have a small set of specific skills complemented by a very broad range of interests; I like to think of myself as a dedicated non-specialist. Data Analytics, and Programming, started as a hobby that quickly grew into a passion. The more I learned the more I looked for opportunities to pull, manipulate, and join data from disparate sources in my life. I learned to interact with REST APIs for services I used, personal data from services I use like Spotify, and health data captured by my devices. I learned SQL to create and query databases, as well as analyse SQLite files containing my iMessages and Photos data on my Mac. Every technique I learned opened up more possibilities; now I’m hooked and there’s no turning back. Learn More About Me: https://danlsn.com.au