This is the first post in an N-Post series on Advanced SQL with a focus on Window Functions. Stick around to level up your SQL skills.

Intro to Window Functions in SQL

Window functions are a type of SQL function that allow you to perform calculations over a set of rows. You might be familiar with aggregate functions like SUM(), AVG(), and COUNT() and although they’re similar these operate on all rows to return a single value; however, window functions operate over a “window” of rows and return a value for each row in the result set.

Window functions are useful in situations where it’s necessary to perform calculations on subsets of data within a larger set. For example:

  • Calculating running totals over a period of time; such as year-to-date sales for a particular region.
  • Ranking rows by a particular column; this can be used to identify the top performing products by profit in a region.
  • Calculating Moving Averages; useful for smoothing out fluctuations in sales figures over a period of time.

One key advantage of window functions is that they allow you to perform calculations without needing to specify a new grouping for each calculation. This has the added benefit of simplifying your SQL code and can improve the performance of your queries.

Basic Syntax

Here’s a simple example of the ROW_NUMBER() window function at work; a simple query to return the rank for top 10 songs overall, as well as the rank of that track for its artist:

-- Example of ROW_NUMBER window function
SELECT
    ROW_NUMBER() OVER (ORDER BY count DESC) AS rank,
    ROW_NUMBER() OVER (PARTITION BY artist_name ORDER BY count DESC) AS artist_rank,
    artist_name,
    track_name,
    count
FROM top100
LIMIT 10;

Notice the OVER, ORDER BY, and PARTITION BY keywords in the above query:

  • OVER indicates a window is being defined
  • PARTITION BY divides the rows into groups by artist_name
  • ORDER BY defines the order in which the rows will be processed. In this case count descending; ie number of times played.

Here’s the output:

rank artist_rank artist_name track_name count
1 1 Felix Jaehn Shine 381
2 1 Aeroplane Let’s Get Slow 316
3 1 Couzare Afraid of Heights – Original Mix 287
4 2 Felix Jaehn I Do 280
5 1 Campsite Dream Crush 268
6 1 Krono Run 255
7 1 SAFIA Embracing Me 250
8 2 Couzare Just 249
9 1 Flight Facilities Foreign Language – Flight Facilities Extended Mix 246
10 1 TEEMID Crazy \(feat. Joie Tan\) – Radio Mix 239

As you’d expect the rank column is simply 1-10 in order, however we can see that row 4 and 8 have an artist_rank value of 2 because the artist also has tracks in the top 3.

Examples of Common Window Functions

There are a handful of useful window functions that you can use in your analysis; here are a few of them:

  • RANK(), DENSE_RANK(), and PERCENT_RANK()
    Returns the RANK for each value
  • ROW_NUMBER()
    Sequential number for rows
  • LAG() & LEAD()
    Access a value for rows before or after
  • FIRST_VALUE() & LAST_VALUE()
    Get the first or last value for a specified window and order
  • NTILE()
    Allows users to bin rows into a defined number of bins

We will look into these examples in upcoming posts in this series so stay tuned to learn more about them.

The Data

The data we will use for this series is taken from my personal Spotify history up to 2021 looking at the Top 100 tracks by play count. We will be using PostgreSQL for the purposes of this series.

Conclusion

We have just scratched the surface of what window functions can do in SQL but hopefully I’ve given you a taste. I highly encourage checking back here for more in-depth tutorials examining some of my favourites. I also encourage you to do some of your own research on the web because there are a lot of great tutorials out there.

Until next time!

Love,
Dan Lawson

Daniel Lawson
Author: Daniel Lawson

Right off the bat I can tell you that I’m not your average data analyst. I’ve spent most of my career running my own business as a photographer and videographer, with a sprinkling of Web Development and SEO work as well. My approach to life and work is very T-shaped, in that I have a small set of specific skills complemented by a very broad range of interests; I like to think of myself as a dedicated non-specialist. Data Analytics, and Programming, started as a hobby that quickly grew into a passion. The more I learned the more I looked for opportunities to pull, manipulate, and join data from disparate sources in my life. I learned to interact with REST APIs for services I used, personal data from services I use like Spotify, and health data captured by my devices. I learned SQL to create and query databases, as well as analyse SQLite files containing my iMessages and Photos data on my Mac. Every technique I learned opened up more possibilities; now I’m hooked and there’s no turning back. Learn More About Me: https://danlsn.com.au