LAG() & LEAD() | Advanced SQL Functions

This is the second post in an N-Post series on Advanced SQL with a focus on Window Functions. You can find the first post here Intro to Window Functions in SQL. Stick around to level up your SQL skills.

Preamble

LAG() and LEAD() are two of the most useful window functions in SQL. They allow you to access values from rowsaboveorbelowthe current row. This is quite useful for queries where you want to compare values or make calculations based on past or future values.

For example, using LAG() you can calculate the change in a monthly sales figure from the previous month. However, youre not limited to just the previous row. You can use LAG() to access any row in the window partition. You could choose to access the row from 6 months ago, or the row from 2 years ago. The choice is yours.

Data Used in this Post

For this post, well be using a dataset containing my personal Spotify listening history stored in a SQLite database. SQLite is a great database to use for small projects because its very easy to set up, and doesnt require a server to run. You can download the SQLite database file here: Download spotify.db from my GitHub Profile.

There are syntax differences between most databases, but the concepts are the same. LAG() and LEAD() are fairly consistent across databases, so you should be able to follow along with any database.

The database contains multiple tables, however, for the purposes of this tutorial we will focus on the endsong table which contains my streaming history. See the schema below:

create table endsong
(
"index" BIGINT,
ts TIMESTAMP,
ms_played BIGINT,
master_metadata_track_name TEXT,
master_metadata_album_artist_name TEXT,
master_metadata_album_album_name TEXT,
spotify_track_uri TEXT,
episode_name TEXT,
episode_show_name TEXT,
spotify_episode_uri TEXT,
reason_start TEXT,
reason_end TEXT,
shuffle BOOLEAN,
skipped FLOAT,
offline BOOLEAN,
offline_timestamp DATETIME,
incognito_mode BOOLEAN,
filename TEXT
);

LAG()

Lets look at a basic example of LAG() first. Looking at my five most recent streams, I want to know the timestamp (ts) of the previous stream of that track. I can do this using the following query:

SELECT ts,
master_metadata_track_name as track,
master_metadata_album_artist_name as artist,
LAG(ts, 1) OVER (
PARTITION BY
master_metadata_album_artist_name,
master_metadata_track_name
ORDER BY
ts ASC
) AS previous_ts
FROM endsong
ORDER BY ts DESC
LIMIT 5

Lets unpack what just happened. The LAG() function takes two arguments, which is the column you want to access from the previous row, and the offset value. In this case, we want to access the timestamp of the previous row. We can do this by passing the ts column and an offset value of 1 to the LAG() function.

The OVER() clause is where we define the window partition. In this case, we want to partition the data by the artist name. This means that the LAG() function will access the previous row for each artist and track. We then order the data by the timestamp in descending order. This means that the most recent stream for each artist will be at the top of the window partition.

The result of the query is shown below:

ts track artist previous_ts
2022-02-03 00:27:57.000000 Circles – Monument Remix Monument null
2022-02-03 00:02:14.000000 Some Nights fun. null
2022-02-02 22:26:06.000000 Study Hall Hungry Kids of Hungary 2022-02-02 12:34:42.000000
2022-02-02 22:23:04.000000 I Got U Duke Dumont 2022-02-02 12:21:11.000000
2022-02-02 22:22:15.000000 Stolen Dance Milky Chance 2020-04-30 04:11:31.000000

As you can see above, for the first and second row we actually have a null value for the previous timestamp. This is because there is no previous row, meaning this is the first time Ive listened to that track. For the remaining rows we see a timestamp for my last stream of that particular track.

LEAD()

LEAD() is very similar to LAG(). The only difference is that LEAD() accesses the row below the current row. Lets look at an example of LEAD() in action. I want to know the timestamp of the next stream of each track. I can do this using the following query:


SELECT ts,
master_metadata_track_name as track,
master_metadata_album_artist_name as artist,
LEAD(ts, 1) OVER (
PARTITION BY
master_metadata_album_artist_name,
master_metadata_track_name
ORDER BY
ts ASC
) AS next_ts
FROM endsong
ORDER BY ts ASC
LIMIT 5;

The only difference between this query and the previous one is that we are ordering the data by the timestamp in ascending order. This means that the oldest stream for each artist will be at the top of the window partition.

The result of the query is shown below:

ts track artist next_ts
2014-03-13 08:18:13.000000 Reservoir Dogs (feat. 360, Pez, Seth Sentry & Drapht) Bliss n Eso 2014-03-15 07:32:12.000000
2014-03-13 08:21:03.000000 Magic Coldplay 2014-06-22 05:54:38.000000
2014-03-13 08:22:15.000000 Spectrum – Radio Mix Zedd 2014-03-18 01:20:22.000000
2014-03-13 08:22:18.000000 The Tide (feat. Steffi Nguyen) – TheFatRat Remix TheFatRat 2014-06-23 03:16:20.000000
2014-03-13 08:22:41.000000 Set It Off – TheFatRat Remix Diplo null

As you can see above, for the last row we actually have a null value for the next timestamp. This is because there is no next row, meaning this is the most recent stream of that track. For the remaining rows we see a timestamp for my next stream of that particular track.

This example could be useful if you wanted to find tracks that you havent listened to in a while. You could use the LEAD() function to find the next stream of each track and then filter out the null values.

Conclusion

In this tutorial, we looked at the LAG() and LEAD() window functions. We saw how we can use these functions to access the previous and next row in a window partition. We also saw how we can use these functions to find the timestamp of the previous and next stream of a track.

Window functions are a powerful tool that can be used to solve a variety of problems. I hope this tutorial has given you a good introduction to a couple of the most common window functions. If you want to learn more about window functions, look out for more posts in the future.

Love,
Dan


 

Daniel Lawson
Author: Daniel Lawson

Right off the bat I can tell you that I’m not your average data analyst. I’ve spent most of my career running my own business as a photographer and videographer, with a sprinkling of Web Development and SEO work as well. My approach to life and work is very T-shaped, in that I have a small set of specific skills complemented by a very broad range of interests; I like to think of myself as a dedicated non-specialist. Data Analytics, and Programming, started as a hobby that quickly grew into a passion. The more I learned the more I looked for opportunities to pull, manipulate, and join data from disparate sources in my life. I learned to interact with REST APIs for services I used, personal data from services I use like Spotify, and health data captured by my devices. I learned SQL to create and query databases, as well as analyse SQLite files containing my iMessages and Photos data on my Mac. Every technique I learned opened up more possibilities; now I’m hooked and there’s no turning back. Learn More About Me: https://danlsn.com.au