Structured Query Language (SQL) is a widely used database manipulation tool. Knowing how to use it is considered a foundational knowledge for data analysts. It is quite intuitive, but the code logic might still confuse new users. Specifically, SQL logical order (the sequence of how the queries are processed by the database engine) is quite different from its lexical order (the arrangement of which we write our queries). For instance, look at the query below:

SELECT [ProductName], AVG([Sales]) AS AverageSales

FROM TableName

GROUP BY [ProductName]

HAVING AVG([Sales]) > 0

ORDER BY AverageSales DESC

Now we have created an alias of the AVG([Sales]) as ‘AverageSales’. We can use that alias in ORDER BY clause, but not in HAVING clause! This is because despite having written the alias first, the query actually processes SELECT after GROUP BY and HAVING, and ORDER BY after SELECT. Hence, HAVING would not recognize the alias because it has not been created yet (technically), whereas ORDER BY does recognize the alias because it is already created beforehand with the SELECT clause!

Understanding the difference between the logical and lexical order may save you some time from debugging, especially on filtering, grouping, and sorting. Let’s look at the standard logical order of operations in SQL!

Logically, it makes sense to fetch the data first before actually filtering or calculating anything. You may also notice some SQL tools attempt to auto-complete the field name when you type them, but only after you write FROM clause.

Join is actually an operator of FROM, and it makes sense to create the full table before working on it.

This works as a high level rows filter by the condition you specify, which can reduce the amount of rows that needs to be processed (and hence less memory usage).

Now this is where it’s starting to get spicy. As the name implies, GROUP BY virtually divides the rows into groups/buckets, which then allows for aggregation by those groups. In plain English, it reads like “For each group, what is the aggregation of their values?” which brings us to the next point…

5. Aggregation
The magic happens here specifically. This is why we cannot have aggregation in the WHERE clause, because WHERE happened 2 steps above! And this is also why we can only filter using aggregation in our next step; HAVING

As mentioned above, we can now start filtering the dataset using the aggregation, because technically all the aggregation already happened above!

Now after handling all the calculations and filtering, the engine finally produces the results. We have access to all the grouping and aggregations, hence creating a new field with aggregation (such as AVG([Profit]) and Count) works here.

Note that you cannot create an alias here then refer to the same alias in the same SELECT query (e.g., SELECT column1 as ‘Alias’, SUM([Alias]) ….) because SELECT runs as a batch, so the alias has not been created yet in this run. In fact, you can only use the alias created from SELECT query in ORDER BY (2 steps below).

On the other hand, aliases created for table names from the FROM and JOIN queries can work here because they occurred at the beginning of the whole process. For instance, FROM Table1 AS ‘Alias2’ can be used in SELECT Alias2.ColumnName.

Now that the engine knows the full rows and columns of interest, it can start dropping duplicates.

And after dropping the duplicates and having the final table, the engine can finally start the ranking.

10. LIMIT or TOP
You can now view only the rows of interest of your final table!

Again, this logical order is different from how we write the queries. Comprehending such differences and keeping in mind the logical order can really make writing queries more efficient and accurate. Hope this helps and happy querying!

The Data School
Author: The Data School