SQL is the most commonly used language for managing data in relational databases while it’s importance is being is eclipsed by more user friendly tools like Tableau and Alteryx, it’s still an important tool for those that work in data to understand.

If you are new to SQL, or have never heard of it before, my colleague Parul Kapoor as written this excellent introduction to it on The Data School Blog which can be found here.

If you are already an SQL user, the following is non-exhaustive list of some key clauses and commands that can be printed off and used as an easy reference guide when working with databases. It covers order of operations, basic queries, aliasing, filtering, sorting, aggregation, and joins.

 

1. SQL ORDER OF OPERATIONS

 

To be understood, SQL queries must be written in the following order – however, not all clauses need to be entered into a single query.

SELECT
FROM
JOIN
ON
WHERE
GROUP BY
HAVING
ORDER BY

 

2. BASIC DATA EXTRACTION

 

Extract all data from table

SELECT *
FROM [table_a]

This query returns every column and every row in the table called table_a.

 

Extract top 100 rows from table

SELECT TOP (100) *
FROM [table_a]

This query returns every column and the first 100 rows from table_a

 

Extract specific fields from table

SELECT
[Field 1],
[Field 2],
[Field 3]
FROM [table_a]

This query returns every row from Field 1, Field 2, and Field 3 from table_a

 

Count number of records in table

SELECT COUNT(*)
FROM [table_a]

This query returns count of rows in table.

 

Count distinct number of records in Field

SELECT
DISTINCT [Field 1]
FROM [table_a]

This query returns a list of distinct values from Field 1.

 

3. ALIASES

 

Assign new field name

SELECT
[Field 1] AS Alias1
FROM [table_a]

This query will return all rows from [Field 1], but the field name will be changed to [Alias1].

 

4. ROW LEVEL FILTERING

 

Filter rows by value

SELECT *
FROM [table_a]
WHERE [Field 1] = ‘Value1’

This query returns every column from table_a where the value in Field 1 is equal to the ‘Value1’

 

Other Filtering options

= Equal to
!= Not equal to
< / <= Less than / Less than or equal to
> / >= Greater than / Greater than or equal to
BETWEEN 10 AND 20 Number is within range of 10 to 20 (inclusive)
LIKE String matches wildcard* expression
IN (‘Value1’, ‘Value2’, ‘Value3’) Value matches one of list contained in brackets

 

Wildcard Operators

% Zero or more characters
_ Single character
[abcd] Character in position is a, b, c or d
[^abcd] Character in position is NOT a, b, c or d

 

Filtering on multiple conditions

WHERE [clause 1] AND [clause 2] Row values must match in both expressions
WHERE [clause 1] OR  [clause 2] Row values must match in one or both expressions

 

5. SORTING

 

Sort table in ascending order 

SELECT *
FROM [table_a]
ORDER BY [Field 1] ASC

This query returns every column and row from table_a ordered by Field 1. If Field 1 is a string it will order from A-Z, if it is a numerical field it will order from lowest to highest.

 

Sort table in descending order

SELECT *
FROM [table_a]
ORDER BY [Field 1] DESC

This query returns every column and row from table_a ordered by Field 1. If it is a string field it will order from Z-A, if it is a numerical field it will order from highest to lowest.

 

6. SUMMARY STATISTICS

 

Aggregate values within table

SELECT
[Field 1]
SUM([Field 2]) AS Total Sales*
FROM [table_a]
GROUP BY [Field 1]

Where Field 1 is a dimension (ie. Product name) and Field 2 is a numerical field (ie. product price). Note that Field 1 must be present both in the SELECT and GROUP BY clauses.

This query will return a list of the dimensions and the sum of the associated measure (ie. a list of products and their total sales).

*See 2. Aliases for explanation of the AS clause.

 

Other Summary Statistics:

COUNT([Field 1]) A count of records in Field 1
COUNT(DISTINCT [Field 1]) A count of distinct records in Field 1
SUM([Field 1]) The sum of all records in Field 1
AVG([Field 1]) The mean of all records in Field 1
MIN([Field 1]) The minimum value in Field 1
MAX([Field 1]) The maximum value in Field 1

 

7. FILTERING SUMMARISED DATA

 

Filter aggregated data

SELECT
[Field 1]
SUM([Field 2]) AS Total Sales
FROM [table_a]
GROUP BY [Field 1]
HAVING SUM([Field 2]) > 500

The HAVING clause works as a filter on the summarized data – whereas the WHERE clause filters row level data.

This query will return a list of the dimensions and the sum of the associated measure, where the sum of the associated measure is greater than 500 (ie. a list of products whose total sales exceed 500).

 

8. JOINS

 

Join two tables together

SELECT *
FROM [table_a]
JOIN [table_b]
ON [table_a].[Field 1] = [table_b].[Field 1]

This query returns all rows from [table_a] and [table_b] where [table_a].[Field 1] matches [table_b].[Field 1].

 

Join two tables together using Aliases

SELECT *
FROM [table_a] AS a
JOIN [table_b] AS b
ON a.[Field 1] = b.[Field 1]

This query all rows from [table_a] and [table_b] where [table_a].[Field 1] matches [table_b].[Field 1]. However an ALIAS has been applied to both tables to make the process of writing the join easier.

 

Kieran Adair
Author: Kieran Adair