SQL is the most commonly used language for managing data in relational databases while it’s importance is being is eclipsed by more user friendly tools like Tableau and Alteryx, it’s still an important tool for those that work in data to understand.
If you are new to SQL, or have never heard of it before, my colleague Parul Kapoor as written this excellent introduction to it on The Data School Blog which can be found here.
If you are already an SQL user, the following is non-exhaustive list of some key clauses and commands that can be printed off and used as an easy reference guide when working with databases. It covers order of operations, basic queries, aliasing, filtering, sorting, aggregation, and joins.
1. SQL ORDER OF OPERATIONS
To be understood, SQL queries must be written in the following order – however, not all clauses need to be entered into a single query.
SELECT
FROM
JOIN
ON
WHERE
GROUP BY
HAVING
ORDER BY
2. BASIC DATA EXTRACTION
Extract all data from table
SELECT *
FROM [table_a]
This query returns every column and every row in the table called table_a.
Extract top 100 rows from table
SELECT TOP (100) *
FROM [table_a]
This query returns every column and the first 100 rows from table_a
Extract specific fields from table
SELECT
[Field 1],
[Field 2],
[Field 3]
FROM [table_a]
This query returns every row from Field 1, Field 2, and Field 3 from table_a
Count number of records in table
SELECT COUNT(*)
FROM [table_a]
This query returns count of rows in table.
Count distinct number of records in Field
SELECT
DISTINCT [Field 1]
FROM [table_a]
This query returns a list of distinct values from Field 1.
3. ALIASES
Assign new field name
SELECT
[Field 1] AS Alias1
FROM [table_a]
This query will return all rows from [Field 1], but the field name will be changed to [Alias1].
4. ROW LEVEL FILTERING
Filter rows by value
SELECT *
FROM [table_a]
WHERE [Field 1] = ‘Value1’
This query returns every column from table_a where the value in Field 1 is equal to the ‘Value1’
Other Filtering options
= | Equal to |
!= | Not equal to |
< / <= | Less than / Less than or equal to |
> / >= | Greater than / Greater than or equal to |
BETWEEN 10 AND 20 | Number is within range of 10 to 20 (inclusive) |
LIKE | String matches wildcard* expression |
IN (‘Value1’, ‘Value2’, ‘Value3’) | Value matches one of list contained in brackets |
Wildcard Operators
% | Zero or more characters |
_ | Single character |
[abcd] | Character in position is a, b, c or d |
[^abcd] | Character in position is NOT a, b, c or d |
Filtering on multiple conditions
WHERE [clause 1] AND [clause 2] | Row values must match in both expressions |
WHERE [clause 1] OR [clause 2] | Row values must match in one or both expressions |
5. SORTING
Sort table in ascending order
SELECT *
FROM [table_a]
ORDER BY [Field 1] ASC
This query returns every column and row from table_a ordered by Field 1. If Field 1 is a string it will order from A-Z, if it is a numerical field it will order from lowest to highest.
Sort table in descending order
SELECT *
FROM [table_a]
ORDER BY [Field 1] DESC
This query returns every column and row from table_a ordered by Field 1. If it is a string field it will order from Z-A, if it is a numerical field it will order from highest to lowest.
6. SUMMARY STATISTICS
Aggregate values within table
SELECT
[Field 1]
SUM([Field 2]) AS Total Sales*
FROM [table_a]
GROUP BY [Field 1]
Where Field 1 is a dimension (ie. Product name) and Field 2 is a numerical field (ie. product price). Note that Field 1 must be present both in the SELECT and GROUP BY clauses.
This query will return a list of the dimensions and the sum of the associated measure (ie. a list of products and their total sales).
*See 2. Aliases for explanation of the AS clause.
Other Summary Statistics:
COUNT([Field 1]) | A count of records in Field 1 |
COUNT(DISTINCT [Field 1]) | A count of distinct records in Field 1 |
SUM([Field 1]) | The sum of all records in Field 1 |
AVG([Field 1]) | The mean of all records in Field 1 |
MIN([Field 1]) | The minimum value in Field 1 |
MAX([Field 1]) | The maximum value in Field 1 |
7. FILTERING SUMMARISED DATA
Filter aggregated data
SELECT
[Field 1]
SUM([Field 2]) AS Total Sales
FROM [table_a]
GROUP BY [Field 1]
HAVING SUM([Field 2]) > 500
The HAVING clause works as a filter on the summarized data – whereas the WHERE clause filters row level data.
This query will return a list of the dimensions and the sum of the associated measure, where the sum of the associated measure is greater than 500 (ie. a list of products whose total sales exceed 500).
8. JOINS
Join two tables together
SELECT *
FROM [table_a]
JOIN [table_b]
ON [table_a].[Field 1] = [table_b].[Field 1]
This query returns all rows from [table_a] and [table_b] where [table_a].[Field 1] matches [table_b].[Field 1].
Join two tables together using Aliases
SELECT *
FROM [table_a] AS a
JOIN [table_b] AS b
ON a.[Field 1] = b.[Field 1]
This query all rows from [table_a] and [table_b] where [table_a].[Field 1] matches [table_b].[Field 1]. However an ALIAS has been applied to both tables to make the process of writing the join easier.