Introduction

Regex is short for regular expression. A regular expression is a string of text that defines a pattern you can use to match, locate, and manage text. When you first try to read a regular expression, it looks like another language. However, if you’re working with text or need to analyse large amounts of data, learning regular expressions can save you a great deal of time. Below are the most frequently used regex operators:

Character classes
.      any character except newline
\w\d\s word, digit, whitespace
\W\D\S not word, digit, whitespace
[abc]  any of a, b, or c
[^abc] not a, b, or c
[a-g]  character between a & g
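
To make the character classes concrete, here is a minimal sketch in Python (my choice of language for illustration; the sample sentence is made up). It only uses re.findall from the standard library, which returns every non-overlapping match.

```python
import re

# Made-up sample text, used only for illustration
text = "Order 66 shipped to bay C-3 at 9am"

# \d matches a single digit
digits = re.findall(r"\d", text)

# [a-g] matches a single lowercase character between a and g
a_to_g = re.findall(r"[a-g]", text)

# [^\w\s] matches anything that is neither a word character nor whitespace
punctuation = re.findall(r"[^\w\s]", text)

print(digits)       # ['6', '6', '3', '9']
print(a_to_g)       # ['d', 'e', 'e', 'd', 'b', 'a', 'a', 'a']
print(punctuation)  # ['-']
```

Note that [a-g] is case sensitive, so the uppercase C in "C-3" is not matched.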

Anchors
^abc$  start / end of the string
\b\B   word, not-word boundary
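
A short Python sketch of the anchors, assuming made-up sample strings. It shows that ^ and $ tie the pattern to the whole string, and that \b only matches at the edge of a word.

```python
import re

# ^ and $ pin the pattern to the start and end of the string
whole = re.search(r"^abc$", "abc") is not None      # the string is exactly "abc"
partial = re.search(r"^abc$", "xabcx") is not None  # "abc" is only part of the string

# \b matches at a word boundary, so the "cat" inside "concatenate" is skipped
words = re.findall(r"\bcat\b", "cat concatenate cat.")

# \B is the opposite: it matches only where there is no word boundary
inner = re.findall(r"\Bcat\B", "cat concatenate cat.")

print(whole, partial)  # True False
print(words)           # ['cat', 'cat']
print(inner)           # ['cat']
```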

Escaped characters
\.\*\\   escaped special characters
\t\n\r   tab, linefeed, carriage return
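
The difference an escape makes is easiest to see side by side. The sketch below is Python with made-up input; re.escape is a standard-library helper that escapes special characters for you.

```python
import re

sample = "3.14 3x14"

# Unescaped, . matches any character, so both tokens match
loose = re.findall(r"3.14", sample)

# Escaped, \. matches only a literal dot
strict = re.findall(r"3\.14", sample)

# \t matches a tab character, handy for splitting tab-separated text
fields = re.split(r"\t", "name\tage\tcity")

# re.escape builds the escaped form of a literal string
escaped = re.escape("a.b*c")

print(loose)    # ['3.14', '3x14']
print(strict)   # ['3.14']
print(fields)   # ['name', 'age', 'city']
print(escaped)  # a\.b\*c
```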

Groups & Lookaround
(abc)   capture group
\1      backreference to group #1
(?:abc) non-capturing group
(?=abc) positive lookahead
(?!abc) negative lookahead
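
Groups and lookarounds are the hardest rows in this table to picture, so here is a hedged Python sketch with made-up strings. The patterns are my own small illustrations, not taken from any particular tool.

```python
import re

# (\w+) captures a word and \1 must repeat exactly what group 1 captured,
# which finds an accidentally doubled word
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

# (?:ab) groups without capturing, so findall returns the whole match
repeats = re.findall(r"(?:ab)+", "ababab ab")

# (?=px) checks that "px" follows, without including it in the match
pixels = re.findall(r"\d+(?=px)", "10px 20em 30px")

# (?!...) checks that something does NOT follow; the \d alternative
# stops the engine from matching just the first digit of "10px"
not_pixels = re.findall(r"\b\d+(?!\d|px)", "10px 20em 30px")

print(doubled.group(1))  # is
print(repeats)           # ['ababab', 'ab']
print(pixels)            # ['10', '30']
print(not_pixels)        # ['20']
```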

Quantifiers & Alternation
a*a+a?    0 or more, 1 or more, 0 or 1
a{5}a{2,} exactly five, two or more
a{1,3}    between one & three
a+?a{2,}? match as few as possible
ab|cd     match ab or cd
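
The greedy versus lazy distinction is much clearer with a concrete case. A minimal Python sketch, assuming a made-up HTML-like string:

```python
import re

tagged = "<b>bold</b>"

# Greedy: .+ grabs as much as possible, so the match runs to the last >
greedy = re.findall(r"<.+>", tagged)

# Lazy: .+? grabs as little as possible, so each tag matches separately
lazy = re.findall(r"<.+?>", tagged)

# a{5} needs exactly five a's; fullmatch requires the whole string to match
exactly_five = re.fullmatch(r"a{5}", "aaaaa") is not None

# a{1,3} takes up to three at a time, so five a's split into 3 + 2
one_to_three = re.findall(r"a{1,3}", "aaaaa")

# ab|cd matches either alternative
either = re.findall(r"ab|cd", "abcdab")

print(greedy)        # ['<b>bold</b>']
print(lazy)          # ['<b>', '</b>']
print(exactly_five)  # True
print(one_to_three)  # ['aaa', 'aa']
print(either)        # ['ab', 'cd', 'ab']
```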

Use Cases

Rather than explain every operator, I would like to walk through two common use cases to help you grasp the basic concepts of regex.


Example 1: match a valid email such as thedataschool@downunder.com, but not an invalid one such as thedataschool!!!!@downunder. com

Exp: /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Result: thedataschool@downunder.com

Explain:
^ Start at the beginning of the line
([a-z0-9_\.-]+)@ Capture one or more (+) letters (a-z), digits (0-9), underscores (_), dots (.) or hyphens (-), followed by the @ sign
([\da-z\.-]+)\.  Capture one or more (+) digits (\d), letters (a-z), dots (.) or hyphens (-), followed by a literal dot (\.)
([a-z\.]{2,6})$  Capture 2 to 6 ({2,6}) letters (a-z) or dots (.), up to the end of the line ($)
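
Here is how the expression might be used from Python, as a sketch only. The function name parse_email and the unanchored variant EMAIL_ANYWHERE are hypothetical names I introduce for illustration. Because of ^ and $, the original pattern matches only when the email is the entire string, so finding an email inside a longer sentence requires dropping the anchors.

```python
import re

# The expression from this example, without the surrounding / delimiters
EMAIL = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

# Hypothetical variant without ^ and $, for searching inside a sentence
EMAIL_ANYWHERE = re.compile(r"([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})")


def parse_email(candidate):
    """Return (user, domain, tld) if the whole string is an email, else None."""
    match = EMAIL.match(candidate)
    return match.groups() if match else None


good = parse_email("thedataschool@downunder.com")
bad = parse_email("thedataschool!!!!@downunder.com")

# The character classes are lowercase only, so capitals are rejected
# unless you compile with re.IGNORECASE
capitals = parse_email("TheDataSchool@downunder.com")

found = EMAIL_ANYWHERE.search("write to thedataschool@downunder.com today")

print(good)            # ('thedataschool', 'downunder', 'com')
print(bad)             # None
print(capitals)        # None
print(found.group(0))  # thedataschool@downunder.com
```

This pattern is a teaching example rather than a full email validator; real addresses allow more characters than it accepts.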

Example 2: capture the month, day and year from the list below:

1-04-2020
01/4/2020
1/04 2020

Exp: /^0?(\d{1,2})[-\/\s](\d{1,2})[-\/\s](\d{4})$/
Result:  1    04    2020
         1     4    2020
         1    04    2020

Explain:
^0?       Start at the beginning of the line and skip an optional leading zero (0? sits outside the capture group, so 01 is captured as 1)
(\d{1,2}) Capture 1 or 2 digits
[-\/\s]   Match a separator: -, / or whitespace
(\d{1,2}) Capture 1 or 2 digits
[-\/\s]   Match a separator: -, / or whitespace
(\d{4})$  Capture the four-digit year at the end of the line
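
Running the expression in Python shows exactly what each group captures. This is a sketch under my own assumptions: the variable names are made up, and the input lines are the three dates from the list with no trailing whitespace. Since 0? sits outside the first group, a leading zero on the first number is consumed but not captured, while a leading zero on the second number is kept.

```python
import re

# The expression from this example, without the surrounding / delimiters
DATE = re.compile(r"^0?(\d{1,2})[-\/\s](\d{1,2})[-\/\s](\d{4})$")

lines = ["1-04-2020", "01/4/2020", "1/04 2020"]

# groups() returns the three captured strings for each line
parsed = [DATE.match(line).groups() for line in lines]

# Converting to integers removes any remaining leading zeros
as_numbers = [tuple(int(part) for part in groups) for groups in parsed]

for groups in parsed:
    print(groups)
# ('1', '04', '2020')
# ('1', '4', '2020')
# ('1', '04', '2020')

print(as_numbers)  # [(1, 4, 2020), (1, 4, 2020), (1, 4, 2020)]
```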

Conclusion

Regex is a very broad topic. You can’t expect to remember everything, so keeping a cheat sheet handy is very useful. Keep in mind that regex, like everything else, has to be built from the ground up. Don’t expect a long regex to work right out of the box; writing one is a process of trial and error. Hardly anyone, not even an expert, writes a long, working expression on the first attempt.

So remember, Google is your friend. Knowing a few of the most common use cases will improve your workflow tremendously, often without you even realising it.

Below are a few websites that can help you improve your regex:

Graphical visualisation for your regular expression – https://www.debuggex.com/

Sandbox for you to test your regex – https://regexr.com/

Helps you build your regex using a graphical interface – https://regex-generator.olafneumann.org/