Regular expressions, or regex, are a super handy and powerful tool for matching patterns in text data. They’re largely language agnostic, meaning they can be used across many applications, software programs and programming languages.

In our 3rd week at the data school we had a brief overview of regex, but it wasn’t until I spent some extra time practicing and revising regex patterns that it started to make a bit more sense to me. The best way to improve is to practice and set yourself some tasks to match specific patterns, as there aren’t a whole lot of websites that provide endless exercises.

Here are 10 regex patterns that I’ve found to be helpful to remember, or have on hand:

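  1. Capturing Everything: (.*) This pattern captures any character, including spaces and tabs, zero or more times. By default the dot does not match line breaks; most engines have a “dot matches newline” option if you need it to span lines. For example, ^Subject: (.*)$ captures everything after “Subject: ” on the subject line of an email.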
  2. Matching Any Word Character: \w This pattern matches any word character: a letter, a digit or an underscore. For example, \w+ matches one or more word characters.
  3. Matching Non-Word Characters: \W This pattern matches any non-word character. For example, \W+ matches one or more non-word characters.
  4. Matching Any Digit: \d This pattern matches any digit character. For example, \d{3} matches exactly three digit characters.
  5. Matching Any Non-Digit: \D This pattern matches any non-digit character. For example, \D{3} matches exactly three non-digit characters.
  6. Matching a Specific Word: \bword\b This pattern matches a specific word only when it stands on its own, because \b marks a word boundary on each side. For example, \bsuccess\b matches the word “success” but not “successful”.
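  7. Positive Lookahead: (?=pattern) This pattern matches a pattern only if it is followed by another pattern, without including that second pattern in the match. For example, \d+(?=%) matches one or more digits only if they are followed by a percent sign; the percent sign itself is not part of the match.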
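  8. Negative Lookahead: (?!pattern) This pattern matches a pattern only if it is not followed by another pattern. For example, \d+(?!\.\d) matches a run of digits that is not immediately followed by a decimal point and another digit. Be aware that the engine can backtrack to a shorter run of digits to satisfy the lookahead, so in “12.5” it still matches “1”.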
  9. Matching Repeated Patterns: (pattern){n,m} This pattern matches a pattern that is repeated between n and m times. For example, (\w+\s){2,3} matches two or three words in a row, each followed by a single whitespace character.
  10. Matching URLs: (https?://[^\s]+) This pattern matches a URL, including both http and https protocols. For example, (https?://[^\s]+) matches “https://www.example.com” in a text string.
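
To see the patterns above in action, here is a small sketch using Python’s built-in re module. Python is just my pick for the demo, and all of the sample strings are made up for illustration. Other regex engines may behave slightly differently, so treat this as a starting point for your own experiments rather than a definitive reference.

```python
import re

# All sample strings below are made up for illustration.

# 1. (.*) captures the rest of the line; re.MULTILINE lets ^ and $ match at each line.
email = "From: tim@example.com\nSubject: Weekly regex practice\nHi all"
subject = re.search(r"^Subject: (.*)$", email, re.MULTILINE).group(1)
print(subject)  # Weekly regex practice

# 2 and 3. \w+ grabs runs of word characters, \W+ grabs the runs in between.
words = re.findall(r"\w+", "data_school, week 3!")
gaps = re.findall(r"\W+", "data_school, week 3!")
print(words)  # ['data_school', 'week', '3']
print(gaps)   # [', ', ' ', '!']

# 4 and 5. \d{3} needs exactly three digits in a row, \D{3} three non-digits.
three_digits = re.findall(r"\d{3}", "tel 555 1234")
three_non_digits = re.findall(r"\D{3}", "ab12cde3")
print(three_digits)      # ['555', '123']
print(three_non_digits)  # ['cde']

# 6. \b on both sides keeps "successful" out of the results.
whole_word = re.findall(r"\bsuccess\b", "a success for a successful team")
print(whole_word)  # ['success']

# 7. Positive lookahead: digits followed by %, without capturing the % itself.
percents = re.findall(r"\d+(?=%)", "45% of 1200 orders")
print(percents)  # ['45']

# 8. Negative lookahead: the "3" is skipped because ".5" follows it,
#    but the "5" after the decimal point still matches on its own.
no_decimal = re.findall(r"\d+(?!\.\d)", "7 and 3.5")
print(no_decimal)  # ['7', '5']

# 9. {2,3} is greedy, so it takes three "word + whitespace" repeats if it can.
repeated = re.search(r"(\w+\s){2,3}", "one two three four").group(0)
print(repr(repeated))  # 'one two three '

# 10. https? makes the "s" optional; [^\s]+ runs until the next whitespace.
urls = re.findall(r"(https?://[^\s]+)", "See https://www.example.com for more")
print(urls)  # ['https://www.example.com']
```

Pattern 8 is worth playing with: the result shows that a negative lookahead only checks what comes straight after the match, so the digits after the decimal point are still picked up.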

Again, the best way to improve at regex is practice, practice and more practice. Eventually you’ll be more comfortable and familiar with different ways to solve problems.

A few resources that I found handy are:

  • Regex crosswords – kinda like Sudoku but using regex
  • Regex 101 – A really good regex sandpit. I like how it clearly highlights the matched group and explains any regex command you use.
  • Harvard Computer Science Lecture on Regex – Really thorough and easy to comprehend. It is part of a Python series but it’s very applicable beyond Python to any and all uses of regex.
Author: Tim Fawcett