In the fourth week of training, we learned about Regular Expressions (RegEx). At first I was confused: I had no idea what they were or why we needed to learn them. After a whole afternoon of practice, I found them quite useful. So now, let’s take a look at how Regular Expressions can make your life easier.

A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. RegEx can be used to verify if a string contains the desired pattern.
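As a minimal sketch of that idea, the Python snippet below checks whether a string contains a pattern. The pattern (`\d+`, one or more digits) and the helper name `contains_pattern` are illustrative assumptions, not something from the training itself.

```python
import re

# Hypothetical pattern for illustration: one or more digits in a row.
PATTERN = r"\d+"


def contains_pattern(text: str) -> bool:
    """Return True if PATTERN occurs anywhere in text."""
    # re.search scans the whole string and returns None when nothing matches.
    return re.search(PATTERN, text) is not None


print(contains_pattern("Order 66 shipped"))  # True
print(contains_pattern("No digits here"))    # False
```

`re.search` looks for the pattern anywhere in the string, which is what "contains" means here; `re.match` would only look at the start.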

For example, during web scraping, you may need to extract 1,000 email addresses. In this case, RegEx can help you accomplish this task. To begin, you would identify the pattern of an email address and use RegEx to search for that pattern in the data you want to scrape.

The following email addresses share a pattern: letters or numbers, followed by the ‘@’ symbol, then letters, and finally the ‘.com’ domain:

hiweay@gmail.com

5237992@qq.com

Jayon@icloud.com

With RegEx, you can search the data for this pattern and extract the matching addresses. The following expression matches the pattern of an email address:

^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$

Please note that this expression is just one example of how to match an email address pattern and may not cover all possible variations of email addresses.
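To see the pattern at work, here is a rough Python sketch. It assumes the common form of this expression, in which each optional group is followed by `*` and the dot before the domain is escaped as `\.`. The names `EMAIL_FULL`, `EMAIL_ANYWHERE`, `extract_emails`, and the sample `page` text are made up for illustration.

```python
import re

# Assumed form of the pattern: optional groups repeat with `*`, and the dot
# before the domain is escaped. `^` and `$` force the whole string to match.
EMAIL_FULL = re.compile(r"^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$")

# Same pattern without the anchors, so it can find addresses inside longer
# text. Groups are non-capturing so findall returns the whole address.
EMAIL_ANYWHERE = re.compile(r"\w+(?:[-+.]\w+)*@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*")

samples = ["hiweay@gmail.com", "5237992@qq.com", "Jayon@icloud.com", "not-an-email"]
for s in samples:
    # The three addresses from the text match; the last string does not.
    print(s, bool(EMAIL_FULL.match(s)))


def extract_emails(text: str) -> list[str]:
    """Return every substring of text that looks like an email address."""
    return EMAIL_ANYWHERE.findall(text)


# Hypothetical scraped text.
page = "Contact hiweay@gmail.com or 5237992@qq.com; Jayon@icloud.com is on leave."
print(extract_emails(page))
```

The anchored version suits validating a single value, while the unanchored version suits scraping, where addresses are buried in surrounding text.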

Author: The Data School