Guided: Regex Foundations and Essentials
Regex is a useful tool for extracting information of interest from datasets. In this lab you will learn the concept of regular expressions (regex) and get hands-on experience writing regex patterns in Python to extract the information you need from text and log files.

Challenge
Introduction to Regex Syntax
In this code lab you will learn the foundations of regex and how to implement it in the Python programming language.
Regex is short for regular expressions, a means of creating a pattern that can be used to match a specific type of data. For example, many US phone numbers follow a pattern of 3 digits-3 more digits-4 digits. Using regex, we can take any piece of information with a distinct pattern and write code that parses data and collects anything matching that pattern.
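As a first taste of the phone number idea above, here is a minimal sketch. The sample text and the variable names are made up for illustration, and the \d syntax it relies on is explained in the next part of this section.

```python
import re

# Hypothetical sample text; the names and numbers are invented for illustration.
text = "Call Dana at 555-867-5309 or the front desk at 555-010-9999."

# Three digits, a hyphen, three more digits, a hyphen, then four digits.
phone_pattern = r"\d\d\d-\d\d\d-\d\d\d\d"

# re.findall returns every non-overlapping match as a list of strings.
phone_numbers = re.findall(phone_pattern, text)
print(phone_numbers)
```

The pattern is written as a raw string (the r prefix) so that Python passes the backslashes through to the regex engine unchanged.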
To begin, you will cover some of the most important regex syntax needed for this lab:
Common Matching Patterns
- \d: Matches any digit [0-9]
- \w: Matches any word character [a-z, A-Z, 0-9 or _]
- \s: Matches any whitespace character
Common Quantitative Patterns
- '+': Matches one or more of a pattern. E.g. \d+ means one or more digits
- '?': Matches zero or one of a pattern
- '*': Matches zero or more of a pattern
Escape Characters
On some occasions you may need to match a specific character, such as an '@' in email addresses. Sometimes you can simply type that character in the middle of your expression; other times, when the character has a special meaning in regex (for example '.', '+', '?' or '*'), you need to escape it. Escaping a character means telling the regex engine to treat it as a literal match, and you do this with a backslash. For example, the regex \d+\@ matches one or more digits followed by an '@' symbol.
Now that you understand the basic syntax, you are ready to move on to the first exercise. As you work on your regex, you may want to practice on a website like https://regex101.com/ so that you can check that your search patterns work before putting them into your script.
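The syntax covered in this section can be tried out directly in Python. The following is a small sketch rather than part of the lab scripts. All sample strings and variable names are invented for illustration.

```python
import re

# Hypothetical sample string, invented for this illustration.
sample = "Order 66 shipped to bob_smith on day 7"

digits = re.findall(r"\d+", sample)   # runs of one or more digits
words = re.findall(r"\w+", sample)    # runs of letters, digits and underscores
spaces = re.findall(r"\s", sample)    # each single whitespace character

# '?' makes the preceding item optional, so both spellings match.
optional_u = re.findall(r"colou?r", "color colour")

# '*' allows zero or more of the preceding item.
zero_or_more = re.findall(r"abc*", "ab abc abccc")

# Escaping: "\." matches only a literal dot, while a bare "." matches any character.
literal_dot = re.findall(r"\d+\.\d+", "v1.5 and 2x5")
any_char = re.findall(r"\d+.\d+", "v1.5 and 2x5")

# The example from the text: one or more digits followed by a literal '@'.
digits_at = re.findall(r"\d+\@", "id 1234@example")

print(digits, optional_u, literal_dot, any_char, digits_at)
```

Note how the unescaped dot also matches "2x5". This is why characters with a special meaning need a backslash when you want a literal match.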
Each exercise uses files found in a different subdirectory, in the following order:
- First_Scripts
- Input_Validation
- Log_Search
- Filtered_Log_Data
- Final Project
If you get stuck on any particular challenge, there is a folder called solutions that includes the completed version of each script. Good luck!
Challenge
Building your first Regex Scripts
For your first challenge you will complete the script called Email_Address.py in order to find email addresses in the info.txt file.
If you look at the script, it begins by opening the file and saving its contents in a variable called contents. Next, in results, it attempts to perform a search on contents to find the email address. In the empty quotation marks of the results variable, you need to construct a regex to detect an email address.
Using the syntax provided in the first section, you will need to create a search that accounts for one or more word characters, followed by an @, then more word characters, followed by a '.' and then more word characters.
Once this is constructed properly, run the script by going to the terminal, running cd First_Scripts and then python Email_Address.py.
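The email pattern described above can be sketched as follows. This is not the lab's Email_Address.py. The contents string is a hypothetical stand-in for info.txt, and the actual script and its solution may differ.

```python
import re

# Hypothetical stand-in for the text read from info.txt.
contents = "Contact: jane_doe@example.com\nOffice IP: 192.168.1.10\n"

# Word characters, an '@', more word characters, a literal '.', more word characters.
results = re.search(r"\w+@\w+\.\w+", contents)

# re.search returns a match object, or None when nothing matches.
if results:
    print(results.group())
```

Because \w does not match '.' or '-', this simple pattern only captures part of an address such as first.last@mail.example.com. That is acceptable for the exercise but would need extending for real-world data.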
Now that you have completed the email address script, let's try a different pattern. In IP_Address_Finder.py you will see a similar script to detect IP addresses. Again, complete the script to detect the IP address in info.txt, given that the pattern for an IP address is four groups of digits separated by three '.' characters.
For the final part of this challenge, review the Address_Finder.py file and see if you can understand the regex pattern used to identify the physical address in the info.txt file. It relies on the \s character, which the introduction mentioned only briefly, so you will need to do some research to understand what it's doing.
Challenge
Using Regex to perform input Validation
In the previous section you used re.search to perform a one-time search of a dataset and return a match. In this challenge you will use the re module for a different application. You will review two different scripts that perform password validation and ensure that the password supplied through user input is in fact secure. Secure means:
- 8-12 characters in length
- 1 uppercase letter
- 1 lowercase letter
In this step you will get a direct comparison of how regex can be used to perform an action much faster than manual searches by comparing the regex script 'one_step_validation.py' to 'iterated_validation.py'. Both scripts are located in the Input_Validation directory.
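To make the comparison concrete, here is one possible sketch of both approaches. The function names are hypothetical and the lab's one_step_validation.py and iterated_validation.py may be written differently. The regex version also uses lookaheads and a {8,12} length quantifier, which go beyond the syntax covered in the introduction.

```python
import re

def validate_with_regex(password):
    """One-step check: 8-12 characters, at least one uppercase and one lowercase letter."""
    # Each (?=...) lookahead asserts that a required character appears somewhere.
    pattern = r"(?=.*[A-Z])(?=.*[a-z]).{8,12}"
    return re.fullmatch(pattern, password) is not None

def validate_by_iteration(password):
    """The same rules, checked manually one character at a time."""
    if not 8 <= len(password) <= 12:
        return False
    has_upper = False
    has_lower = False
    for ch in password:
        if "A" <= ch <= "Z":
            has_upper = True
        elif "a" <= ch <= "z":
            has_lower = True
    return has_upper and has_lower

for candidate in ["Password1", "password", "Short1A", "ThisIsWayTooLong123"]:
    print(candidate, validate_with_regex(candidate), validate_by_iteration(candidate))
```

Both functions should agree on every input. The regex version expresses all three rules in a single pattern, while the iterated version spells out each check.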
Challenge
Parsing Log Data
In this step you are going to search a set of log data to extract information of interest. You will learn how to search for specific keywords in log files using a function called search. You will be looking to pull information related to failed login attempts. To begin, navigate to the Log_Search directory.
In the next section of this step you will use the Joint_Log_Search.py script to search for log entries based on two different keywords. This allows you to create more tailored searches that return information more closely matching what you're looking for.
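A keyword search and a two-keyword search might look something like the sketch below. The log lines are fabricated samples in a syslog-like style. The lab's log file format and the contents of Joint_Log_Search.py are not shown in this text, so treat the format and names as assumptions.

```python
import re

# Hypothetical log lines, invented for illustration.
log_lines = [
    "Jan 10 10:01:22 server1 sshd[101]: Failed password for root from 10.0.0.5",
    "Jan 10 10:02:10 server1 sshd[102]: Accepted password for alice from 10.0.0.8",
    "Jan 10 10:03:47 server1 sshd[103]: Failed password for alice from 10.0.0.9",
]

# Single keyword: keep every line that mentions a failed attempt.
failed = [line for line in log_lines if re.search(r"Failed", line)]

# Two keywords: keep lines that contain both, in any order.
failed_root = [
    line for line in log_lines
    if re.search(r"Failed", line) and re.search(r"root", line)
]

for line in failed_root:
    print(line)
```

Running two separate searches keeps the match independent of keyword order. A single pattern such as Failed.*root would also work when the first keyword always appears before the second.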
Challenge
Filtering Log Data
In this step you will see a technique for filtering log data more effectively when performing your regex searches. Inside the Filtered_Log_Data directory, the Filter_Data.py script has been modified to use the line.split() method to isolate specific parts of the log file and return only the fields you want.
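The idea of combining a regex search with line.split() can be sketched as follows. The log format, the field positions and the function name are assumptions made for illustration, and the real Filter_Data.py may select different fields.

```python
import re

# Hypothetical log lines, invented for illustration.
log_lines = [
    "Jan 10 10:01:22 server1 sshd[101]: Failed password for root from 10.0.0.5",
    "Jan 10 10:02:10 server1 sshd[102]: Accepted password for alice from 10.0.0.8",
    "Jan 10 10:03:47 server1 sshd[103]: Failed password for alice from 10.0.0.9",
]

def extract_failed_logins(lines):
    """Return (timestamp, user, ip) for each failed login line."""
    extracted = []
    for line in lines:
        if not re.search(r"Failed", line):
            continue
        # split() with no arguments breaks the line on runs of whitespace.
        fields = line.split()
        timestamp = " ".join(fields[:3])  # month, day, time
        user = fields[8]                  # assumed position of the username
        ip = fields[-1]                   # assumed to be the last field
        extracted.append((timestamp, user, ip))
    return extracted

for entry in extract_failed_logins(log_lines):
    print(entry)
```

Indexing by position only works while every matching line has the same layout, so it is worth filtering with a regex first and splitting afterwards, as done here.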
Challenge
Final Project
For this final project you are going to create a script that will search the given log file and return a list of all found IP addresses, email addresses and phone numbers. It will primarily use the re.findall function to perform the regex search, but you will be required to write the regex syntax for each one!
This concludes your introduction to regex foundations and essentials. From this course you should have established:
- An understanding of basic regex syntax to create matches
- How to isolate specific parts of regex matches to return desired results
- Where you can go to test your regex expressions (Regex101)
- How to use Python to open, read and scan files for specific information.
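For the final project described above, the overall shape of a re.findall based script might resemble this sketch. The log text is fabricated and the three patterns are deliberately simple ones built from the syntax in the introduction. They are not the lab's reference solution.

```python
import re

# Hypothetical log text, invented for illustration.
log_text = (
    "10.0.0.5 - alice@example.com called 555-123-4567\n"
    "172.16.0.1 - bob@example.org called 555-987-6543\n"
)

# re.findall returns a list of every non-overlapping match in the text.
ip_addresses = re.findall(r"\d+\.\d+\.\d+\.\d+", log_text)
email_addresses = re.findall(r"\w+@\w+\.\w+", log_text)
phone_numbers = re.findall(r"\d+-\d+-\d+", log_text)

print("IP addresses:", ip_addresses)
print("Email addresses:", email_addresses)
print("Phone numbers:", phone_numbers)
```

Unlike re.search, which stops at the first match, re.findall collects all of them, which suits a task that asks for a full list of each kind of value.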