Guided: Regex Foundations and Essentials

Regular expressions (regex) are a useful tool for extracting information of interest from datasets. In this lab you will learn how regular expressions work and get hands-on experience writing them in Python to extract the information you need from text and log files.

Level: Beginner
Duration: 1h 7m
Published: Feb 21, 2024

Table of Contents

  1. Challenge

    Introduction to Regex Syntax

    In this code lab you will learn the foundations of regex and how to use it in the Python programming language.

    Regex is short for regular expressions. A regular expression is a pattern that can be used to match a specific type of data. For example, most phone numbers follow a pattern of 3 digits, a hyphen, 3 more digits, a hyphen, and 4 digits. Using regex, you can take any piece of information with a distinct pattern and write code that parses data and collects anything matching that pattern.

    To begin, you will cover some of the most important regex syntax that you need to understand for this lab:

    Common Matching Patterns

    \d : Matches any digit [0-9]

    \w : Matches any word character: a letter, a digit, or an underscore [a-zA-Z0-9_]

    \s : Matches any whitespace character (space, tab, or newline)
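The three character classes above can be tried out with Python's re module. This is a minimal sketch; the sample strings are made up for illustration and are not part of the lab files.

```python
import re

sample = "Order 66 shipped to unit_7 today"

# \d matches a single digit; findall returns every non-overlapping match.
digits = re.findall(r"\d", sample)

# \w matches one word character: a letter, a digit, or an underscore.
word_chars = re.findall(r"\w", "a_1 !")

# \s matches one whitespace character such as a space, tab, or newline.
spaces = re.findall(r"\s", "a b\tc\n")

print(digits)      # ['6', '6', '7']
print(word_chars)  # ['a', '_', '1']
print(spaces)      # [' ', '\t', '\n']
```

Writing patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the regex engine sees them.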

    Common Quantifiers

    '+' : Matches one or more of the preceding pattern, e.g. \d+ means one or more digits

    '?' : Matches zero or one of the preceding pattern

    '*' : Matches zero or more of the preceding pattern
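A short sketch of how each of these three symbols behaves in Python. The input strings are invented examples, not data from the lab.

```python
import re

# + : one or more digits, so each run of digits is returned whole.
runs = re.findall(r"\d+", "room 12, floor 3, code 4096")

# ? : zero or one 'u', so both spellings match.
spellings = re.findall(r"colou?r", "color and colour")

# * : zero or more 'b' characters after an 'a'.
stars = re.findall(r"ab*", "a ab abbb")

print(runs)       # ['12', '3', '4096']
print(spellings)  # ['color', 'colour']
print(stars)      # ['a', 'ab', 'abbb']
```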

    Escape Characters

    On some occasions you may need to match a specific character, such as the '@' in an email address. Some characters can be typed directly into your expression, while others have a special meaning in regex and must be escaped. A '.', for instance, normally matches any character. Escaping a character means telling the regex engine to treat it as a literal match, and you do this by putting a backslash in front of it. For example, the regex \d+\@ matches one or more digits followed by an '@' symbol.

    Now that you understand the basic syntax, you are ready to move on to the first exercise. To make things easier, you may want to practice your regex on a website like https://regex101.com/ so that you can check that your search patterns work before putting them into your script.
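To make the effect of escaping concrete, the sketch below compares an unescaped '.' with an escaped one, and then runs the \d+\@ example from the text. The sample strings are assumptions chosen for illustration.

```python
import re

text = "version 3.14 or 3x14"

# An unescaped '.' matches any character, so both candidates match.
loose = re.findall(r"\d+.\d+", text)

# An escaped '\.' matches only a literal period.
strict = re.findall(r"\d+\.\d+", text)

# The example from the text: one or more digits followed by '@'.
at_match = re.search(r"\d+\@", "user 12345@example.com")

print(loose)             # ['3.14', '3x14']
print(strict)            # ['3.14']
print(at_match.group())  # 12345@
```

In Python the '@' has no special meaning in a pattern, so \@ and @ behave the same; the backslash is required for characters such as '.', '+', '?' and '*'.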

    Each exercise uses files found in its own subdirectory. You will work through them in the following order:

    1. First_Scripts
    2. Input_Validation
    3. Log_Search
    4. Filtered_Log_Data
    5. Final Project

    If you get stuck on a particular challenge, there is a folder called solutions that includes the completed version of each script. Good luck!

  2. Challenge

    Building your first Regex Scripts

    For your first challenge you will complete the script called Email_Address.py in order to find email addresses in the info.txt file.

    The script begins by opening the file and saving its contents in a variable called contents. Next, it attempts to perform a search on contents to find the email address. In the empty quotation marks of the result variable you need to construct a regex to detect an email address.

    Using the syntax provided in the first section, you will need to create a search that accounts for one or more word characters, followed by an @, then more word characters, followed by a '.' and then more word characters.
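One way to turn that description into a pattern is sketched below. The contents string is a hypothetical stand-in for info.txt, whose real contents are not shown here, and the pattern is one possible answer rather than the lab's official solution.

```python
import re

# Hypothetical stand-in for the text read from info.txt.
contents = "Name: Jane Doe\nEmail: jane_doe@example.com\nCity: Toronto"

# Word characters, '@', word characters, a literal '.', word characters.
result = re.search(r"\w+@\w+\.\w+", contents)

if result:
    print(result.group())  # jane_doe@example.com
```

This simple pattern would stop short on addresses that contain dots or hyphens before the '@', which is acceptable for a first exercise but worth knowing about.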

    Once the pattern is constructed properly, run the script by going to the terminal, running cd First_Scripts and then python Email_Address.py.

    Now that you have completed the email address script, try a different pattern. In IP_Address_Finder.py you will see a similar script for detecting IP addresses. Again, complete the script so that it detects the IP address in info.txt, given that an IP address is four groups of digits separated by three '.' characters.

    For the final part of this challenge, review the Address_Finder.py file and see if you can understand the regex pattern used to identify the physical address in the info.txt file. It relies on the \s character, which the introduction covered only briefly, so you may need to do some research to understand what it is doing.
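The IP address description can be sketched as the pattern below. It also shows one way \s can appear in a street address pattern. The sample text and the address pattern are assumptions for illustration; the actual pattern in Address_Finder.py may differ.

```python
import re

# Hypothetical stand-in for the text read from info.txt.
contents = "Host: web01\nIP: 192.168.1.10\nAddress: 42 Maple Street"

# Four groups of digits separated by three literal periods.
ip_match = re.search(r"\d+\.\d+\.\d+\.\d+", contents)

# A street number, whitespace, a word, whitespace, and another word.
street_match = re.search(r"\d+\s\w+\s\w+", contents)

print(ip_match.group())      # 192.168.1.10
print(street_match.group())  # 42 Maple Street
```

Note that \d+ accepts any number of digits, so this sketch does not check that each group is in the 0 to 255 range.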

  3. Challenge

    Using Regex to perform input Validation

    In the previous section you used re.search to find a single match in a dataset. In this challenge you will use the re module for a different application: input validation. You will review two scripts that validate a password supplied through user input and check that it is secure. Here, secure means:

    • 8-12 characters in length
    • 1 uppercase letter
    • 1 lowercase letter

    In this step you will get a direct comparison of how regex can perform in one step a check that otherwise requires a manual, character-by-character search, by comparing the regex script 'one_step_validation.py' to 'iterated_validation.py'. Both scripts are located in the Input_Validation directory.
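The contrast between the two approaches might look like the sketch below. It is not the code from the lab scripts: the function names are hypothetical, the rules are read as "at least one" uppercase and lowercase letter, and the regex version uses lookaheads and the {8,12} quantifier, which the introduction does not cover.

```python
import re

# (?=.*[A-Z]) requires an uppercase letter somewhere ahead,
# (?=.*[a-z]) requires a lowercase letter, and .{8,12} sets the length.
PASSWORD_PATTERN = re.compile(r"(?=.*[A-Z])(?=.*[a-z]).{8,12}", re.DOTALL)


def validate_with_regex(password):
    """Check all three rules with a single regex match."""
    return PASSWORD_PATTERN.fullmatch(password) is not None


def validate_by_iteration(password):
    """Check the same rules by inspecting each character in turn."""
    if not 8 <= len(password) <= 12:
        return False
    has_upper = False
    has_lower = False
    for ch in password:
        if "A" <= ch <= "Z":
            has_upper = True
        elif "a" <= ch <= "z":
            has_lower = True
    return has_upper and has_lower


for candidate in ["Password1", "password", "Abcdefghijklm"]:
    print(candidate, validate_with_regex(candidate), validate_by_iteration(candidate))
```

Both functions should agree on every input; the regex version simply expresses the rules in one pattern instead of a loop.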

  4. Challenge

    Parsing Log Data

    In this step you are going to search a set of log data to extract information of interest. You will learn how to search for specific keywords in log files using a function called search, and you will be looking for information related to failed login attempts. To begin, navigate to the Log_Search directory.

    In the next part of this step you will use the Joint_Log_Search.py script to search for log entries based on two different keywords. This lets you create more tailored searches that return information closer to what you are looking for.
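A keyword search over log lines, first with one keyword and then with two, could be sketched as follows. The log lines and the helper name find_entries are hypothetical; the lab's own scripts and log format may look different.

```python
import re

# Hypothetical log entries in a typical sshd-style format.
log_lines = [
    "Feb 21 10:01:12 server sshd: Failed password for admin from 10.0.0.5",
    "Feb 21 10:02:40 server sshd: Accepted password for alice from 10.0.0.8",
    "Feb 21 10:03:05 server sshd: Failed password for root from 10.0.0.9",
]


def find_entries(lines, *keywords):
    """Return the lines that contain every keyword, ignoring case."""
    return [
        line
        for line in lines
        if all(re.search(re.escape(k), line, re.IGNORECASE) for k in keywords)
    ]


failed = find_entries(log_lines, "failed")
failed_root = find_entries(log_lines, "failed", "root")

print(len(failed))       # 2
print(len(failed_root))  # 1
```

re.escape is used so that a keyword containing characters such as '.' is matched literally rather than as regex syntax.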

  5. Challenge

    Filtering Log Data

    In this step you will see a technique for filtering log data more effectively when performing your regex searches. Inside the Filtered_Log_Data directory, the Filter_Data.py script has been modified to use the line.split() method to isolate specific parts of each log line and return only the fields you want.
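The idea of combining a regex search with line.split() can be sketched like this. The log format and the field positions are assumptions for illustration and will depend on the actual log file used in the lab.

```python
import re

# Hypothetical log entries; fields are separated by whitespace.
log_lines = [
    "Feb 21 10:01:12 server sshd: Failed password for admin from 10.0.0.5",
    "Feb 21 10:02:40 server sshd: Accepted password for alice from 10.0.0.8",
    "Feb 21 10:03:05 server sshd: Failed password for root from 10.0.0.9",
]

failed_logins = []
for line in log_lines:
    # Keep only the failed login attempts.
    if re.search(r"Failed password", line):
        fields = line.split()               # split on any whitespace
        timestamp = " ".join(fields[:3])    # month, day, time
        user = fields[8]                    # account name in this format
        source_ip = fields[-1]              # last field on the line
        failed_logins.append((timestamp, user, source_ip))

for entry in failed_logins:
    print(entry)
```

Indexing by position is brittle if the log format changes, so it is worth printing fields for one sample line before choosing the indexes.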

  6. Challenge

    Final Project

    For this final project you are going to create a script that searches the given log file and returns a list of all IP addresses, email addresses and phone numbers it finds. It will primarily use the re.findall function to perform the regex search, but you will be required to write the regex pattern for each one!

    This concludes your introduction to regex foundations and essentials. From this course you should have established:

    1. An understanding of basic regex syntax to create matches
    2. How to isolate specific parts of regex matches to return desired results
    3. Where you can go to test your regex expressions (Regex101)
    4. How to use Python to open, read and scan files for specific information.
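As a starting point for the final project described above, the sketch below runs re.findall once per data type. The log text is invented, and the three patterns are simple assumptions built from the syntax in the introduction rather than the lab's solution.

```python
import re

# Hypothetical log text standing in for the project's log file.
log_text = (
    "10:01 login failed for bob@example.com from 192.168.1.20\n"
    "10:05 callback requested at 555-201-7788 by amy@example.org\n"
    "10:09 connection from 10.0.0.7 closed\n"
)

# findall returns a list of every non-overlapping match in the text.
ip_addresses = re.findall(r"\d+\.\d+\.\d+\.\d+", log_text)
email_addresses = re.findall(r"\w+@\w+\.\w+", log_text)
phone_numbers = re.findall(r"\d\d\d-\d\d\d-\d\d\d\d", log_text)

print(ip_addresses)     # ['192.168.1.20', '10.0.0.7']
print(email_addresses)  # ['bob@example.com', 'amy@example.org']
print(phone_numbers)    # ['555-201-7788']
```

To run this against a real file, the log_text string would be replaced with the result of opening the file and calling read() on it.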

Shimon Brathwaite is a seven-year cybersecurity professional with extensive experience in Incident Response, Vulnerability Management, Identity and Access Management and Consulting.

What's a lab?

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.

Provided environment for hands-on practice

We will provide the credentials and environment necessary for you to practice right within your browser.

Guided walkthrough

Follow along with the author’s guided walkthrough and build something new in your provided environment!

Did you know?

On average, you retain 75% more of your learning if you get time for practice.