Guided: Using Regular Expressions in Python

Unlock the power of regular expressions in Python with this hands-on lab designed to build your confidence and capability in text manipulation and data extraction. Whether you're parsing log files, validating user input, or cleaning messy datasets, mastering regex can streamline your workflow and enhance your code's precision. This lab walks you through the core syntax, including essential symbols like \d, \w, and \s, as well as anchors and quantifiers for targeted pattern matching. You'll gain practical experience using Python's match(), search(), and findall() functions to locate data, and learn how to validate common formats such as emails and dates. Finally, you'll manipulate text using powerful substitution and splitting techniques—essential tools for real-world tasks like analyzing CSV-style strings or cleaning logs. Perfect for developers, data analysts, and anyone looking to boost their pattern recognition skills, this lab makes regex approachable, applicable, and immediately useful.

Path Info

Level: Beginner
Duration: 1h 37m
Published: Apr 10, 2025

Table of Contents

  1. Challenge

    Introduction

    Welcome to the Guided: Using Regular Expressions in Python Lab

    In this lab, you will be provided with an environment and step-by-step instructions to help you:

    • Create proper regular expressions with core regex syntax and patterns
    • Perform text search and extraction with Python's re module
    • Manipulate text through substitution and splitting with Python's re module
    • Apply regex to real-world data validation and extraction

    Prerequisites

    You should have a basic understanding of Python, including how to define functions and assign variables. No prior experience with regular expressions is required.

    Throughout the lab, you will run Python commands in the Terminal window to test your task implementations. All commands should be run from the workspace directory and will follow this structure:

    python3 regex_utils.py step<step_number> task<task_number> <text_to_perform_regex_on> <prefix_or_suffix_if_applicable>
    

    Tip: If you need assistance at any point, you can refer to the /solution directory. It contains subdirectories for each of the steps with example implementations.


  2. Challenge

    Text Search and Extraction

    Regular Expression Syntax

    A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. It is used for matching, locating, and manipulating text. Think of it as a search tool that can describe very complex text patterns.

    You typically use regular expressions in:

    • Searching for specific text within a document or a string.
    • Validating data inputs (like checking if an email address is properly formatted).
    • Replacing parts of text.
    • Splitting text into parts based on patterns.

    This is done following a very specific syntax that contains symbols, anchors, quantifiers, special characters, lookaheads, and lookbehinds.
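The four typical uses listed above can be sketched with Python's re module. This is a minimal illustration, not lab code: the sample strings and variable names are made up, and the date pattern checks shape only.

```python
import re

text = "Order 66 shipped on 2025-04-10 to alice@example.com"

# Searching: find the first run of digits anywhere in the string.
first_number = re.search(r"\d+", text).group()

# Validating: fullmatch() succeeds only if the whole string fits the pattern.
# This checks the shape of a date only (it would accept month 99).
looks_like_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2025-04-10") is not None

# Replacing: mask every digit with '#'.
masked = re.sub(r"\d", "#", text)

# Splitting: break on commas or semicolons plus any surrounding whitespace.
parts = re.split(r"\s*[,;]\s*", "red, green;blue ; yellow")

print(first_number)     # 66
print(looks_like_date)  # True
print(masked)           # Order ## shipped on ####-##-## to alice@example.com
print(parts)            # ['red', 'green', 'blue', 'yellow']
```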

    Regular Expression Syntax Table Quick Reference
    | Symbol | Category | Meaning / Description |
    |--------|----------|-----------------------|
    | `.` | Wildcard | Matches any single character except newline (newline too with `re.DOTALL`) |
    | `\d` | Character class | Matches any digit (`[0-9]`; for `str` patterns also other Unicode decimal digits unless `re.ASCII` is set) |
    | `\D` | Character class | Matches any non-digit |
    | `\w` | Character class | Matches any word character (letters, digits, underscore) |
    | `\W` | Character class | Matches any non-word character |
    | `\s` | Character class | Matches any whitespace (such as space, tab, newline) |
    | `\S` | Character class | Matches any non-whitespace character |
    | `[...]` | Character set | Matches any one character inside brackets |
    | `[^...]` | Negated set | Matches any one character not inside brackets |
    | `\|` | Alternation | OR operator; matches either the left or right pattern |
    | `()` | Grouping | Groups expressions, enables capturing or combining parts |
    | `(?:...)` | Non-capturing group | Groups pattern but doesn't capture it |
    | `(?P<name>...)` | Named group | Captures a group with a name |
    | `\b` | Anchor | Word boundary (between word and non-word character) |
    | `\B` | Anchor | Not a word boundary |
    | `^` | Anchor | Matches the start of the string (or of each line with `re.MULTILINE`) |
    | `$` | Anchor | Matches the end of the string or just before a trailing newline (or the end of each line with `re.MULTILINE`) |
    | `*` | Quantifier | Matches 0 or more repetitions |
    | `+` | Quantifier | Matches 1 or more repetitions |
    | `?` | Quantifier | Matches 0 or 1 (makes preceding token optional) |
    | `{n}` | Quantifier | Matches exactly n repetitions |
    | `{n,}` | Quantifier | Matches n or more repetitions |
    | `{n,m}` | Quantifier | Matches between n and m repetitions |
    | `?` after quantifier | Lazy modifier | Makes quantifier non-greedy (match as little as possible) |
    | `(?=...)` | Lookahead (positive) | Match if followed by pattern (doesn't include it in result) |
    | `(?!...)` | Lookahead (negative) | Match if not followed by pattern |
    | `(?<=...)` | Lookbehind (positive) | Match if preceded by pattern (pattern must be fixed-width in Python's re) |
    | `(?<!...)` | Lookbehind (negative) | Match if not preceded by pattern (pattern must be fixed-width in Python's re) |
    | `\` | Escape | Escapes a special character (e.g., `\.` matches a literal dot) |
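A few rows of the table in action may help. The sketch below uses made-up sample strings and variable names (assumptions for illustration only) and touches character classes, anchors, escaping, greedy versus lazy quantifiers, lookbehind, and named groups.

```python
import re

line = "user_42 paid $19.99 and $5 on 2025-04-10"

# Character classes and quantifiers: \w+ is one or more word characters.
words = re.findall(r"\w+", "a_1 b-2")

# Anchors: ^ pins the match to the start, \b marks a word boundary.
starts_with_user = bool(re.search(r"^user_\d+\b", line))

# Escaping and optional groups: \$ and \. are literals, (?:...)? is optional.
prices = re.findall(r"\$\d+(?:\.\d{2})?", line)

# Greedy vs lazy: .* takes as much as possible, .*? as little as possible.
greedy = re.search(r"<.*>", "<a><b>").group()
lazy = re.search(r"<.*?>", "<a><b>").group()

# Lookbehind: digits preceded by '$' (the '$' is not part of the result).
dollars = re.findall(r"(?<=\$)\d+", line)

# Named groups: pull out the parts of the date by name.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", line)

print(words)             # ['a_1', 'b', '2']
print(starts_with_user)  # True
print(prices)            # ['$19.99', '$5']
print(greedy, lazy)      # <a><b> <a>
print(dollars)           # ['19', '5']
print(date.groupdict())  # {'year': '2025', 'month': '04', 'day': '10'}
```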


    Python's re Module

    Python's re module contains several helpful methods used for completing text search, extraction, and manipulation with the help of regex.

    `re` Module Methods Quick Reference
    | Method | Purpose | Parameters | Returns | Notes |
    |--------|---------|------------|---------|-------|
    | `match()` | Match pattern at the start of string | pattern, string, flags=0 | Match object or None | Good for "does this string start with..." |
    | `search()` | Search anywhere in string | pattern, string, flags=0 | Match object or None | Finds first occurrence |
    | `fullmatch()` | Match the entire string | pattern, string, flags=0 | Match object or None | Use for strict validation |
    | `findall()` | Find all non-overlapping matches | pattern, string, flags=0 | List of strings, or of tuples when the pattern has more than one group | Use finditer() for match objects instead |
    | `finditer()` | Iterate over all matches as objects | pattern, string, flags=0 | Iterator of Match objects | Useful for position info, grouping, etc. |
    | `sub()` | Replace pattern with replacement string | pattern, repl, string, count=0, flags=0 | New string | Use for find-and-replace; repl may be a string or a function |
    | `subn()` | Like sub(), but also returns count | pattern, repl, string, count=0, flags=0 | Tuple: (string, count) | Great for auditing replacements |
    | `split()` | Split string by pattern | pattern, string, maxsplit=0, flags=0 | List of strings | Splits on patterns, not just fixed separators like str.split() |
    | `compile()` | Compile pattern for reuse | pattern, flags=0 | Compiled pattern object | Handy when the same pattern is used repeatedly |
    | `escape()` | Escape special regex chars in input | pattern | Escaped string | Use when inserting user input into regex safely |
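To make the differences between these functions concrete, here is a small sketch. The sample text and variable names are assumptions chosen for illustration.

```python
import re

text = "cat bat cat"

# match() only looks at the start; search() scans the whole string.
at_start = re.match(r"bat", text)      # None: 'bat' is not at index 0
anywhere = re.search(r"bat", text)     # Match object covering indexes 4..7

# fullmatch() requires the pattern to cover the entire string.
whole = re.fullmatch(r"cat", text)     # None: the string is longer than 'cat'

# findall() returns strings, or tuples when the pattern has several groups.
all_words = re.findall(r"[cb]at", text)
pairs = re.findall(r"(\w)(at)", text)

# finditer() yields Match objects, so positions are available.
spans = [m.span() for m in re.finditer(r"cat", text)]

# compile() gives a reusable pattern object with the same methods.
word = re.compile(r"\b\w+\b")
counts = [len(word.findall(s)) for s in ("one two", "three")]

# escape() makes user input safe to embed: '+' is treated literally.
user_input = "1+1"
unescaped_hit = re.search(user_input, "what is 1+1?")            # None
escaped_hit = re.search(re.escape(user_input), "what is 1+1?")   # matches '1+1'

print(at_start, anywhere.span(), whole)  # None (4, 7) None
print(all_words)                         # ['cat', 'bat', 'cat']
print(pairs)                             # [('c', 'at'), ('b', 'at'), ('c', 'at')]
print(spans)                             # [(0, 3), (8, 11)]
print(counts)                            # [2, 1]
print(unescaped_hit, escaped_hit.group())  # None 1+1
```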


    Text Search and Extraction with Regular Expressions

    In the upcoming tasks, you will use Python's re module to search for pieces of text that match what you are looking for. This will require writing regular expressions with core regular expression syntax such as symbols, anchors, and quantifiers.

    Tip: In Python, regex patterns are usually written as raw strings by prefixing them with r, like r"\d+", so that backslashes are treated correctly.
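The effect of the r prefix is easiest to see with \b, which means "backspace" in an ordinary Python string but "word boundary" in a regex. A short sketch with made-up sample strings:

```python
import re

# "\b" in a normal string is a single backspace character.
# r"\b" is a backslash followed by 'b', which the regex engine
# reads as a word boundary.
normal = "\bcat\b"
raw = r"\bcat\b"

print(len(normal), len(raw))                # 5 7
print(re.search(normal, "a cat sat"))       # None
print(re.search(raw, "a cat sat").group())  # cat

# Without a raw string, every backslash has to be doubled.
print("\\d+" == r"\d+")                     # True
```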


  3. Challenge

    Text Manipulation

    Match Objects

    A match object is a special object returned by some re module methods. It provides useful information about the match, such as the matched text, its position in the original string, and any captured groups.

    Match Object Methods & Properties Quick Reference

    | Property / Method | Description |
    |-------------------|-------------|
    | `.group()` | Returns the entire match (or a specific group if passed an index or name) |
    | `.groups()` | Returns a tuple of all captured groups (named groups included) |
    | `.groupdict()` | Returns a dictionary of all named capturing groups |
    | `.start()` | Returns the start index of the match |
    | `.end()` | Returns the end index (1 past the last character) of the match |
    | `.span()` | Returns a tuple (start, end) representing the range of the match |
    | `.pos` | The starting position of the search within the string |
    | `.endpos` | The ending position (limit) of the search |
    | `.re` | The regular expression object used for the match |
    | `.string` | The original string passed to re.search() or similar |
    | `.lastgroup` | The name of the last matched capturing group, or None if it has no name |
    | `.lastindex` | The index of the last matched capturing group (by number), or None if no group matched |
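The sketch below exercises several of these methods and properties on one match. The log line and group names are hypothetical. Because search() returns None when nothing matches, the result is checked before use.

```python
import re

log = "ERROR 2025-04-10 disk full"
m = re.search(r"(?P<level>[A-Z]+) (\d{4})-(\d{2})-(\d{2})", log)

if m:  # search() returns None on failure, so guard before using the match
    print(m.group())         # ERROR 2025-04-10
    print(m.group(2))        # 2025
    print(m.group("level"))  # ERROR
    print(m.groups())        # ('ERROR', '2025', '04', '10')
    print(m.groupdict())     # {'level': 'ERROR'}
    print(m.span())          # (0, 16)
    print(m.lastindex)       # 4
    print(m.lastgroup)       # None (the last matched group has no name)
```

Note that groups() includes the named group as its first element, while groupdict() contains only the named ones.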

    Text Manipulation with Regular Expressions

    In the upcoming tasks, you will use Python’s re module to manipulate text by substituting specific patterns with new text and splitting text based on defined patterns. You may also work with match objects to extract additional information about matches.
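As a preview of substitution and splitting, here is a minimal sketch. The input strings are invented for illustration and are not the lab's task data.

```python
import re

raw = "name , age;city  ,  zip"

# split(): separators may be patterns, here ',' or ';' with optional spaces.
fields = re.split(r"\s*[,;]\s*", raw)

# A capturing group in the pattern keeps the separators in the result.
with_delims = re.split(r"\s*([,;])\s*", raw)

# maxsplit limits how many splits are made.
first_split = re.split(r"\s*[,;]\s*", raw, maxsplit=1)

# sub() with backreferences: reorder YYYY-MM-DD into DD/MM/YYYY.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "due 2025-04-10")

# sub() with a function: the replacement is computed from each match.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")

# subn() also reports how many replacements were made.
cleaned, count = re.subn(r"\s+", " ", "too    many   spaces")

print(fields)          # ['name', 'age', 'city', 'zip']
print(with_delims)     # ['name', ',', 'age', ';', 'city', ',', 'zip']
print(first_split)     # ['name', 'age;city  ,  zip']
print(swapped)         # due 10/04/2025
print(doubled)         # 6 apples, 20 pears
print(cleaned, count)  # too many spaces 2
```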

  4. Challenge

    Real World Examples

    In this step, you will apply what you've learned about core regex syntax, Python’s re module, and match objects to solve real-world problems using regular expressions.

    Real-world regex skills are powerful for validating, extracting, and cleaning data across countless applications!
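A sketch of what validation, extraction, and cleaning can look like is shown below. The patterns are deliberately simplified assumptions (real email and date rules are stricter), and the function names and log format are hypothetical, not the lab's solution code.

```python
import re

# Simplified patterns for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE = re.compile(
    r"(?P<year>\d{4})-(?P<month>0[1-9]|1[0-2])-(?P<day>0[1-9]|[12]\d|3[01])"
)
LOG_LINE = re.compile(r"\[(?P<level>[A-Z]+)\]\s+(?P<message>.*)")


def is_email(value):
    """Validate: the whole string must look like name@domain.tld."""
    return EMAIL.fullmatch(value) is not None


def is_date(value):
    """Validate: YYYY-MM-DD with month 01-12 and day 01-31."""
    return DATE.fullmatch(value) is not None


def parse_log(lines):
    """Extract: return (level, message) for lines like '[ERROR] text'."""
    return [(m["level"], m["message"]) for m in map(LOG_LINE.match, lines) if m]


def parse_csv_row(row):
    """Clean: split a CSV-style string and drop spaces around the commas."""
    return re.split(r"\s*,\s*", row.strip())


print(is_email("alice@example.com"), is_email("alice@example"))  # True False
print(is_date("2025-04-10"), is_date("2025-13-01"))              # True False
print(parse_log(["[ERROR] disk full", "noise", "[INFO] started"]))
print(parse_csv_row(" a , b,c "))                                # ['a', 'b', 'c']
```

Using fullmatch() for validation avoids accepting strings that merely contain a valid-looking piece, such as an email address followed by extra text.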

Jaecee is an associate author at Pluralsight helping to develop Hands-On content. Jaecee's background is in Software Development and Data Management and Analysis. Jaecee holds a graduate degree in Computer Science from the University of Utah. She works on new content at Pluralsight and is constantly learning.