Everyday Regular Expressions
Jun 08, 2023
Nearly every programmer or systems administrator will use regular expressions at some point in their career, whether it's as simple as changing a select bit of text in vim or as advanced as validating detailed bits of information. While it's not uncommon to think of confusing hieroglyphics when considering regular expressions, if you use Linux or do any programming, you've probably already used them in many of your day-to-day tasks.

For those looking for a more detailed look at regular expressions, we've recently released Mastering Regular Expressions. To give you a taste of the course without giving anything away, we're going to learn some basic regex and some basic Perl by building a simple CLI-based regex validator.
The Goal
We want to create a command-line program that will ask us for two things: a string and a regular expression. This program then compares them to see if they match.

But What Are Regular Expressions?
Regular expressions allow us to match patterns and then use those matches for a number of different tasks, depending on the tool. Regular expressions can match literal characters (i.e., an `a` matches an `a`); metacharacters like `\w`, which matches any word character (a letter, digit, or underscore); or escaped characters like `\.`, which works as a literal period since the period itself is a metacharacter. We can use additional features like classes, groups, lookarounds, and conditionals to refine our pattern further.
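
To make the three kinds of match concrete, here is a minimal Perl sketch. The `matches` helper and the sample strings are assumptions made up for illustration; they are not part of the validator we build later.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text matches $pattern, 0 otherwise.
sub matches {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? 1 : 0;
}

# Made-up sample strings, chosen only to show the differences.
my @samples = ('banana', 'v1.2', 'v1x2', '!!!');

for my $text (@samples) {
    printf "%-8s literal a: %d  \\w: %d  escaped \\.: %d  bare .: %d\n",
        $text,
        matches($text, qr/a/),      # literal character
        matches($text, qr/\w/),     # metacharacter: any word character
        matches($text, qr/1\.2/),   # escaped period: only a real "."
        matches($text, qr/1.2/);    # bare period: any character between 1 and 2
}
```

Note that `v1x2` matches `1.2` but not `1\.2`, which is why the escape matters whenever a pattern is meant to contain a real period.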
Making the Validator
Perl and regular expressions play very well together; in fact, PCRE, or Perl-Compatible Regular Expressions, is one of the most common regex flavors to come across (even if PCRE and Perl's own regex engine don't behave 100% the same). As such, learning regex with Perl is a great idea, since there's not much that Perl can't do with regular expressions. While programs like `sed` or `grep` might limit your ability to use features like conditionals, Perl supports almost all regex features.

So let's get started crafting this validator! Open whichever text editor you prefer to a blank document; I used vim and named my file `regex-val.pl`.
- Add the hashbang:

  ```perl
  #! /usr/local/bin/perl
  ```

  If you aren't sure what you should put here, run `which perl` on the command line.
- We now want to set some variables that should be fed in via STDIN when the script is run. I'm going to call these `text` for the text we want to validate, and `regex` for the expression we're validating against:

  ```perl
  $text = <STDIN>;
  $regex = <STDIN>;
  ```
- We also want to add some prompts so that when we run the script, we know what we're inputting:

  ```perl
  print "Enter a string: ";
  $text = <STDIN>;
  print "Enter a regular expression: ";
  $regex = <STDIN>;
  ```
- Next, we want to check if our `text` matches our `regex`, so we need to craft an `if` statement. To denote that something in Perl is a regular expression, we need to encase it in forward slashes:

  ```perl
  if ( $text =~ /$regex/ ) { }
  ```

  This expression is simple enough: We're saying that `if` our `text` matches (`=~`) our `regex`, we should run the code in the curly brackets. We haven't written that yet, but since all we want to do is validate that two things match, we can just add a simple `print` command:

  ```perl
  if ( $text =~ /$regex/ ) {
      print "Match!\n";
  }
  ```

  The `=~` operator is specifically used for checking a scalar like our `text` against a pattern match.
- Of course, we also want to output a response if something isn't a match, so we can just use an `else` statement:

  ```perl
  if ( $text =~ /$regex/ ) {
      print "Match!\n";
  } else {
      print "Not a match!\n";
  }
  ```

  This leaves us with the following as the entire script:

  ```perl
  #! /usr/local/bin/perl

  # Prompt for the text and the pattern, reading each from STDIN.
  print "Enter a string: ";
  $text = <STDIN>;
  print "Enter a regular expression: ";
  $regex = <STDIN>;

  # Report whether the text matches the pattern.
  if ( $text =~ /$regex/ ) {
      print "Match!\n";
  } else {
      print "Not a match!\n";
  }
  ```
- Save the file and make it executable:

  ```
  $ chmod +x regex-val.pl
  ```
- Test it out by checking an IP address against the regular expression for an IP address:

  ```
  $ ./regex-val.pl
  Enter a string: 192.54.13.122
  Enter a regular expression: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
  Match!
  ```
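
One thing to watch for: values read from STDIN keep their trailing newline, so a pattern like `b` typed at the prompt is really `b` followed by a newline and will not match `abc`. A mistyped pattern such as `(` will also stop the script with an error. The sketch below is one possible way to handle both, assuming a hypothetical helper named `check_match` that is not part of the original script. It uses `chomp` to strip the newlines and compiles the pattern inside `eval`. The sample inputs are made up.

```perl
use strict;
use warnings;

# check_match is a hypothetical helper, not part of the original script.
# It returns "Match!", "Not a match!", or "Invalid regex!".
sub check_match {
    my ($text, $pattern) = @_;

    # Strip the trailing newline that <STDIN> leaves on each value.
    chomp($text, $pattern);

    # Compile inside eval so a malformed pattern doesn't kill the program.
    my $re = eval { qr/$pattern/ };
    return "Invalid regex!" unless defined $re;

    return $text =~ $re ? "Match!" : "Not a match!";
}

# Simulated input lines, ending in "\n" the way <STDIN> returns them.
print check_match("abc\n", "b\n"), "\n";

# Unescaped dots match any character, so this non-address still matches.
print check_match('192x54y13z122', '\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}'), "\n";

# Escaped dots plus ^ and $ anchors reject it and accept the real address.
my $strict = '^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$';
print check_match('192x54y13z122', $strict), "\n";
print check_match('192.54.13.122', $strict), "\n";

# A malformed pattern is reported instead of crashing.
print check_match('abc', '('), "\n";
```

Even the anchored pattern is only a shape check: `\d{1,3}` still accepts values like `999`, so it should not be treated as full IP address validation.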
Basic Regular Expressions Cheat Sheet
Want to test some regular expressions without taking a whole course? Here's a table of metacharacters and features. Try some out!

Expression | Meaning |
---|---|
`\w` | Match any word character: A-Z, a-z, 0-9, or underscore |
`\W` | Match any non-word character |
`\d` | Match any digit, 0-9 |
`\D` | Match any non-digit |
`\s` | Match any whitespace |
`\S` | Match any non-whitespace |
`\t` | Match a tab |
`\n` | Match a newline |
`^` | Match the start of a line |
`$` | Match the end of a line |
`\b` | Mark a word boundary |
`[ ... ]` | Set a character class; example: `[abC]` can match either `a`, `b`, or `C` |
`[^ ... ]` | A negated character class; match anything except the characters in the class |
`( ... )` | Group characters together |
`\|` | When used in a group, it acts as an or |
`?` | Mark the previous character optional |
`+` | Repeat the previous character one or more times |
`*` | Repeat the previous character zero or more times |
`.` | Wildcard; match any single character except a newline |
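
If you'd rather try several entries at once than type them in one by one, a small loop like the following can help. This is only a sketch: the `matches` helper and every sample string are assumptions made up for this demo.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text matches $pattern, 0 otherwise.
sub matches {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

# Each row: [ pattern, sample text ]. Single quotes keep the backslashes intact.
my @rows = (
    [ '^cat$',     'cat'         ],   # anchors: the whole string must be "cat"
    [ '^cat$',     'concat'      ],
    [ '\bcat\b',   'a cat sat'   ],   # word boundaries around "cat"
    [ '\bcat\b',   'concatenate' ],
    [ '[abC]',     'Cow'         ],   # character class
    [ '[^0-9]',    '2023'        ],   # negated class: needs one non-digit
    [ 'gr(a|e)y',  'grey'        ],   # group with alternation
    [ 'colou?r',   'color'       ],   # optional character
    [ 'ab+c',      'ac'          ],   # one or more "b"
    [ 'ab*c',      'ac'          ],   # zero or more "b"
);

for my $row (@rows) {
    my ($pattern, $text) = @$row;
    printf "%-10s %-14s %s\n", $pattern, "'$text'",
        matches($text, $pattern) ? 'match' : 'no match';
}
```

Swapping in your own rows is a quick way to check how a metacharacter behaves before using it in a real script.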