Master Regular Expressions in Shell: Practical sed & gawk Guide
This tutorial explains how to create and use regular expressions with the sed editor and gawk program in shell scripts, covering basic concepts, pattern types, special characters, quantifiers, grouping, and real‑world examples such as counting files, validating phone numbers, and verifying email addresses.
Regular Expressions
In shell scripting, sed and gawk rely on regular expressions (regex) to select or transform text streams.
What a regular expression is
A regex is a pattern template that Linux utilities match against input text. If the input satisfies the pattern the line is processed; otherwise it is discarded.
Regex engines
POSIX Basic Regular Expressions (BRE) – used by most utilities, including sed (often a subset for speed).
POSIX Extended Regular Expressions (ERE) – used by gawk and many modern tools.
BRE and ERE differ in which metacharacters are special and in the syntax for quantifiers.
Basic BRE patterns
Literal text matches itself. Example:
echo "This is a test" | sed -n '/test/p'
This is a testRegex matching is case‑sensitive; the pattern /this/p fails because the input contains a capital T.
echo "This is a test" | sed -n '/this/p' # no output
echo "This is a test" | sed -n '/This/p' # prints the lineSpecial characters and escaping
Metacharacters such as ., *, ?, +, ^, $, [], (), | have special meaning. To match them literally, prefix with a backslash ( \).
echo "The cost is $4.00" | sed -n '/\$/p'
The cost is $4.00Anchors
^anchors a pattern to the start of a line; $ anchors it to the end.
echo "Books are great" | sed -n '/^Books/p'
Books are greatCombining both anchors (e.g. /^book$/) matches a line that consists solely of the word book.
Dot character
.matches any single character except a newline.
echo "at ten oclock we" | sed -n '/.at/p'
at ten oclock weCharacter classes
Square brackets define a set of characters. Ranges can be expressed with a hyphen.
echo "cat" | sed -n '/[ch]at/p'
catCase‑insensitive matching can be expressed with a class that includes both cases:
echo "Yes" | sed -n '/[Yy]es/p'
YesNegated character classes
Placing ^ as the first character inside brackets negates the class.
echo "cat" | sed -n '/[^ch]at/p' # no match because the preceding character is c or hIntervals (brace quantifiers)
In ERE, {m} means exactly m repetitions; {m,n} means between m and n repetitions. gawk requires the --re-interval option to enable this syntax.
echo "bet" | gawk --re-interval '/be{1,2}t/{print}'
betAlternation (pipe)
The pipe | provides logical OR between alternatives.
echo "The cat is asleep" | gawk '/cat|dog/{print}'
The cat is asleepGrouping
Parentheses group sub‑patterns so that quantifiers apply to the whole group.
echo "Saturday" | gawk '/Sat(urday)?/{print}'
SaturdayPractical shell examples
Counting files in $PATH
Convert the colon‑separated $PATH to a space‑separated list, iterate over each directory, and count the entries.
#!/bin/bash
# Count files in each directory listed in $PATH
mypath=$(echo "$PATH" | sed 's/:/ /g')
for dir in $mypath; do
count=0
for file in "$dir"/*; do
[ -e "$file" ] && count=$((count + 1))
done
echo "$dir - $count"
doneValidating US phone numbers
The following ERE matches the four common US formats:
^\(?[2-9][0-9]{2}\)?( |-|\.)[0-9]{3}( |-|\.)[0-9]{4}$Script isphone prints only lines that contain a valid phone number:
#!/bin/bash
# Filter out invalid US phone numbers
gawk --re-interval '/^\(?[2-9][0-9]{2}\)?( |-|\.)[0-9]{3}( |-|\.)[0-9]{4}$/{print}'Validating email addresses
A regex that accepts usernames containing letters, digits, ._+- and hostnames containing letters, digits, ._-. The top‑level domain is limited to 2‑5 letters:
^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$Script isemail prints only matching lines.
Key differences between sed and gawk
Both tools use different regex engines; gawk supports most ERE features (including {m,n} intervals) while sed is limited to BRE.
These examples demonstrate how regular expressions enable powerful text filtering, data validation, and automation in shell scripts.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
