Master Shell Wildcards and POSIX Regex: A Practical Guide
This article explains the meaning of common special characters used as shell wildcards, character classes, and POSIX regular expressions, demonstrates locale effects on pattern matching, and compares BRE and ERE syntax with practical examples and exercises.
In programming you often encounter special characters such as * ? + [] {} ^ $ \ ( ) |. What each one means depends on the context in which it appears: shell wildcards, Basic Regular Expressions (BRE), Extended Regular Expressions (ERE), or PCRE.
Three questions frame the discussion: what wildcards are, how they differ from regular expressions, and what the two types of POSIX regex are.
Introduction
Because the shell frequently works with file names, it provides special characters for quickly specifying groups of files. These are called wildcards.
* matches any number of characters, including none
? matches any single character
[characters] matches any one character from the set
[!characters] matches any one character not in the set
[[:class:]] matches any one character belonging to the specified POSIX character class
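Examples of the first four patterns:
* matches all files, except those whose names start with a dot
g* matches files starting with g
b*.txt matches files starting with b and ending in .txt
Data??? matches files starting with Data followed by exactly three characters (total length 7)
[abc]* matches files starting with a, b, or c
abc[0-9][0-9] matches files named abc followed by exactly two digits
[A-Z]* matches files starting with an uppercase letter in the C locale; behaviour varies with the system locale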
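The patterns above can be tried safely in a scratch directory. The following is a minimal sketch, assuming a POSIX-style shell and that mktemp -d is available; the file names and variable names are hypothetical and chosen only for illustration.

```shell
# Scratch directory with hypothetical file names, so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch gamma.txt beta.txt b1.log Data123 Data12 a abc07 abc7x .hidden

all_files=$(echo *)            # every name that does not start with a dot
g_files=$(echo g*)             # names starting with g
b_txt=$(echo b*.txt)           # names starting with b and ending in .txt
data7=$(echo Data???)          # Data followed by exactly three characters
one_char=$(echo [abc])         # a one-character name: a, b, or c
abc_prefixed=$(echo [abc]*)    # names starting with a, b, or c
abc_nn=$(echo abc[0-9][0-9])   # abc followed by exactly two digits

printf '%s\n' "g*: $g_files" "b*.txt: $b_txt" "Data???: $data7" \
  "[abc]: $one_char" "[abc]*: $abc_prefixed" "abc[0-9][0-9]: $abc_nn"
```

Note that [abc] on its own matches only a file whose whole name is a single character; adding * is what turns it into "starts with a, b, or c". The order in which * lists the names depends on the locale, so the sketch does not rely on it.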
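In a CentOS example, the pattern [A-Z] did not produce the expected result because the locale influences character ordering. Early UNIX used 7-bit ASCII (0-127), in which the uppercase letters A-Z form one contiguous block. Later extensions added 8-bit characters (128-255) to support non-English languages. POSIX then introduced the locale concept so that sorting rules can adapt to the language. Some locales order letters as aAbBcC…xXyYzZ, so the range A-Z spans every letter except lowercase a, and ls [A-Z]* matches files starting with any letter other than 'a'. In the C locale, ranges follow the ASCII byte order again.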
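The locale effect can be checked by expanding the same range under the C locale and under whatever locale the session already uses. This is a rough sketch, assuming mktemp -d is available; the file names are hypothetical, and the second result is deliberately not predicted because it depends on the shell, its version, and the installed locales.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch apple Bravo cherry Delta

# C locale: ranges follow byte order, so A-Z covers only ASCII uppercase.
# The assignment happens inside $(...), so it does not leak into the session.
c_matches=$(LC_ALL=C; export LC_ALL; echo [A-Z]*)
echo "C locale:       $c_matches"

# Current locale: may also list lowercase names on some systems.
echo "current locale: $(echo [A-Z]*)"
```

If the two lines differ, the session locale is using a collation order in which upper and lower case are interleaved.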
Character Classes
Because ranges behave differently from locale to locale, POSIX defines explicit character classes. Each is written inside a bracket expression, for example [[:upper:]]:
[:alnum:] matches any letter or digit
[:alpha:] matches any letter
[:digit:] matches any digit
[:lower:] matches any lowercase letter
[:upper:] matches any uppercase letter
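Testing these classes against the same files shows their effect: ls [[:upper:]]* lists only files that start with an uppercase letter, whatever the locale's collation order.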
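A small sketch of the classes in use, again in a scratch directory with hypothetical file names and assuming mktemp -d is available. Each class appears inside an outer pair of brackets, and [!…] negates it.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch apple Bravo cherry Delta 9lives _notes

upper_files=$(echo [[:upper:]]*)    # names starting with an uppercase letter
lower_files=$(echo [[:lower:]]*)    # names starting with a lowercase letter
digit_files=$(echo [[:digit:]]*)    # names starting with a digit
other_files=$(echo [![:alnum:]]*)   # names starting with neither letter nor digit

printf '%s\n' "upper: $upper_files" "lower: $lower_files" \
  "digit: $digit_files" "other: $other_files"
```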
POSIX Regular Expressions
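POSIX splits regular expressions into Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). The main difference is the set of metacharacters each supports.
BRE supports ^ $ . [ ] * and \, while ERE adds ( ) { } ? + and |. In BRE, grouping and intervals are still available but must be written with backslashes, as \( \) and \{ \}; an unescaped ( or { is an ordinary character.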
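The difference in metacharacters can be seen by running the same input through grep (BRE) and grep -E (ERE). This is an illustrative sketch with made-up sample lines; the variable names are hypothetical.

```shell
sample='ab
abab
a+b
a|b'

# BRE: + is an ordinary character, so only the literal text a+b matches.
bre_plus=$(printf '%s\n' "$sample" | grep 'a+b')
# ERE: + means "one or more of the preceding item".
ere_plus=$(printf '%s\n' "$sample" | grep -E 'a+b')

# Grouping and intervals: backslashes in BRE, bare in ERE.
bre_group=$(printf '%s\n' "$sample" | grep '\(ab\)\{2\}')
ere_group=$(printf '%s\n' "$sample" | grep -E '(ab){2}')

# Alternation is an ERE feature; \+ here is a literal plus sign.
ere_alt=$(printf '%s\n' "$sample" | grep -E '^(ab|a\+b)$')

printf '%s\n' "BRE a+b:" "$bre_plus" "ERE a+b:" "$ere_plus" \
  "ERE alternation:" "$ere_alt"
```

The two grouping patterns select the same line (abab); only the spelling differs between the flavours.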
Application Support
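Programs that use BRE by default include sed and grep, among others; programs that use ERE include egrep, grep -E, and awk. Recent GNU grep releases print a warning that egrep is obsolescent, so grep -E is the safer spelling.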
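As a rough illustration of how the flavour follows the tool, the sketch below pulls a three-digit code out of a made-up log line with sed (BRE) and awk (ERE), then counts matching lines with grep -E. The sample text and variable names are assumptions for the example.

```shell
line='error 404: page not found'

# sed uses BRE: the group and the interval are written with backslashes.
sed_code=$(printf '%s\n' "$line" | sed 's/.*\([0-9]\{3\}\).*/\1/')

# awk uses ERE: + works unescaped; match() sets RSTART and RLENGTH.
awk_code=$(printf '%s\n' "$line" |
  awk 'match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }')

# grep -E uses ERE: bare parentheses group, | alternates; -c counts lines.
ere_count=$(printf '%s\n' "$line" | grep -Ec '(error|warn) [0-9]+')

echo "sed: $sed_code  awk: $awk_code  grep -E count: $ere_count"
```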
Exercise
The exercise is to run the same patterns through grep, which handles basic regex, and egrep (grep -E), which handles extended regex, and to compare the results.
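The key distinction is the brace: in BRE a bare { is a literal character and \{ \} delimit an interval, while in ERE a bare { } delimits an interval and \{ is a literal brace. Using the wrong form usually raises no error; the pattern silently matches something else. Write regexes carefully and document which flavour they target.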
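The brace behaviour can be verified with a two-line input, one line for each reading of the pattern. This is a minimal sketch with hypothetical sample data; in the last pattern only the opening brace is escaped, since a closing brace that does not end an interval is already an ordinary character in ERE.

```shell
sample='aaa
a{3}'

# BRE, bare braces: literal text, so the line "a{3}" matches.
bre_bare=$(printf '%s\n' "$sample" | grep 'a{3}')
# BRE, escaped braces: interval, so the line "aaa" matches.
bre_escaped=$(printf '%s\n' "$sample" | grep 'a\{3\}')
# ERE, bare braces: interval, so the line "aaa" matches.
ere_bare=$(printf '%s\n' "$sample" | grep -E 'a{3}')
# ERE, escaped opening brace: literal text, so the line "a{3}" matches.
ere_escaped=$(printf '%s\n' "$sample" | grep -E 'a\{3}')

printf '%s\n' "BRE a{3}   -> $bre_bare" "BRE a\\{3\\} -> $bre_escaped" \
  "ERE a{3}   -> $ere_bare" "ERE a\\{3}  -> $ere_escaped"
```

None of the four commands fails; each simply selects a different line, which is why a mix-up is easy to miss.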
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.