Master Regular Expressions: Key Metacharacters and Practical Shell Examples
This article explains the most common regular‑expression metacharacters, their meanings, and shows how to combine them in real‑world grep commands to match patterns such as lines starting with 30, containing 70 or 1800, and validating email addresses.
Regular‑Expression Metacharacters Overview
Regular expressions use metacharacters to describe sets of characters and positions. Below are the most frequently used ones (not an exhaustive list). . – any single character except line‑breaks. [ ] – matches any one character inside the brackets. [^xyz] – negated character class; matches any character not listed. [a-z] – character range; matches any character from a to z. The hyphen only defines a range when placed between two characters inside the class. [^a-z] – negated range; matches any character not in a‑z. [:space:] – any whitespace character. [:punct:] – any punctuation symbol. [:lower:] – a lowercase letter (a‑z). [:upper:] – an uppercase letter (A‑Z). [:digit:] – a digit (0‑9). [:alnum:] – any letter or digit. [:alpha:] – any alphabetic character (a‑z or A‑Z).
Position Anchors
^– start of the input string (or line when multiline). $ – end of the input string (or line). \b – word boundary. \B – non‑word boundary. \< and \> – start and end of a word.
Quantifiers
*– previous element 0 or more times. + – previous element 1 or more times. ? – previous element 0 or 1 time. {n} – exactly n repetitions. {n,} – at least n repetitions. {n,m} – between n and m repetitions (n ≤ m).
Grouping and Back‑references
( )– groups a sub‑pattern and captures it for later reference. \1 … \9 – refer to the text matched by the corresponding captured group. \n – back‑reference to the n‑th captured group. | – logical OR between alternatives. \ – escape character to treat a metacharacter as a literal.
Understanding these symbols is essentially learning a small, expressive language that computers can interpret.
Practical Shell Examples
Assume a file /tmp/ceshi containing the following numbers (each line is a space‑separated list):
130 120 200 450 12 24 70 140 8000 30
30 120 200 450 12 24 170 140 80
78 30 1800 200 450 12 24 170 40 80
30 1800 200 450 120 24 170 40 70 70 70
389 30 1800 200 450 120 24 1000 40 70
30 30 1800 200 450 120 24 1000 40 70
130120 200 450122470140800030
30120200450122417014080
7830180020045012241704080
3018002004501202417040707070
3893018002004501202410004070
303018002004501202410004070Example 1 – Lines that start with 30 and contain 70 somewhere later: ^30.*70 Command: # cat ceshi | grep -E "^30.*70" Result:
30 120 200 450 12 24 170 140 80
30 1800 200 450 120 24 170 40 70 70 70
30 30 1800 200 450 120 24 1000 40 70
30120200450122417014080
3018002004501202417040707070
303018002004501202410004070Example 2 – Lines that start with 30 , contain 1800 somewhere, and end with 70 : ^30.*1800.*70$ Command: # cat ceshi | grep -E "^30.*1800.*70$" Result:
30 1800 200 450 120 24 170 40 70 70 70
30 30 1800 200 450 120 24 1000 40 70
3018002004501202417040707070
303018002004501202410004070Variations such as requiring a space after 30 ( ^30[ ]) or a specific digit ( ^30[1|3]) can be expressed similarly.
Quantifier Edge Cases
To match the string 70 appearing 2 to 3 times consecutively, the correct pattern is (70){2,3}. Using 70{2,3} would match 700 or 7000 because the quantifier applies only to the preceding character.
Command example: # cat ceshi | grep -E "(70){2,3}" Result:
3018002004501202417040707070Word‑Boundary Demonstrations
Comparing \B200\B, \b200\b, and \<200\> shows how the same number can be matched as part of a larger token, as a whole word, or as a word surrounded by angle brackets.
# cat ceshi | grep -E "\B200\B"
# cat ceshi | grep -E "\b200\b"
# cat ceshi | grep -E "\<200\>"Each command yields different subsets of the data.
Email Validation Example
To validate an email address such as [email protected] (6‑18 characters, starts with a letter, may contain letters, digits, underscores), the following pattern can be used:
^[[:alpha:]]([a-zA-Z0-9_]){5,17}@([[:alnum:]]+\.)+[[:alnum:]]+$This ensures the local part starts with a letter, is followed by 5‑17 allowed characters, and the domain consists of one or more dot‑separated alphanumeric labels.
In summary, mastering these metacharacters and practicing with real data enables you to craft precise regular‑expression queries for a wide range of text‑processing tasks.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
