Master the Linux ‘Three Musketeers’: grep, sed, and awk Explained
This guide introduces the Linux “three musketeers” – grep, sed, and awk – covering regular expression fundamentals, command syntax, options, and practical examples, enabling readers to efficiently search, edit, and process text files while mastering essential shell scripting techniques.
Linux Three Musketeers Overview
The “Linux three musketeers” are grep, sed, and awk. Mastering these tools can greatly improve day-to-day operational efficiency. All three rely on regular expressions, and Linux supports both basic and extended regex, so this guide covers regex first and then each tool in turn.
1. Regular Expressions
Regular expressions (REGEXP) are pattern templates used to match specific text. Proficiency in regex is a prerequisite for using the Linux three musketeers.
Metacharacters
.: matches any single character
[]: matches any single character within the specified set
[^]: matches any single character not in the specified set
Character Classes
[[:digit:]]: matches a single digit
[[:lower:]]: matches a single lowercase letter
[[:upper:]]: matches a single uppercase letter
[[:punct:]]: matches a single punctuation character
[[:space:]]: matches a single whitespace character
[[:alpha:]]: matches a single alphabetic character
[[:alnum:]]: matches a single alphanumeric character
Quantifiers (greedy mode)
*: matches the preceding element zero or more times
?: matches the preceding element zero or one time
+: matches the preceding element one or more times
.*: matches any run of characters, including an empty one
Anchors
^: anchors the match to the start of a line
$: anchors the match to the end of a line
^$: matches an empty line
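Escaping in basic regex
In basic regex, ?, +, and braces are ordinary characters unless preceded by a backslash, so the quantifiers are written \?, \+, \{m,n\}, \{1,\}, \{0,3\}. The interval form \{m,n\} is standard, while \? and \+ are GNU extensions. This is a property of the regex syntax rather than the shell; wrap patterns in single quotes so the shell passes them through unchanged.
Note: In an interval, write the lower bound explicitly even when it is zero, as in \{0,3\}.
\< or \b: anchors the start of a word
\> or \b: anchors the end of a word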
Grouping and Backreferences
\(\): defines a group
\1, \2, …: refer to the content captured by the corresponding group.
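The basic-regex pieces above can be tried directly with grep. This is a minimal sketch that assumes GNU grep; the sample lines and variable names are made up for illustration.

```shell
# Made-up sample input, including one empty line.
input=$(printf '%s\n' 'error 42' 'abc' '' 'hello hello world' 'Room 7')

# Character class plus an escaped interval: two or more digits in a row.
digits=$(printf '%s\n' "$input" | grep '[[:digit:]]\{2,\}')

# ^$ matches empty lines; -c counts them.
blank_count=$(printf '%s\n' "$input" | grep -c '^$')

# \(...\) captures a word and \1 requires the same text again after a space.
repeated=$(printf '%s\n' "$input" | grep '\([[:alpha:]]\+\) \1')

printf '%s\n' "$digits" "$blank_count" "$repeated"
# prints:
# error 42
# 1
# hello hello world
```

The patterns are single-quoted so that the shell leaves the backslashes and brackets alone.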
2. Extended Regular Expressions
Basic regex requires many escaped symbols, which is inconvenient. Extended regex reduces the need for escaping, which is especially useful in sed scripts.
Character Matching
.: matches a single character
[abc]: matches any one of a, b, or c
[^abc]: matches any character except a, b, or c
Quantifiers (no extra escaping needed)
*, ?, +, {m,n} work as described above.
Anchors
Use ^ and $ as before. For word boundaries, use escaped \< and \>.
Alternation
|: matches either the expression on its left or the one on its right.
Note: C|cat matches either “C” or “cat”, not “Cat” or “cat”. To alternate only the first letter, group it with parentheses: (C|c)at.
Using extended regex simplifies sed commands and improves readability.
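A short sketch of extended syntax with grep -E, assuming GNU grep. The word list is invented for the example; it shows how alternation behaves with and without a group, and that intervals need no backslashes.

```shell
# Made-up word list.
words=$(printf '%s\n' 'C' 'cat' 'Cat' 'concat' 'dog')

# C|cat: any line containing "C" or containing "cat".
loose=$(printf '%s\n' "$words" | grep -E 'C|cat')

# ^(C|c)at$: the group limits the alternation to the first letter.
strict=$(printf '%s\n' "$words" | grep -E '^(C|c)at$')

# {m,n} is written without backslashes in extended regex.
doubled=$(printf '%s\n' 'ab' 'aab' 'aaaab' | grep -E '^a{2,3}b$')

printf '%s\n' "$loose" '--' "$strict" '--' "$doubled"
# prints:
# C
# cat
# Cat
# concat
# --
# cat
# Cat
# --
# aab
```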
3. grep Family
3.1 grep Commands
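grep, egrep, and fgrep are three related commands for different scenarios.
grep: uses basic regex by default.
egrep: equivalent to grep -E, uses extended regex.
fgrep: equivalent to grep -F, matches fixed strings instead of regex, which is faster and uses fewer resources.
Recent GNU grep releases treat egrep and fgrep as obsolescent, so grep -E and grep -F are the safer spellings in scripts.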
3.2 Usage
Syntax
grep [options] PATTERN [FILE...]
Options
-i: ignore case
--color: highlight matches
-v: show lines that do not match the pattern
-o: show only the matching part
-E: use extended regex (same as egrep)
PATTERN: can be a plain string or a regex (basic or extended).
FILE: files to search.
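The options listed above can be combined freely. The following is a small sketch assuming GNU grep, with made-up log lines standing in for a real file.

```shell
# Made-up log lines.
log=$(printf '%s\n' 'ERROR disk full' 'info started' 'Error: timeout 30s' 'warning low memory')

# -i: match "ERROR" and "Error" alike.
ci=$(printf '%s\n' "$log" | grep -i 'error')

# -v (here with -i): keep only the lines that do not match.
inverted=$(printf '%s\n' "$log" | grep -iv 'error')

# -o with -E: print just the matched text, not the whole line.
only=$(printf '%s\n' "$log" | grep -oE '[0-9]+s')

printf '%s\n' "$ci" '--' "$inverted" '--' "$only"
# prints:
# ERROR disk full
# Error: timeout 30s
# --
# info started
# warning low memory
# --
# 30s
```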
4. sed Command
4.1 Overview
sed (Stream Editor) is a powerful line‑oriented editor.
4.2 Basic Syntax
sed [option] 'script' [input file]...
1. Option Part
-n: suppress automatic printing of the pattern space, so only lines printed explicitly (for example with p) appear.
-e: add a script; repeat -e to supply several.
-f: read the script from a file.
-r: enable extended regex (GNU sed also accepts -E).
-i: edit files in place.
2. Script Part
Script consists of an address (range) and an operation (e.g., substitute, insert, delete).
a) Address – empty (whole file)
Applies to the entire file.
b) Address – single line
n: operate on line n.
/pattern/: operate on lines matching pattern (basic regex unless -r is used).
c) Address – range
n,m: from line n to line m inclusive.
n,+k: line n and the next k lines.
n,/pattern/: from line n to the next line matching pattern.
/pattern1/,/pattern2/: from a line matching pattern1 through the next line matching pattern2; the range can start again later in the file.
d) Address – step
1~2: every odd line.
2~2: every even line.
The first~step form is a GNU sed extension.
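Each address form can be checked by pairing it with -n and p, so that only the addressed lines are printed. This sketch assumes GNU sed, since n,+k and first~step are GNU extensions; the six input lines are made up.

```shell
# Six made-up lines.
lines=$(printf '%s\n' one two three four five six)

single=$(printf '%s\n' "$lines" | sed -n '2p')                   # line 2 only
range=$(printf '%s\n' "$lines" | sed -n '2,4p')                  # lines 2 through 4
by_pattern=$(printf '%s\n' "$lines" | sed -n '/three/,/five/p')  # pattern to pattern
relative=$(printf '%s\n' "$lines" | sed -n '2,+1p')              # line 2 and the next one
odd=$(printf '%s\n' "$lines" | sed -n '1~2p')                    # every second line from line 1

printf '%s\n' "$single" '--' "$range" '--' "$by_pattern" '--' "$relative" '--' "$odd"
# prints:
# two
# --
# two
# three
# four
# --
# three
# four
# five
# --
# two
# three
# --
# one
# three
# five
```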
e) Editing Operations
d: delete line.
p: print pattern space.
a: append text after the addressed line (use \n for multiple lines).
i: insert text before the addressed line.
c: replace addressed line with new text.
w file: write the addressed lines to file.
r file: read file and insert its contents after the addressed line.
!: apply command to lines that do NOT match the address.
s///: substitute; the delimiter after s can be almost any single character, such as @ or #, which avoids escaping slashes in paths.
Replacement flags: g replaces every match on the line rather than only the first; p prints the line if a substitution was made.
Example: echo "/var/log/messages" | sed -r 's@[^/]+/?$@@' removes the filename, leaving the directory path /var/log/.
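The editing operations can be sketched on a small made-up config snippet. This assumes GNU sed: the one-line forms of a and i, and the -r option, are GNU features.

```shell
# Made-up config lines.
cfg=$(printf '%s\n' '# comment' 'port=80' 'host=a.example' 'port=8080')

# d drops comment lines; s uses @ as the delimiter, with g for every match.
cleaned=$(printf '%s\n' "$cfg" | sed -e '/^#/d' -e 's@port@PORT@g')

# i and a in GNU one-line form: the text follows the command letter.
around=$(printf '%s\n' 'middle' | sed -e '1i top' -e '1a bottom')

# -n with the p flag: print only lines where a substitution happened.
changed=$(printf '%s\n' "$cfg" | sed -n 's/^port=/listen=/p')

# -r enables extended regex: strip the last path component.
dir=$(echo '/var/log/messages' | sed -r 's@[^/]+/?$@@')

printf '%s\n' "$cleaned" '--' "$around" '--' "$changed" '--' "$dir"
# prints:
# PORT=80
# host=a.example
# PORT=8080
# --
# top
# middle
# bottom
# --
# listen=80
# listen=8080
# --
# /var/log/
```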
4.3 Advanced Usage
1. Pattern Space and Hold Space
The pattern space holds the current line; the hold space is a temporary buffer.
2. Related Commands
h: copy pattern space to hold space.
H: append pattern space to hold space.
g: copy hold space to pattern space.
G: append hold space to pattern space.
x: exchange pattern and hold spaces.
n: read next line into pattern space (overwrites).
N: append next line to pattern space.
d: delete pattern space.
D: delete up to first newline in pattern space.
3. Examples
sed -n 'n;p' FILE: display even lines.
sed '1!G;h;$!d' FILE: reverse file content.
sed '$!d' FILE: print the last line.
sed '$!N;$!D' FILE: print the last two lines.
sed '/^$/d;G' FILE: delete blank lines and add a blank line after each non‑blank line.
sed 'G' FILE: add a blank line after every line.
sed 'n;d' FILE: display odd lines.
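A few of these one-liners can be checked against a five-line made-up input. The sketch assumes GNU sed; in particular, the behavior of n on the last input line follows GNU sed's defaults.

```shell
# Five made-up lines.
nums=$(printf '%s\n' 1 2 3 4 5)

# G appends the hold space, h saves the growing result, $!d hides all but the last cycle.
reversed=$(printf '%s\n' "$nums" | sed '1!G;h;$!d')

# N joins the next line, D drops the older one, leaving a two-line window at the end.
last_two=$(printf '%s\n' "$nums" | sed '$!N;$!D')

# n skips to the next line; with -n only the explicit p prints.
even=$(printf '%s\n' "$nums" | sed -n 'n;p')

# n prints the current line and loads the next one, which d then deletes.
odd=$(printf '%s\n' "$nums" | sed 'n;d')

printf '%s\n' "$reversed" '--' "$last_two" '--' "$even" '--' "$odd"
# prints:
# 5
# 4
# 3
# 2
# 1
# --
# 4
# 5
# --
# 2
# 4
# --
# 1
# 3
# 5
```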
5. awk Command
5.1 Overview
awk is a report generator named after its three authors. It processes input line by line, splits fields, and executes actions.
5.2 Basic Usage
1. Syntax
awk [option] 'PATTERN{ACTION}' FILE
Fields are accessed as $1, $2, and so on; $0 is the whole record.
2. Common Options
-F: input field separator.
-v var=value: define a variable.
3. Patterns
/pattern/: lines matching regex.
! /pattern/: lines not matching.
NR>2: line number condition.
BEGIN{...}: executed before any input is read.
END{...}: executed after all input has been processed.
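The pattern types can be sketched on a small made-up table. When a pattern has no action, awk prints the matching record by default.

```shell
# Made-up table with a header line.
table=$(printf '%s\n' 'name qty' 'apple 3' 'pear 0' 'plum 12')

# Regex pattern with no action: matching lines are printed as they are.
matched=$(printf '%s\n' "$table" | awk '/^p/')

# Line-number condition combined with a negated regex.
negated=$(printf '%s\n' "$table" | awk 'NR>1 && !/^p/')

# BEGIN runs before any input, END after the last record.
framed=$(printf '%s\n' "$table" | awk 'BEGIN{print "start"} NR>1 && $2>5 {print $1} END{print "end"}')

printf '%s\n' "$matched" '--' "$negated" '--' "$framed"
# prints:
# pear 0
# plum 12
# --
# apple 3
# --
# start
# plum
# end
```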
4. Built‑in Variables
FS: input field separator (default whitespace).
OFS: output field separator (default a single space).
RS: input record separator (default newline).
ORS: output record separator (default newline).
NF: number of fields in the current record.
NR: number of records read so far, across all input files.
FNR: record number in the current file.
FILENAME: name of the current file.
ARGC, ARGV: command‑line argument count and argument array.
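A brief sketch of the most common built-in variables. The two records are made up in the style of /etc/passwd so that the example does not depend on the real file.

```shell
# Two made-up passwd-style records.
rec=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' 'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin')

# -F sets FS; NR is the record number, NF the field count, $NF the last field.
summary=$(printf '%s\n' "$rec" | awk -F: '{print NR, NF, $1, $NF}')

# -v assigns OFS before the program runs; print joins comma-separated items with it.
joined=$(printf '%s\n' "$rec" | awk -F: -v OFS=' -> ' '{print $1, $NF}')

printf '%s\n' "$summary" '--' "$joined"
# prints:
# 1 7 root /bin/bash
# 2 7 daemon /usr/sbin/nologin
# --
# root -> /bin/bash
# daemon -> /usr/sbin/nologin
```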
5. Common Actions
print : output items separated by OFS.
printf : formatted output.
Control statements: if, while, for, break, continue.
Arrays for counting, e.g., ip[$1]++.
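The actions above can be combined in one short program. This sketch counts made-up request lines per address, labels each with if/else, and formats the result with printf. The array name hits and the labels are illustrative only. Because awk does not define the order of a for (key in array) loop, the output is piped through sort.

```shell
# Made-up access-log lines: address, then path.
access=$(printf '%s\n' '10.0.0.1 /a' '10.0.0.2 /b' '10.0.0.1 /c' '10.0.0.1 /d')

report=$(printf '%s\n' "$access" | awk '
    { hits[$1]++ }                      # count one hit per first field
    END {
        for (ip in hits) {              # iteration order is unspecified
            if (hits[ip] > 1) { label = "busy" } else { label = "quiet" }
            printf "%s %d %s\n", ip, hits[ip], label
        }
    }' | sort)

printf '%s\n' "$report"
# prints:
# 10.0.0.1 3 busy
# 10.0.0.2 1 quiet
```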
6. Examples
Print users whose shell is /bin/bash:
awk -F: '$NF=="/bin/bash" {print $1, $NF}' /etc/passwd
Count occurrences of the first column:
awk '{ip[$1]++} END{for(i in ip) print i, ip[i]}' access.log
Sum the third field for each value of the first field, using only rows where the second column is between 30 and 90:
awk -F: '$2>=30 && $2<=90 {dic[$1]+=$3} END{for(i in dic) print i, dic[i]}' data
These tools form the core of Linux text processing and are essential for efficient system administration and data manipulation.
MaGe Linux Operations
