Mastering Linux Text Processing: A Deep Dive into awk, grep, and sed
This comprehensive guide explains the core Linux text‑processing utilities—awk, grep, and sed—covering their purposes, syntax, common options, powerful regular‑expression features, practical examples, control structures, arrays, functions, and how they compare to each other for efficient command‑line data manipulation.
1. grep
grep is a powerful text-search tool that uses regular expressions to find and print matching lines. Its exit status (0 if at least one line matched, 1 if no line matched, 2 if an error occurred) can be tested in scripts. egrep is equivalent to grep -E, which enables extended regular expressions.
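A minimal sketch of testing grep's exit status in a script. The sample log text is made up for illustration; -q keeps grep silent so only the status is used.

```shell
# Illustrative input (an assumption, not taken from the article).
log='INFO start
ERROR disk full
INFO done'

# -q prints nothing; the if statement branches on the exit status alone.
if printf '%s\n' "$log" | grep -q 'ERROR'; then
    found=yes
else
    found=no
fi
echo "ERROR present: $found"

# No line contains WARN, so grep exits with status 1.
status=0
printf '%s\n' "$log" | grep -q 'WARN' || status=$?
echo "exit status for WARN: $status"
```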
1.1 What is grep and egrep?
grep searches files for patterns and prints matching lines. egrep enables extended regex syntax.
1.2 Using grep
1.2.1 Command format
grep [option] pattern file
1.2.2 Common options
-A<n>: print n lines after a match
-B<n>: print n lines before a match
-C<n>: print n lines around a match
-c: count matching lines
-e PATTERN: specify a pattern; repeat -e to match any of several patterns (OR logic)
-E: use extended regular expressions
-f FILE: read patterns from FILE
-F: fixed‑string search (same as fgrep)
-i: ignore case
-n: show line numbers
-o: show only the matching part
-q: quiet mode (no output)
-s: suppress error messages
-v: invert match
-w: match whole words
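The options above can be tried on a small scratch file. This is a sketch only: the file contents are invented, and -o and -A are assumed to be available (they are in GNU and BSD grep, but are not required by POSIX).

```shell
# Scratch file with illustrative contents (an assumption, not from the article).
tmp=$(mktemp)
printf '%s\n' 'apple pie' 'Apple tart' 'pineapple' 'banana split' 'cherry' > "$tmp"

count=$(grep -c 'apple' "$tmp")               # count lines containing "apple"
icount=$(grep -ci 'apple' "$tmp")             # same, ignoring case
whole=$(grep -w 'apple' "$tmp")               # "apple" as a whole word only
numbered=$(grep -n 'banana' "$tmp")           # prefix matches with line numbers
inverted=$(grep -vi 'apple' "$tmp")           # lines that do NOT match
either=$(grep -e 'banana' -e 'cherry' "$tmp") # two patterns, OR logic
only=$(grep -o 'an' "$tmp")                   # print only the matched text
context=$(grep -A1 'pineapple' "$tmp")        # match plus one line after

echo "$count $icount"
echo "$whole"
echo "$numbered"
rm -f "$tmp"
```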
1.3 grep practical demo
Examples showing how to filter /dev entries, display usage percentages, and combine with awk for numeric comparisons.
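The article's own demo commands are not included here, so the following is a hedged reconstruction of what such a demo might look like. The df-style text is invented sample data (real output would come from running df), and the 90 percent threshold is an arbitrary choice.

```shell
# Invented sample resembling df output (assumption; not real system data).
df_output='Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1       41152736 37037464   2001856  95% /
tmpfs            8123456        0   8123456   0% /dev/shm
/dev/sdb1      103081248 20616252  77205732  22% /data
/dev/sdc1       51475068 43753808   5083436  90% /backup'

# Keep only lines that START with /dev (the tmpfs line mentions /dev/shm but is skipped).
devs=$(printf '%s\n' "$df_output" | grep '^/dev')

# Show just the usage percentages (-o and -E as in GNU grep).
pct=$(printf '%s\n' "$devs" | grep -oE '[0-9]+%')

# Hand the numeric comparison to awk: strip the % sign, then compare as a number.
full=$(printf '%s\n' "$devs" | awk '{ sub(/%/, "", $5); if ($5 + 0 >= 90) print $1, $5 }')

echo "$pct"
echo "$full"
```

grep selects the lines, while awk does the arithmetic that grep cannot express.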
2. Regular Expressions
Regular expressions are widely used in Linux and many programming languages for pattern matching.
2.1 Basic regex
. matches any single character
[] character class
[^] negated class
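[:alnum:] or [0-9a-zA-Z] for alphanumerics (a POSIX class name goes inside a bracket expression, so the full form is [[:alnum:]]; the same applies to every class below)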
[:alpha:] or [a-zA-Z] for letters
[:upper:] or [A-Z] for uppercase
[:lower:] or [a-z] for lowercase
[:blank:] for space and tab
[:space:] for all whitespace
[:cntrl:] for control characters
[:digit:] or [0-9] for digits
[:xdigit:] for hexadecimal digits
[:graph:] for printable non‑space characters
[:print:] for printable characters
[:punct:] for punctuation
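A short sketch of the character classes in use with grep. The sample lines are made up; note that each POSIX class name appears inside an outer pair of brackets.

```shell
# Illustrative input lines (an assumption, not from the article).
sample=$(printf '%s\n' 'abc123' 'HELLO' 'hello world' '42')

digits_only=$(printf '%s\n' "$sample" | grep -E '^[[:digit:]]+$')  # whole line is digits
starts_upper=$(printf '%s\n' "$sample" | grep '^[[:upper:]]')      # begins with uppercase
has_blank=$(printf '%s\n' "$sample" | grep '[[:blank:]]')          # contains space or tab
no_digits=$(printf '%s\n' "$sample" | grep '^[^0-9]*$')            # negated class: no digit anywhere
alnum_only=$(printf '%s\n' "$sample" | grep -E '^[[:alnum:]]+$')   # letters and digits only

echo "$digits_only"
echo "$starts_upper"
echo "$has_blank"
```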
2.2 Quantifiers
* zero or more
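+ one or more (extended regex; written \+ in GNU basic regex)
? zero or one (extended regex; written \? in GNU basic regex)
{n} exactly n (written \{n\} in basic regex)
{m,n} between m and n
{n,} at least n
{,n} at most n (GNU extension)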
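The quantifiers can be compared side by side by counting matches with grep -c. This sketch uses invented test words and extended regex (-E); the last command shows the basic-regex spelling of an interval.

```shell
# Words with zero to four b's between a and c (illustrative data).
words=$(printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc' 'abbbbc')

star=$(printf '%s\n' "$words" | grep -cE '^ab*c$')        # zero or more b
plus=$(printf '%s\n' "$words" | grep -cE '^ab+c$')        # one or more b
opt=$(printf '%s\n' "$words" | grep -cE '^ab?c$')         # zero or one b
exact=$(printf '%s\n' "$words" | grep -cE '^ab{2}c$')     # exactly two b
range=$(printf '%s\n' "$words" | grep -cE '^ab{2,3}c$')   # two or three b
atleast=$(printf '%s\n' "$words" | grep -cE '^ab{3,}c$')  # three or more b
bre=$(printf '%s\n' "$words" | grep -c '^ab\{2\}c$')      # basic regex needs \{ \}

echo "$star $plus $opt $exact $range $atleast $bre"
```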
2.3 Anchors
^ start of line
$ end of line
\b word boundary (GNU extension)
\< and \> for start/end of word (GNU extension, not part of POSIX)
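A brief sketch of how the anchors narrow a match, using made-up input. The \b, \< and \> forms are assumed to be supported by the installed grep, as they are in GNU grep.

```shell
# Illustrative lines that all contain the letters "cat".
lines=$(printf '%s\n' 'cat' 'concatenate' 'the cat sat' 'cats')

at_start=$(printf '%s\n' "$lines" | grep '^cat')      # line begins with cat
at_end=$(printf '%s\n' "$lines" | grep 'cat$')        # line ends with cat
whole_line=$(printf '%s\n' "$lines" | grep '^cat$')   # line is exactly cat
word_b=$(printf '%s\n' "$lines" | grep '\bcat\b')     # cat as a separate word
word_lt=$(printf '%s\n' "$lines" | grep '\<cat\>')    # same, using \< and \>
word_start=$(printf '%s\n' "$lines" | grep '\<cat')   # a word that starts with cat

echo "$at_start"
echo "$word_b"
```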
3. sed
sed is a stream editor that processes input line by line and writes the edited result to standard output. The original file is left unchanged unless the -i option is used.
3.1 Using sed
3.1.1 Command format
sed [options] 'address command' file
3.1.2 Common options
-n: suppress automatic printing
-e SCRIPT: add an editing command; repeat -e to apply several commands
-f FILE: read commands from FILE
-r: enable extended regex (GNU sed; -E is the more portable spelling)
-i: edit file in place
-i.bak: edit in place with backup
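The options can be exercised on a scratch file, as in this sketch. The file contents are invented, and -E is used as the extended-regex switch.

```shell
# Scratch file with illustrative contents (an assumption, not from the article).
tmp=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$tmp"

# Without -i the edit goes to stdout and the file itself is untouched.
preview=$(sed 's/beta/BETA/' "$tmp")
unchanged=$(cat "$tmp")

# -n turns off automatic printing, so only the explicit p prints.
second=$(sed -n '2p' "$tmp")

# -e chains several editing commands.
chained=$(sed -e 's/alpha/A/' -e 's/gamma/G/' "$tmp")

# -E allows ( | ) without backslashes.
ext=$(sed -E 's/^(alpha|gamma)$/[\1]/' "$tmp")

# -i.bak edits the file in place and keeps the original as "$tmp.bak".
sed -i.bak 's/beta/BETA/' "$tmp"
after=$(cat "$tmp")
backup=$(cat "$tmp.bak")

echo "$after"
rm -f "$tmp" "$tmp.bak"
```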
3.1.3 Addressing
No address: apply to all lines
Single line number or pattern
Range: start,end (line numbers or patterns)
first~step syntax (GNU extension; e.g., sed -n '1~2p' prints odd lines)
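Each addressing form can be seen with -n and the p command, which prints only the addressed lines. A sketch on invented input; the first~step form assumes GNU sed.

```shell
# Six illustrative lines named l1..l6.
input=$(printf '%s\n' l1 l2 l3 l4 l5 l6)

single=$(printf '%s\n' "$input" | sed -n '3p')             # one line number
range=$(printf '%s\n' "$input" | sed -n '2,4p')            # numeric range
by_pattern=$(printf '%s\n' "$input" | sed -n '/l5/p')      # lines matching a regex
pat_range=$(printf '%s\n' "$input" | sed -n '/l2/,/l4/p')  # from one match to another
last=$(printf '%s\n' "$input" | sed -n '$p')               # $ addresses the last line
odd=$(printf '%s\n' "$input" | sed -n '1~2p')              # GNU: start at 1, every 2nd line
even=$(printf '%s\n' "$input" | sed -n '2~2p')             # GNU: start at 2, every 2nd line

echo "$range"
echo "$odd"
```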
3.1.4 Editing commands
d: delete line
p: print line
a: append after line
i: insert before line
c: replace line
w FILE: write line to FILE
r FILE: read FILE after line
=: print line number
!: negate address match
s/regex/repl/: substitute (supports flags g, number, etc.)
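The editing commands above, sketched on a three-line input. The data and temp-file names are illustrative, and the one-line forms of a, i, and c assume GNU sed (POSIX sed expects a backslash and a newline after the command letter).

```shell
input=$(printf '%s\n' 'first' 'second' 'third')

deleted=$(printf '%s\n' "$input" | sed '2d')                # drop line 2
appended=$(printf '%s\n' "$input" | sed '2a after second')  # GNU one-line append
inserted=$(printf '%s\n' "$input" | sed '1i header')        # GNU one-line insert
changed=$(printf '%s\n' "$input" | sed '3c THIRD')          # GNU one-line replace
negated=$(printf '%s\n' "$input" | sed -n '2!p')            # print all lines except 2
lineno=$(printf '%s\n' "$input" | sed -n '/third/=')        # line number of the match

# Substitution flags: first match, every match (g), or only the 2nd match.
s_first=$(echo 'a-b-c-d' | sed 's/-/+/')
s_all=$(echo 'a-b-c-d' | sed 's/-/+/g')
s_second=$(echo 'a-b-c-d' | sed 's/-/+/2')

# w writes matching lines to a file; r reads a file in after a line.
out=$(mktemp); extra=$(mktemp)
echo 'EXTRA' > "$extra"
printf '%s\n' "$input" | sed -n "/second/w $out"
written=$(cat "$out")
merged=$(printf '%s\n' "$input" | sed "1r $extra")

echo "$appended"
rm -f "$out" "$extra"
```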
3.2 sed practical demos
Examples include printing matching lines, deleting specific lines, inserting text, reversing file order, showing odd/even lines, and adding blank lines between lines.
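The original demo commands are not reproduced in this text, so the following one-liners are a hedged sketch of the tasks listed, using well-known sed idioms on invented input. The odd/even versions use the n command and should also work without GNU's first~step syntax.

```shell
input=$(printf '%s\n' a b c d)

matching=$(printf '%s\n' "$input" | sed -n '/b/p')      # print matching lines
trimmed=$(printf '%s\n' "$input" | sed '2,3d')          # delete lines 2 through 3
with_top=$(printf '%s\n' "$input" | sed '1i # start')   # insert text (GNU one-line form)

# Reverse line order: append the hold space to each line, save it, print at the end.
reversed=$(printf '%s\n' "$input" | sed '1!G;h;$!d')

odd=$(printf '%s\n' "$input" | sed -n 'p;n')            # print, then skip the next line
even=$(printf '%s\n' "$input" | sed -n 'n;p')           # skip a line, then print

# G appends the (empty) hold space, which adds a blank line after every line.
spaced_lines=$(printf '%s\n' "$input" | sed G | awk 'END { print NR }')

echo "$reversed"
echo "$spaced_lines"
```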
4. awk
awk is a programming language for processing text and data streams. It works with stdin, files, or command output, supports user‑defined functions, arrays, and complex pattern matching.
4.1 Basic usage
4.1.1 Syntax
awk [options] 'program' var=value file…
4.1.2 Common options
-F fs: set input field separator
-v var=value: assign variable before execution
-f scriptfile: read program from file
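A small sketch of the three options plus the var=value operand from the syntax line. The passwd-style records and the variable name min are invented for illustration.

```shell
# Scratch data in /etc/passwd layout (illustrative, not real accounts).
tmp=$(mktemp); prog=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'alice:x:1000:1000:Alice:/home/alice:/bin/zsh' \
              'bob:x:1001:1001:Bob:/home/bob:/bin/bash' > "$tmp"

# -F sets the field separator to a colon.
names=$(awk -F: '{ print $1 }' "$tmp")

# -v assigns min before the program starts.
regular=$(awk -F: -v min=1000 '$3 >= min { print $1 }' "$tmp")

# A var=value operand is assigned when awk reaches it, just before reading the next file.
newest=$(awk -F: '$3 >= min { print $1 }' min=1001 "$tmp")

# -f reads the program text from a file.
printf '%s\n' '{ print $NF }' > "$prog"
shells=$(awk -F: -f "$prog" "$tmp")

echo "$regular"
rm -f "$tmp" "$prog"
```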
4.2 Built‑in variables
FS: input field separator (default whitespace)
OFS: output field separator
RS: input record separator
ORS: output record separator
NF: number of fields in current record
NR: number of records read so far, counted across all input files
FNR: record number within current file
FILENAME: current file name
ARGC, ARGV: command‑line arguments
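The built-in variables are easiest to see by printing them. A sketch with two invented scratch files; the difference between NR and FNR shows up when the second file starts.

```shell
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' 'a b c' 'd e' > "$f1"
printf '%s\n' 'x y z w' > "$f2"

# NR keeps counting across files; FNR restarts at 1 for each file; NF is per record.
counts=$(awk '{ print NR, FNR, NF }' "$f1" "$f2")

# FILENAME holds the name of the file currently being read.
awk 'FNR == 1 { print "reading", FILENAME }' "$f1" "$f2"

# FS splits input on commas here; OFS is used when a record is rebuilt ($1 = $1).
second=$(echo 'a,b,c' | awk 'BEGIN { FS = "," } { print $2 }')
joined=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')

# RS changes what counts as a record; ORS changes what print appends.
records=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
piped=$(printf 'a\nb\n' | awk 'BEGIN { ORS = "|" } { print }')

echo "$counts"
rm -f "$f1" "$f2"
```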
4.3 Control statements
if‑else
if (condition) { ... } else { ... }
while loop
while (condition) { ... }
do‑while loop
do { ... } while (condition)
for loop
for (i=1; i<=NF; i++) { ... }
for (var in array)
for (key in arr) { print key, arr[key] }
4.4 Arrays
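Associative arrays can use arbitrary string keys. An uninitialized element evaluates to the empty string (or 0 in a numeric context), and merely referencing a[k] creates that element, so test membership with (k in a). Use for (k in a) to iterate; the iteration order is unspecified.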
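The control statements from 4.3 and the array behavior described here can be combined in a short sketch. The score data is invented, and the output is piped through sort because for (k in a) visits keys in no guaranteed order.

```shell
# Illustrative "name score" records.
scores=$(printf '%s\n' 'alice 90' 'bob 72' 'alice 85' 'carol 60' 'bob 88')

# Arrays keyed by name accumulate totals; for-in plus if-else report on each key.
report=$(printf '%s\n' "$scores" | awk '
    { total[$1] += $2; count[$1]++ }
    END {
        for (name in total) {
            avg = total[name] / count[name]
            if (avg >= 80) grade = "pass"; else grade = "fail"
            print name, avg, grade
        }
    }' | sort)

# A for loop over the fields, a while loop, and a do-while loop.
sum=$(echo '3 1 2' | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }')
pow=$(awk 'BEGIN { n = 1; while (n < 100) n *= 2; print n }')
ticks=$(awk 'BEGIN { i = 0; do { i++ } while (i < 3); print i }')

# Referencing a missing element creates it; its value is both "" and 0.
probe=$(awk 'BEGIN {
    before = ("x" in a)
    v = a["x"]
    after = ("x" in a)
    print before, after, (a["x"] == ""), (a["x"] == 0)
}')

echo "$report"
echo "$sum $pow $ticks"
echo "$probe"
```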
4.5 Functions
function max(a,b) { return (a>b ? a : b) }
4.6 Interaction with the shell
Use system("command") to run external commands. Pass variables with -v or as arguments after the script.
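A minimal sketch that puts the max function from 4.5 to work and shows both directions of shell interaction. The variable names limit and user are made up for the example.

```shell
# User-defined function: the fields look numeric, so the comparison is numeric.
biggest=$(echo '10 9' | awk 'function max(a, b) { return (a > b ? a : b) }
                             { print max($1, $2) }')

# Shell to awk: -v copies a shell value into an awk variable.
user='editor'
greeting=$(awk -v name="$user" -v limit=5 'BEGIN { print "hello, " name ": " limit * 2 }')

# awk to shell: system() runs a command and returns its exit status.
# The command writes straight to awk's stdout; awk prints its own line afterwards.
ran=$(awk 'BEGIN { rc = system("echo from-shell"); print "rc=" rc }')

echo "$biggest"
echo "$greeting"
echo "$ran"
```

Passing values with -v avoids splicing shell variables into the program text, which keeps quoting simple.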
5. Comparison of grep, awk, and sed
grep is primarily a line-oriented search tool using regular expressions. sed is a non-interactive stream editor for applying scripted edits. awk is a full-featured pattern-scanning and processing language that can perform most of the tasks of both grep and sed and adds programming constructs such as loops, conditionals, and functions.
For simple pattern matching, grep is sufficient. For line‑by‑line editing, sed is appropriate. When complex field‑based processing, calculations, or data transformation are needed, awk is the most powerful and concise choice.
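To make the comparison concrete, here is a sketch that runs one job per tool over the same input. The access-log lines are invented, and the field layout (method, path, status, bytes) is an assumption made for the example.

```shell
# Illustrative log: method, path, status code, bytes sent.
log=$(printf '%s\n' 'GET /index.html 200 512' \
                    'GET /missing 404 0' \
                    'POST /login 200 128' \
                    'GET /old 404 0')

# grep: a plain search, here counting the 404 lines.
not_found=$(printf '%s\n' "$log" | grep -c ' 404 ')

# sed: a line-by-line edit, here reducing each 404 line to its path.
paths=$(printf '%s\n' "$log" | sed -n 's/^GET \(.*\) 404 .*/\1/p')

# awk: field-based arithmetic, here summing bytes for status 200.
bytes=$(printf '%s\n' "$log" | awk '$3 == 200 { total += $4 } END { print total }')

echo "$not_found"
echo "$paths"
echo "$bytes"
```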
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
