
In‑Depth Guide to Linux’s Text‑Processing Trio: grep, sed, and awk

This comprehensive tutorial explains the three essential Linux text‑processing tools—grep, sed, and awk—covering their core concepts, command‑line options, regular‑expression basics, practical examples, advanced features such as control structures and functions, and a side‑by‑side comparison of their capabilities.

Linux Tech Enthusiast

1. grep

grep (Global Regular Expression Print) is a text-search utility that uses regular expressions to find matching lines in one or more files and prints them. It exits with status 0 if at least one line matched, 1 if nothing matched, and 2 if an error occurred (for example, a file could not be read), which makes it easy to use in shell-script conditionals.
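
A minimal sketch of how the three exit statuses can drive a script. The sample file and its contents are made up for illustration; the `|| var=$?` idiom is just one way to capture a non-zero status.

```shell
# Build a small sample log in a temporary file (contents are illustrative).
log=$(mktemp)
printf '%s\n' 'INFO start' 'error: disk full' 'INFO done' > "$log"

# -q prints nothing; the if statement branches on the exit status alone.
if grep -q 'error' "$log"; then
    found=yes
else
    found=no
fi

# A pattern that never occurs yields status 1.
nomatch_status=0
grep -q 'panic' "$log" || nomatch_status=$?

# A file that cannot be opened yields status 2 (-s hides the error message).
missing_status=0
grep -qs 'error' "$log.missing" || missing_status=$?

echo "found=$found nomatch=$nomatch_status missing=$missing_status"
rm -f "$log"
```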

1.1 Options

-A<n>: Show n lines after each matching line.

-B<n>: Show n lines before each matching line.

-C<n>: Show n lines of context before and after each matching line.

-c: Print only the count of matching lines.

-e "pattern": Specify a pattern; repeat the option to match any of several patterns (logical OR).

-E: Enable extended regular expressions (ERE).

-f FILE: Read patterns from FILE, one per line.

-F: Treat the pattern as a fixed string rather than a regex (like fgrep).

-i: Ignore case.

-n: Prefix each matching line with its line number.

-o: Print only the matching part of each line.

-q: Quiet mode; no output, only the exit status.

-s: Suppress error messages about nonexistent or unreadable files.

-v: Invert the match (show non-matching lines).

-w: Match whole words only.

1.2 Practical Demonstration

# Find lines containing "error" in log.txt
grep "error" log.txt

# Show 2 lines of context after each match
grep -A2 "error" log.txt

# Count matches of pattern "^WARN"
grep -c "^WARN" log.txt
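
A few more of the options from section 1.1, sketched against a throwaway file. The file contents and variable names are assumptions chosen only to make the differences visible.

```shell
log=$(mktemp)
printf '%s\n' 'WARN low memory' 'error: disk full' 'Error: retrying' 'INFO errors=2' > "$log"

# -i ignores case; -n prefixes each matching line with its line number.
numbered=$(grep -in 'error' "$log")

# -w matches whole words only, so "errors" and "Error" are not counted.
whole=$(grep -cw 'error' "$log")

# -o prints only the matched text; -E enables extended regular expressions.
digits=$(grep -oE '[0-9]+' "$log")

# -v inverts the match; the two -e patterns are ORed together.
rest=$(grep -v -e '^WARN' -e '^INFO' "$log")

printf '%s\n' "$numbered" "whole-word count: $whole" "digits: $digits" "$rest"
rm -f "$log"
```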

2. Regular Expressions

Regular expressions (regex) are widely used in Linux tools for pattern matching. They come in two POSIX flavors: basic (BRE) and extended (ERE). Common constructs include:

.: Any single character except newline.

[abc]: Any one of the listed characters.

[^abc]: Any character except those listed.

[[:alnum:]] or [0-9a-zA-Z]: An alphanumeric character. POSIX classes such as [:alnum:] are only valid inside a bracket expression, hence the double brackets.

[[:space:]]: Any whitespace character.

[[:digit:]] or [0-9]: A digit.

^ and $: Anchor the match to the start or end of a line.

\b and \B: Word boundary and non-boundary (GNU extensions).
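
The constructs above can be tried out with grep. This is a small sketch on made-up sample words; the variable names are arbitrary.

```shell
sample=$(mktemp)
printf '%s\n' 'cat' 'cut' 'c t' 'scatter' 'cat9' > "$sample"

# . matches any single character, so every sample line contains "c.t".
dot=$(grep -c 'c.t' "$sample")

# [au] restricts the middle character; ^ and $ anchor the whole line.
anchored=$(grep -c '^c[au]t$' "$sample")

# A POSIX class sits inside a bracket expression: [[:digit:]].
with_digit=$(grep '[[:digit:]]$' "$sample")

# [^...] negates: lines containing at least one non-letter character.
nonalpha=$(grep -c '[^[:alpha:]]' "$sample")

echo "dot=$dot anchored=$anchored with_digit=$with_digit nonalpha=$nonalpha"
rm -f "$sample"
```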

2.1 Quantifiers

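*: Zero or more repetitions.

+: One or more repetitions.

?: Zero or one repetition.

{n}: Exactly n repetitions.

{n,}: At least n repetitions.

{n,m}: Between n and m repetitions.

Only * works unescaped in BRE. There, braces must be written as \{n,m\}, and GNU tools accept \+ and \? as extensions. With ERE (grep -E, sed -E) the quantifiers are written exactly as listed above.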
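
A short sketch of each quantifier, using grep -E on four illustrative strings, plus the escaped BRE form of an interval for comparison.

```shell
sample=$(mktemp)
printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc' > "$sample"

zero_or_more=$(grep -cE '^ab*c$' "$sample")      # ac, abc, abbc, abbbc
one_or_more=$(grep -cE '^ab+c$' "$sample")       # abc, abbc, abbbc
optional=$(grep -cE '^ab?c$' "$sample")          # ac, abc
exactly_two=$(grep -E '^ab{2}c$' "$sample")      # abbc
two_or_more=$(grep -cE '^ab{2,}c$' "$sample")    # abbc, abbbc
one_to_two=$(grep -cE '^ab{1,2}c$' "$sample")    # abc, abbc

# Without -E (BRE), the interval braces must be escaped.
bre_two=$(grep '^ab\{2\}c$' "$sample")

echo "$zero_or_more $one_or_more $optional $exactly_two $two_or_more $one_to_two $bre_two"
rm -f "$sample"
```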

3. sed

sed (stream editor) processes input line by line, storing the current line in a temporary buffer called the pattern space. Commands operate on this buffer, and the result is sent to standard output unless suppressed. The original file is unchanged unless redirection or the -i option is used.

3.1 Common Options

-n: Suppress automatic printing; only explicit p commands produce output.

-e "script": Add an editing command; repeat the option to supply several.

-f scriptfile: Read editing commands from a file.

-r or -E: Enable extended regular expressions (-r is the older GNU spelling).

-i[.bak]: Edit files in place, optionally keeping a backup with the given suffix.
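
The options can be sketched on a three-line temporary file. File contents and variable names are illustrative, and -i.bak is used in the attached-suffix form shown in the option list.

```shell
f=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$f"

# -n plus p: print only line 2 instead of every line.
only_second=$(sed -n '2p' "$f")

# Several -e commands are applied in order to each line.
multi=$(sed -e 's/alpha/ALPHA/' -e '/beta/d' "$f")

# -E enables ERE, so ( | ) need no backslashes; \1 is the captured group.
grouped=$(sed -E 's/^(al|be)/[\1]/' "$f")

# -i.bak edits the file in place and keeps the original as "$f.bak".
sed -i.bak 's/gamma/delta/' "$f"
edited=$(cat "$f")
backup=$(cat "$f.bak")

printf '%s\n' "$only_second" "$multi" "$grouped" "$edited" "$backup"
rm -f "$f" "$f.bak"
```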

3.2 Addressing

No address: Apply command to every line.

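Single address (N or /pattern/): Apply to line number N or to every line matching the pattern.

Range (N,M, N,+K, /pat1/,/pat2/): Apply to a block of lines. The N,+K form is a GNU extension.

Step (first~step, a GNU extension): Apply to every step-th line starting at line first; for example, sed -n '1~2p' prints every odd-numbered line.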
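
Each address form can be tried against the numbers 1 to 10. This sketch assumes GNU sed for the `+` and `~` forms; `one_line` is a hypothetical helper that joins output lines with spaces for easier reading.

```shell
nums=$(mktemp)
seq 1 10 > "$nums"

# Hypothetical helper: join the lines on stdin into one space-separated line.
one_line() { paste -s -d ' ' -; }

single=$(sed -n '3p' "$nums")                            # line 3 only
range=$(sed -n '2,4p' "$nums" | one_line)                # lines 2 through 4
relative=$(sed -n '2,+2p' "$nums" | one_line)            # line 2 plus the next 2 (GNU)
by_pattern=$(sed -n '/^4$/,/^6$/p' "$nums" | one_line)   # from a "4" line to a "6" line
odd=$(sed -n '1~2p' "$nums" | one_line)                  # every 2nd line from line 1 (GNU)
last=$(sed -n '$p' "$nums")                              # $ addresses the last line

printf '%s\n' "$single" "$range" "$relative" "$by_pattern" "$odd" "$last"
rm -f "$nums"
```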

3.3 Editing Commands

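d: Delete the pattern space and start the next cycle.

p: Print the pattern space.

a TEXT: Append TEXT after the current line.

i TEXT: Insert TEXT before the current line.

c TEXT: Replace the current line with TEXT.

w FILE: Write matching lines to FILE.

r FILE: Read FILE and insert its contents after the current line.

=: Print the current line number.

!: Negate the address (apply the command to non-matching lines).

s/RE/REPL/FLAGS: Replace the first match of RE with REPL. Flags include g (replace every match on the line), a number N (replace only the Nth match), p (print the line if a substitution was made), and, in GNU sed, i (case-insensitive matching). GNU sed also accepts case-conversion escapes such as \U and \L in REPL.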
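
A sketch of several of these commands on a three-line file. The one-line `a TEXT` and `c TEXT` forms used here assume GNU sed; the sample contents are illustrative.

```shell
f=$(mktemp)
printf '%s\n' 'one' 'two' 'three' > "$f"

deleted=$(sed '2d' "$f")                  # drop line 2
first_only=$(sed 's/e/E/' "$f")           # first match per line only
global=$(sed 's/e/E/g' "$f")              # g flag: every match per line
negated=$(sed -n '/two/!p' "$f")          # ! prints lines NOT matching
lineno=$(sed -n '/three/=' "$f")          # = prints the line number
appended=$(sed '/two/a after two' "$f")   # GNU one-line form of a
changed=$(sed '2c TWO' "$f")              # GNU one-line form of c

printf '%s\n' "$deleted" "$first_only" "$global" "$negated" "$lineno" "$appended" "$changed"
rm -f "$f"
```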

3.4 Example: Reverse File Order

# Reverse the lines of num.txt
sed '1!G;h;$!d' num.txt
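
The one-liner relies on sed's hold space, a second buffer that persists between lines. A sketch with a three-line stand-in for num.txt, with the three commands spelled out in comments:

```shell
num=$(mktemp)
printf '%s\n' 1 2 3 > "$num"

# 1!G  on every line except the first, append the hold space to the pattern space
# h    copy the pattern space (current line + everything before it, reversed) to the hold space
# $!d  on every line except the last, delete the pattern space without printing
# Only the final, fully accumulated pattern space is printed.
reversed=$(sed '1!G;h;$!d' "$num")

printf '%s\n' "$reversed"
rm -f "$num"
```

On systems with GNU coreutils, `tac` produces the same result more directly.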

4. awk

awk is a full-featured programming language for text and data processing on Unix/Linux. It reads input record by record (by default, line by line), splits each record into fields based on a delimiter, and supports user-defined functions, dynamic regular expressions, and complex processing.

4.1 Basic Syntax

awk [options] 'program' var=value file…
awk [options] -f programfile var=value file…
awk [options] 'BEGIN{…} pattern{…} END{…}' file…
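
A sketch of the BEGIN / pattern / END structure and of a var=value operand, using an invented two-column file. The names `total` and `min` are illustrative.

```shell
data=$(mktemp)
printf '%s\n' 'apple 3' 'pear 5' 'apple 4' > "$data"

# BEGIN runs before any input, the pattern-action runs per matching record,
# and END runs after the last record.
summary=$(awk 'BEGIN { print "fruit report" }
               $1 == "apple" { total += $2 }
               END { print "apples:", total }' "$data")

# A var=value operand is assigned just before the following file is read.
big=$(awk '$2 >= min { print $1 }' min=4 "$data")

printf '%s\n' "$summary" "$big"
rm -f "$data"
```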

4.2 Common Options

-F fs: Set the input field separator (fs can be a string or a regex).

-v var=value: Define a variable before execution begins.

-f scriptfile: Read the awk program from a file.
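
The three options, sketched on passwd-style records written to a temporary file (the records and the `min` variable are made up for illustration):

```shell
rec=$(mktemp)
prog=$(mktemp)
printf '%s\n' 'root:x:0:0' 'daemon:x:1:1' 'alice:x:1000:1000' > "$rec"

# -F: splits on colons; -v sets min before the program starts.
users=$(awk -F: -v min=1000 '$3 >= min { print $1 }' "$rec")

# The separator may also be a regex: here, any run of digits.
regex_fs=$(printf 'a1b22c\n' | awk -F'[0-9]+' '{ print $1, $2, $3 }')

# -f reads the program text from a file instead of the command line.
printf '%s\n' '{ print $1, NF }' > "$prog"
from_file=$(awk -F: -f "$prog" "$rec")

printf '%s\n' "$users" "$regex_fs" "$from_file"
rm -f "$rec" "$prog"
```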

4.3 Built‑in Variables

FS: Input field separator (default: a single space, which splits on runs of whitespace).

OFS: Output field separator (default: a single space).

RS: Input record separator (default: newline).

ORS: Output record separator (default: newline).

NF: Number of fields in the current record.

NR: Number of records read so far across all input files.

FNR: Record number within the current file.

FILENAME: Name of the current file.

ARGC and ARGV: Argument count and argument array.
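
A brief sketch showing how several of these variables behave, using two small temporary files with invented contents:

```shell
a=$(mktemp)
b=$(mktemp)
printf '%s\n' 'x y' 'x y z' > "$a"
printf '%s\n' 'p' > "$b"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(awk '{ print NR, FNR, NF }' "$a" "$b")

# Assigning to a field rebuilds $0 using OFS.
rebuilt=$(awk 'BEGIN { OFS = "-" } { $1 = $1; print }' "$a")

# RS changes what counts as a record: here, comma-separated items.
items=$(printf 'a,b,c' | awk 'BEGIN { RS = "," } { print NR ": " $0 }')

# ARGC counts the program name plus the two file operands.
args=$(awk 'FNR == 1 { files++ } END { print files, ARGC }' "$a" "$b")

printf '%s\n' "$counts" "$rebuilt" "$items" "$args"
rm -f "$a" "$b"
```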

4.4 printf

Unlike print, printf requires an explicit format string and does not add a newline automatically.

# Print a formatted table from /etc/passwd
awk -F: 'BEGIN{printf "username      uid\n-------------------\n"}
{printf "%-15s %5d\n", $1, $3}' /etc/passwd

4.5 Operators

Arithmetic: +, -, *, /, ^, % and unary -x, +x.

String concatenation: simply place strings/variables side by side.

Assignment: =, +=, -=, *=, /=, %=, ^=.

Increment and decrement: ++, --.

Comparison: ==, !=, >, >=, <, <=.

Pattern matching: ~ /regex/ (match) and !~ /regex/ (no match).

Logical: &&, ||, !.

Conditional (ternary): cond ? expr1 : expr2.
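
The operators can be exercised in a single BEGIN block, which needs no input file. The values and variable names below are arbitrary.

```shell
ops=$(awk 'BEGIN {
    x = 7; y = 2
    # Arithmetic, including modulo and exponentiation.
    print x + y, x - y, x * y, x / y, x % y, x ^ y

    greeting = "foo" "bar"          # concatenation by juxtaposition
    x += 3                          # compound assignment
    y++                             # increment
    size = (x > y) ? "x is larger" : "y is larger"
    # ~ tests a regex match; && and ! combine the results (1 = true, 0 = false).
    matched = ("error42" ~ /^error[0-9]+$/) && !("warn" ~ /^error/)
    print greeting, x, y, size, matched
}')

printf '%s\n' "$ops"
```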

4.6 Control Statements

if (cond) { … } else { … }
while (cond) { … }
do { … } while (cond)
for (init; cond; incr) { … }
for (var in array) { … }
break and continue: Exit a loop early or skip to its next iteration.

next: Skip the remaining actions for the current record and read the next record.
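
A compact sketch combining next, if/else, for with continue and break, and while. The input numbers and variable names are illustrative.

```shell
report=$(printf '%s\n' 5 -1 8 3 | awk '
$1 < 0 { next }                      # skip negative records entirely
{
    if ($1 % 2 == 0) { kind = "even" } else { kind = "odd" }
    print $1, kind
}
END {
    # for loop: continue skips 2, break stops the loop at 4
    for (i = 1; i <= 5; i++) {
        if (i == 2) continue
        if (i == 4) break
        picked = picked sep i
        sep = ","
    }
    print "picked:", picked

    # while loop counting down from 3
    n = 3
    while (n > 0) { countdown = countdown n; n-- }
    print "countdown:", countdown
}')

printf '%s\n' "$report"
```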

4.7 Arrays

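Awk supports associative arrays. Elements are created on first reference and start out uninitialized, which behaves as the empty string in string context and as 0 in numeric context. Use for (i in arr) to iterate over the keys; the iteration order is unspecified.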

# Count occurrences of each line in a file
awk '{count[$0]++} END {for (line in count) print line, count[line]}' file.txt
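
Because for (line in count) visits keys in no guaranteed order, piping the result through sort gives stable output. The sketch below also shows the in membership test and delete; the input lines are invented.

```shell
f=$(mktemp)
printf '%s\n' b a b c a b > "$f"

# Same counting program as above, with sort making the order deterministic.
counts=$(awk '{ count[$0]++ } END { for (line in count) print line, count[line] }' "$f" | sort)

# "key in array" tests membership without creating the element; delete removes one.
membership=$(awk '{ seen[$0] = 1 }
    END {
        delete seen["c"]
        has_a = ("a" in seen)
        has_c = ("c" in seen)
        has_z = ("z" in seen)
        print has_a, has_c, has_z
    }' "$f")

printf '%s\n' "$counts" "$membership"
rm -f "$f"
```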

4.8 User‑Defined Functions

function max(a,b) {
    return (a > b) ? a : b
}
BEGIN { print max(3,5) }
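
The program above can be run by passing it to awk on the command line (or by saving it to a file and using -f). A sketch, with a second call that applies the same function to each input record:

```shell
# Run the function once from a BEGIN block; no input is needed.
biggest=$(awk '
function max(a, b) { return (a > b) ? a : b }
BEGIN { print max(3, 5) }')

# Apply the same function to the two fields of every record.
per_line=$(printf '%s\n' '3 9' '7 2' | awk '
function max(a, b) { return (a > b) ? a : b }
{ print max($1, $2) }')

printf '%s\n' "$biggest" "$per_line"
```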

4.9 Calling Shell Commands

Use system("command") to execute an external command; its return value is the command's exit status. Variables can be concatenated with strings to build the command line.

# Print the hostname from within awk
awk 'BEGIN{ system("hostname") }'
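
A sketch of building the command from a variable and inspecting the status that system() returns. The directory names are assumptions for illustration; in real scripts, values that come from untrusted input should be quoted carefully before being handed to a shell.

```shell
statuses=$(awk 'BEGIN {
    dir = "/"
    # The command string is built by concatenation: "test -d /"
    ok = system("test -d " dir)
    # A path that is assumed not to exist, so test -d fails
    missing = system("test -d " dir "no-such-dir-for-demo")
    print ok, missing
}')

printf '%s\n' "$statuses"
```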

5. grep vs. awk vs. sed Comparison

grep is primarily a line‑oriented search tool that matches patterns using regular expressions.

awk offers the same kind of regex pattern matching as grep but adds field-level processing, variables, control flow, and functions, making it suitable for complex data extraction and transformation.

sed is a non‑interactive stream editor that applies editing commands to each line, useful for in‑place modifications and simple transformations.

In practice, start with grep for quick searches, move to awk when you need column‑wise processing or calculations, and use sed for line‑by‑line edits or batch substitutions.
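
To make the division of labor concrete, here is a sketch that applies each tool to the same invented access-log-style file (columns: user, status, bytes):

```shell
log=$(mktemp)
printf '%s\n' 'alice 200 512' 'bob 404 0' 'carol 200 2048' > "$log"

# grep: select (here, count) the lines that match a pattern.
hits=$(grep -c ' 200 ' "$log")

# awk: work with columns and do arithmetic on them.
bytes=$(awk '$2 == 200 { sum += $3 } END { print sum }' "$log")

# sed: rewrite text on the lines where a substitution applies.
renamed=$(sed -n 's/^bob /robert /p' "$log")

echo "hits=$hits bytes=$bytes renamed=$renamed"
rm -f "$log"
```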
