
Mastering Linux Text Processing: A Deep Dive into awk, grep, and sed

This comprehensive guide explains the core Linux text‑processing utilities—awk, grep, and sed—covering their purposes, syntax, common options, powerful regular‑expression features, practical examples, control structures, arrays, functions, and how they compare to each other for efficient command‑line data manipulation.

Open Source Linux

Dec 4, 2023

1. grep

grep is a text-search tool that prints the lines matching a regular expression. Its exit status (0 if at least one line matched, 1 if none did, 2 on error) makes it usable in script conditionals. egrep is equivalent to grep -E, which enables extended regular expressions; recent GNU grep releases treat the egrep name as obsolescent and recommend grep -E.
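Because the exit status separates "matched" from "no match", grep can drive a shell conditional directly. The following is a minimal sketch; has_line is a hypothetical helper name and the input text is made up for illustration.

```shell
# has_line is a hypothetical helper: it succeeds if stdin contains the pattern.
# -q suppresses output, so only the exit status is used.
has_line() {
    grep -q -e "$1"
}

if printf 'alpha\nbeta\n' | has_line 'beta'; then
    echo "found"        # exit status 0: at least one line matched
else
    echo "not found"    # exit status 1: no line matched
fi

# Inspect the status explicitly; 2 would indicate an error such as an unreadable file.
printf 'alpha\nbeta\n' | grep -q 'gamma'
echo "status: $?"       # prints "status: 1"
```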

1.1 What is grep and egrep?

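grep reads the named files (or standard input) and prints every line that matches the pattern. egrep is the same search with extended regex syntax, in which +, ?, |, ( ) and { } work without backslashes.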

1.2 Using grep

1.2.1 Command format

grep [options] pattern [file...]

1.2.2 Common options

-A<n>: print n lines after a match

-B<n>: print n lines before a match

-C<n>: print n lines of context before and after a match

-c: count matching lines

-e PATTERN: specify a pattern; repeat -e to match any of several patterns (OR logic)

-E: use extended regular expressions

-f FILE: read patterns from FILE

-F: treat the pattern as a fixed string rather than a regex (same as fgrep)

-i: ignore case

-n: show line numbers

-o: show only the matching part

-q: quiet mode (no output; only the exit status is set)

-s: suppress error messages about nonexistent or unreadable files

-v: invert match

-w: match whole words
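The options above are easiest to compare on one small input. This sketch uses a hypothetical sample_log function that stands in for a log file; the log lines are invented.

```shell
# Hypothetical sample data standing in for a log file.
sample_log() {
    printf '%s\n' \
        'INFO start' \
        'error: disk full' \
        'Error: retry' \
        'INFO errors=2' \
        'INFO done'
}

sample_log | grep -c 'error'            # count lines containing "error" (case-sensitive)
sample_log | grep -ic 'error'           # same count, ignoring case
sample_log | grep -n 'disk'             # prefix each match with its line number
sample_log | grep -w 'error'            # "error" as a whole word; "errors" does not count
sample_log | grep -v 'INFO'             # lines that do NOT contain "INFO"
sample_log | grep -o 'errors=[0-9]*'    # print only the matched text
sample_log | grep -e 'start' -e 'done'  # lines matching either pattern
sample_log | grep -A1 'disk'            # the match plus one line after it
```

Running the commands one at a time makes it clear which option changes the selection (-i, -w, -v, -e) and which only changes what is printed (-c, -n, -o, -A).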

1.3 grep practical demo

The examples filter /dev entries from disk-usage output, display the usage percentages, and combine grep with awk for numeric comparisons.
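The original demo commands are not reproduced here, so the following is only a sketch of that kind of pipeline. It assumes input shaped like df -h output; fake_df is a hypothetical function and its devices and figures are invented so that the result does not depend on the machine it runs on.

```shell
# Canned text in the shape of `df -h` output; devices and sizes are invented.
fake_df() {
    printf '%s\n' \
        'Filesystem      Size  Used Avail Use% Mounted on' \
        '/dev/sda1        50G   45G  5.0G  90% /' \
        'tmpfs           2.0G     0  2.0G   0% /dev/shm' \
        '/dev/sdb1       100G   20G   80G  20% /data'
}

# Keep only the lines that start with /dev.
fake_df | grep '^/dev'

# Extract just the usage percentages from those lines.
fake_df | grep '^/dev' | grep -o '[0-9]*%'

# Let awk do the numeric comparison: device and usage where Use% exceeds 80.
# $5+0 coerces a string such as "90%" to the number 90.
fake_df | grep '^/dev' | awk '$5+0 > 80 { print $1, $5 }'
```

On a real system the same pipeline would start with df -h in place of fake_df.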

2. Regular Expressions

Regular expressions are widely used in Linux and many programming languages for pattern matching.

2.1 Basic regex

. matches any single character

[abc] character class: matches any one of the listed characters

[^abc] negated class: matches any one character that is not listed

POSIX class names are only valid inside a bracket expression, so the digit class is written [[:digit:]], not [:digit:].

[[:alnum:]] or [0-9a-zA-Z] for alphanumerics

[[:alpha:]] or [a-zA-Z] for letters

[[:upper:]] or [A-Z] for uppercase

[[:lower:]] or [a-z] for lowercase

[[:blank:]] for space and tab

[[:space:]] for all whitespace

[[:cntrl:]] for control characters

[[:digit:]] or [0-9] for digits

[[:xdigit:]] for hexadecimal digits

[[:graph:]] for visible characters (printable, excluding space)

[[:print:]] for printable characters (including space)

[[:punct:]] for punctuation
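A short sketch of the classes in use with grep. The chars function and its three input lines are made up for illustration.

```shell
# Hypothetical sample input: letters only, digits only, and a mixed line with a space.
chars() { printf '%s\n' 'abc' '123' 'A1 b2'; }

chars | grep '[[:digit:]]'        # lines containing at least one digit
chars | grep '^[[:alpha:]]*$'     # lines made up only of letters
chars | grep -c '[[:upper:]]'     # number of lines with an uppercase letter
chars | grep '[^[:alnum:]]'       # lines with a non-alphanumeric character (here, the space)

# An unescaped dot matches any character; escape it to match a literal ".".
printf 'a.c\nabc\n' | grep 'a\.c'
```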

2.2 Quantifiers

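* zero or more

+ one or more

? zero or one

{n} exactly n

{m,n} between m and n

{n,} at least n

{,n} at most n (GNU extension; {0,n} is the portable form)

In basic regex (plain grep and sed), + and ? are GNU extensions written \+ and \?, and braces are written \{n\}; with -E all of these work unescaped.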
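The quantifiers can be compared on one list of strings. This sketch uses grep -E so that no backslashes are needed; the words function is a hypothetical helper.

```shell
# Hypothetical input: "a", then zero to three "b", then "c".
words() { printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc'; }

words | grep -E '^ab*c$'      # zero or more b: all four lines
words | grep -E '^ab+c$'      # one or more b: everything except "ac"
words | grep -E '^ab?c$'      # zero or one b: "ac" and "abc"
words | grep -E '^ab{2}c$'    # exactly two b: "abbc"
words | grep -E '^ab{2,}c$'   # at least two b: "abbc" and "abbbc"
words | grep -E '^ab{1,2}c$'  # one or two b: "abc" and "abbc"

# The same "one or more" test in basic regex needs a backslash (GNU grep).
words | grep '^ab\+c$'
```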

2.3 Anchors

^ start of line

$ end of line

\b word boundary (GNU extension)

\< and \> for start/end of word (GNU extensions, not part of POSIX)
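A brief sketch of the anchors, assuming GNU grep for the word-boundary forms. The lines function and its contents are invented.

```shell
# Hypothetical sample input.
lines() { printf '%s\n' 'cat' 'concat' 'cat food' 'the cats'; }

lines | grep '^cat'        # lines starting with "cat"
lines | grep 'cat$'        # lines ending with "cat"
lines | grep '^cat$'       # lines that are exactly "cat"
lines | grep '\bcat\b'     # "cat" as a whole word
lines | grep '\<cat'       # any word that starts with "cat", so "cats" also counts
```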

3. sed

sed is a stream editor that reads input line by line, applies editing commands to each line, and writes the result to standard output. The original file is left unchanged unless the -i option is used.

3.1 Using sed

3.1.1 Command format

sed [options] 'address command' file

3.1.2 Common options

-n: suppress automatic printing

-e SCRIPT: add an editing command; repeat -e to apply several

-f FILE: read commands from FILE

-r: enable extended regex (GNU spelling; -E is equivalent and more portable)

-i: edit file in place

-i.bak: edit in place with backup
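A minimal sketch of these options. The temporary directory and the file name demo.txt exist only for this demonstration.

```shell
# -n with p: print only the lines selected explicitly.
printf 'one\ntwo\nthree\n' | sed -n '2p'

# -e: chain several editing commands.
printf 'one\ntwo\nthree\n' | sed -e 's/one/1/' -e 's/two/2/'

# -E (same as -r in GNU sed): extended regex, so + and ( ) need no backslashes.
printf 'aaa-bbb\n' | sed -E 's/(a+)-(b+)/\2-\1/'

# -i.bak: edit a file in place and keep the original as FILE.bak.
tmpdir=$(mktemp -d)
printf 'hello world\n' > "$tmpdir/demo.txt"
sed -i.bak 's/world/sed/' "$tmpdir/demo.txt"
cat "$tmpdir/demo.txt"        # edited file: hello sed
cat "$tmpdir/demo.txt.bak"    # untouched backup: hello world
rm -r "$tmpdir"
```

Using -i with a backup suffix is the cautious choice while a substitution is still being developed, since the original stays recoverable.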

3.1.3 Addressing

No address: apply to all lines

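Single address: a line number (3), the last line ($), or a pattern (/regex/)

Range: start,end (line numbers or patterns), e.g. 2,5 or /begin/,/end/

first~step (GNU extension): every step-th line starting at line first, e.g. sed -n '1~2p' prints odd lines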
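The address forms can be tried on a numbered input. This is a sketch assuming GNU sed for the first~step form; nums is a hypothetical helper.

```shell
# Hypothetical input: the numbers 1 to 6, one per line.
nums() { printf '%s\n' 1 2 3 4 5 6; }

nums | sed -n '3p'         # single line number
nums | sed -n '$p'         # $ addresses the last line
nums | sed -n '2,4p'       # range of line numbers
nums | sed -n '/3/,/5/p'   # range delimited by two patterns
nums | sed -n '1~2p'       # first~step: lines 1, 3, 5
nums | sed -n '2~2p'       # lines 2, 4, 6
nums | sed '2,5d'          # no -n: delete the range and print the rest
```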

3.1.4 Editing commands

d: delete line

p: print line

a: append after line

i: insert before line

c: replace line

w FILE: write line to FILE

r FILE: read FILE after line

=: print line number

!: negate address match

s/regex/repl/flags: substitute; flags include g (every match on the line), a number N (only the Nth match), and p (print the line if a substitution was made)
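Each command above can be seen on a three-line input. This sketch assumes GNU sed, which accepts the one-line form of a, i, and c used here; the text function and its contents are invented.

```shell
# Hypothetical sample input.
text() { printf '%s\n' 'apple' 'banana' 'cherry'; }

text | sed '2d'                  # d: delete line 2
text | sed -n '/an/p'            # p: print only lines matching /an/
text | sed '2a after banana'     # a: append a line after line 2
text | sed '2i before banana'    # i: insert a line before line 2
text | sed '2c replaced'         # c: replace line 2
text | sed -n '/an/='            # =: print the line number of each match
text | sed '2!d'                 # !: delete every line except line 2
text | sed 's/a/A/'              # s: replace the first "a" on each line
text | sed 's/a/A/g'             # g flag: replace every "a"
text | sed 's/a/A/2'             # numeric flag: replace only the second "a"
```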

3.2 sed practical demos

Examples include printing matching lines, deleting specific lines, inserting text, reversing file order, showing odd/even lines, and adding blank lines between lines.
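The original demo commands are not shown, so the one-liners below are common sed idioms for three of the listed tasks rather than the article's own examples. The abc function is a hypothetical helper.

```shell
# Hypothetical four-line input.
abc() { printf '%s\n' 'a' 'b' 'c' 'd'; }

# Reverse the line order by building the reversed text in the hold space.
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the pattern space back to the hold space
#   $!d  discard everything except the final, fully reversed result
abc | sed '1!G;h;$!d'

# Add a blank line after every line: G appends the (empty) hold space.
abc | sed 'G'

# Odd lines without the GNU ~ syntax: print a line, then skip the next one.
abc | sed -n 'p;n'

# Even lines: skip a line, then print the next one.
abc | sed -n 'n;p'
```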

4. awk

awk is a programming language for processing text and data streams. It works with stdin, files, or command output, supports user‑defined functions, arrays, and complex pattern matching.

4.1 Basic usage

4.1.1 Syntax

awk [options] 'program' var=value file…

4.1.2 Common options

-F fs: set input field separator

-v var=value: assign variable before execution

-f scriptfile: read program from file
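A short sketch of the three options plus the var=value operand from the syntax line. The variable names greeting and prefix, and the temporary program file, are made up for illustration.

```shell
# -F sets the field separator; here ":" as in /etc/passwd-style records.
printf 'root:x:0\nalice:x:1000\n' | awk -F: '{ print $1 }'

# -v assigns a variable before the program starts, so BEGIN can already see it.
awk -v greeting='hello' 'BEGIN { print greeting }'

# A var=value operand is assigned when awk reaches it in the argument list,
# that is, after BEGIN has run but before the next input ("-" is stdin) is read.
printf 'alice\n' | awk '{ print prefix $0 }' prefix='user=' -

# -f reads the program from a file instead of the command line.
prog=$(mktemp)
printf '{ print NF }\n' > "$prog"
printf 'a b c\n' | awk -f "$prog"
rm "$prog"
```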

4.2 Built‑in variables

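FS: input field separator (default: a single space, which makes awk split on runs of spaces and tabs)

OFS: output field separator (default: a space)

RS: input record separator (default: newline)

ORS: output record separator (default: newline)

NF: number of fields in current record

NR: number of records read so far, counted across all input files

FNR: record number within current file

FILENAME: current file name

ARGC, ARGV: the number of command-line arguments and the array that holds them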
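The variables are easier to remember once seen in action. In this sketch all input data and the file names a.txt, b.txt, and some_file are invented.

```shell
# NR and NF: running record number and field count.
printf 'a b c\nd e\n' | awk '{ print NR, NF }'

# FS and OFS: split on "," and rejoin with "|".
# Assigning $1 = $1 forces awk to rebuild $0 using OFS.
printf 'x,y,z\n' | awk 'BEGIN { FS = ","; OFS = "|" } { $1 = $1; print }'

# RS: an empty RS selects paragraph mode, where blank lines separate records.
printf 'l1\nl2\n\nl3\n' | awk 'BEGIN { RS = "" } { print NR ": " NF " fields" }'

# ORS: terminate each output record with ";" instead of a newline.
printf 'a\nb\n' | awk 'BEGIN { ORS = ";" } { print }'

# ARGC and ARGV: a BEGIN-only program never opens the named file.
awk 'BEGIN { print ARGC, ARGV[1] }' some_file

# FILENAME, NR, FNR across two files: FNR restarts at 1, NR keeps counting.
tmpdir=$(mktemp -d)
printf '1\n2\n' > "$tmpdir/a.txt"
printf '3\n' > "$tmpdir/b.txt"
(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' a.txt b.txt)
rm -r "$tmpdir"
```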

4.3 Control statements

if‑else

if (condition) { ... } else { ... }

while loop

while (condition) { ... }

do‑while loop

do { ... } while (condition)

for loop

for (i=1; i<=NF; i++) { ... }

for (var in array)

for (key in arr) { print key, arr[key] }
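The control statements above can be exercised with a few small programs. This is an illustrative sketch; the input values and the variable names are made up.

```shell
# if-else inside a for loop over the fields of a record.
printf '3 -1 4\n' | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i >= 0)
            print $i, "non-negative"
        else
            print $i, "negative"
    }
}'

# while: sum the integers 1 to 5.
awk 'BEGIN {
    i = 1
    while (i <= 5) {
        sum += i
        i++
    }
    print sum
}'

# do-while: the body runs once even though the condition is already false.
awk 'BEGIN {
    n = 10
    do {
        print "ran with n=" n
        n++
    } while (n < 5)
}'

# for (key in array): visits every key; the visiting order is unspecified.
awk 'BEGIN {
    a["x"] = 1
    a["y"] = 2
    for (k in a)
        total += a[k]
    print total
}'
```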

4.4 Arrays

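Arrays are associative: any string can be a key, and numeric subscripts are converted to strings. Referencing an element that does not exist creates it with an uninitialized value, which acts as the empty string in string context and as 0 in numeric context. Use (k in a) to test for a key without creating it, and for (k in a) to iterate; the iteration order is unspecified.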
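A classic use of an associative array is counting occurrences. This is a minimal sketch with invented input; the array names count and seen are arbitrary, and sort is added only to make the output order predictable.

```shell
# count[$1]++ creates the element on first use, starting from a value that acts as 0.
printf 'a\nb\na\nc\na\n' | awk '
    { count[$1]++ }
    END {
        for (w in count)
            print w, count[w]
    }
' | sort

# Test membership with "in" so the key is not created as a side effect,
# and remove an element with delete.
awk 'BEGIN {
    seen["x"] = 1
    if ("y" in seen)
        print "y present"
    else
        print "y absent"
    delete seen["x"]
    n = 0
    for (k in seen)
        n++
    print "elements left:", n
}'
```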

4.5 Functions

function max(a,b) { return (a>b ? a : b) }
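A function definition sits at the top level of the program, next to the pattern-action rules. A minimal sketch that calls max on two columns of made-up input:

```shell
# Print the larger of the two fields on each line.
printf '3 9\n7 2\n' | awk '
    function max(a, b) { return (a > b ? a : b) }
    { print max($1, $2) }
'
```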

4.6 Interaction with the shell

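Use system("command") to run an external command; it returns the command's exit status. Pass shell values into awk with -v var=value, which is already set when BEGIN runs, or as var=value operands after the program text, which are assigned when awk reaches them in the argument list (after BEGIN has run).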
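A small sketch of both directions of interaction. The shell variable threshold and the awk variable t are hypothetical names chosen for this example.

```shell
# system() runs a command through the shell and returns its exit status.
awk 'BEGIN { rc = system("true"); print "exit status:", rc }'

# Pass a shell variable into awk with -v rather than splicing it into the program text.
threshold=5
printf '3\n8\n' | awk -v t="$threshold" '$1 > t { print $1 " exceeds " t }'
```

Keeping the program in single quotes and passing values with -v avoids quoting problems when the shell value contains spaces or characters that awk would interpret.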

5. Comparison of grep, awk, and sed

grep is primarily a line-oriented search tool using regular expressions. sed is a non-interactive stream editor for applying scripted edits. awk is a pattern-scanning and processing language that can do much of what grep and sed do and adds programming constructs such as loops, conditionals, and functions.

For simple pattern matching, grep is sufficient. For line-by-line editing, sed is appropriate. When field-based processing, calculations, or data transformation are needed, awk is usually the most concise choice.
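The overlap and the differences show up when the same input is run through all three tools. This is an illustrative sketch; the data function and its name/age records are invented.

```shell
# Hypothetical input: a name and an age per line.
data() { printf '%s\n' 'alice 30' 'bob 25' 'carol 35'; }

# The same filter in all three tools: lines containing "o".
data | grep 'o'
data | sed -n '/o/p'
data | awk '/o/'

# An edit on the stream: a natural fit for sed.
data | sed 's/bob/robert/'

# Field arithmetic: awk averages the second column.
data | awk '{ sum += $2 } END { print sum / NR }'
```

All three filters print the same two lines, which is why the choice usually comes down to what else the task needs: grep for selection only, sed when the lines must be changed, awk when fields or numbers are involved.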
