Mastering sed and awk: Powerful Text Processing Techniques for the Command Line
This comprehensive guide explores the strengths of sed and awk, compares their core features, provides practical basic and advanced examples, demonstrates how to combine them for complex pipelines, offers performance tips, and includes real‑world log‑analysis use cases for efficient text manipulation in shell scripts.
Introduction
In the world of shell scripting, sed and awk act like twin Swiss‑army knives: sed excels at stream editing while awk shines at field‑oriented processing. Together they can handle the majority of text‑processing tasks.
Basic Usage
Simple one‑liners illustrate their core capabilities:
# Replace "World" with "Linux" using sed
echo "Hello World" | sed 's/World/Linux/'
# Print the first and third fields using awk
echo "Alice 25 F" | awk '{print $1 " is " $3}'
Feature Comparison
sed: pattern matching and replacement, fast line‑by‑line processing, lightweight.
awk: field analysis, statistical aggregation, complex transformations.
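To make the contrast concrete, here is a minimal sketch that runs both tools over the same input. The sample data and the function names (sample_sales, rename_region, sum_by_region) are invented for illustration and do not come from the article.

```shell
#!/bin/sh
# Hypothetical sample data: "region amount" pairs, invented for this sketch.
sample_sales() {
    printf '%s\n' 'east 10' 'west 5' 'east 7' 'west 3'
}

# sed: rewrite text in the stream, one line at a time.
rename_region() {
    sed 's/^east /EAST /'
}

# awk: split each line into fields and aggregate by the first field.
# The result is piped through sort because awk's for-in order is unspecified.
sum_by_region() {
    awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' | sort
}

sample_sales | rename_region   # EAST 10 / west 5 / EAST 7 / west 3
sample_sales | sum_by_region   # east 17 / west 8
```

sed never looks at the numbers, and awk never edits the text: each tool is doing the job the comparison assigns to it.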
Advanced sed Techniques
Multiple replacements with -e: sed -e 's/foo/bar/' -e 's/hello/hi/' input.txt
Address range editing: sed '3,5s/old/new/' file.txt
Pattern‑specific replacement: sed '/pattern/s/old/new/' file.txt
Hold‑space and multi‑line (N, P, D) tricks for swapping adjacent lines or removing consecutive duplicates.
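The techniques in this list can be tried on throwaway input before pointing them at real files. The sketch below assumes nothing about input.txt or file.txt; the sample lines and the sample_lines helper are invented. The last command, line reversal, is an extra hold‑space illustration chosen because it is short enough to trace by hand.

```shell
#!/bin/sh
# Throwaway input invented for this sketch (five short lines).
sample_lines() {
    printf '%s\n' 'foo hello' 'old one' 'old two' 'keep old' 'old five'
}

# Multiple replacements in one pass with -e.
sample_lines | sed -e 's/foo/bar/' -e 's/hello/hi/'

# Address range: substitute only on lines 3 through 5.
sample_lines | sed '3,5s/old/new/'

# Pattern address: substitute only on lines that contain "keep".
sample_lines | sed '/keep/s/old/new/'

# Hold space: print the input in reverse line order.
#   1!G  append the hold space to every line except the first
#   h    copy the accumulated pattern space back into the hold space
#   $p   print the result once the last line has been read
sample_lines | sed -n '1!G;h;$p'
```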
# Swap adjacent lines (a trailing unpaired line is printed unchanged)
sed -n '$!{h;n;G;};p' text.txt
# Remove consecutive duplicate lines (like uniq)
sed '$!N; /^\(.*\)\n\1$/!P; D' duplicates.txt
Advanced awk Techniques
Conditional counting:
awk -v threshold=80 '$3 > threshold {count++} END {print count+0}' data.txt
Field reordering:
awk '{print $3, $1, $2}' names.txt
Custom field separators:
awk -F'[ :]' '{print $2, $4}' log.txt
Word‑frequency counting:
awk '{for(i=1;i<=NF;i++) count[$i]++} END {for(w in count) print w, count[w]}' text.txt
Group‑by aggregation:
awk '{sum[$1]+=$2} END {for(k in sum) print k, sum[k]}' sales.dat
Defining functions:
awk 'function to_upper(str) { return toupper(str) }
{ print to_upper($1) }' names.txt
Combining sed and awk
Typical pipelines first filter with sed, then extract or aggregate with awk:
# Extract IPs from nginx logs for a specific hour and count occurrences
sed -n '/15\/Aug\/2023:14:/p' access.log |
awk '{print $1}' | sort | uniq -c | sort -nr
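The hour‑filtering pipeline above can be exercised without a real server log. This is a sketch under stated assumptions: the four log lines are invented, follow the common log format, and the helper names sample_log and count_ips_in_hour are hypothetical.

```shell
#!/bin/sh
# Four invented log lines in common log format (hypothetical data).
sample_log() {
    cat <<'EOF'
10.0.0.1 - - [15/Aug/2023:14:01:07 +0000] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [15/Aug/2023:14:05:42 +0000] "GET /a HTTP/1.1" 200 128
10.0.0.1 - - [15/Aug/2023:14:09:13 +0000] "POST /b HTTP/1.1" 201 64
10.0.0.3 - - [15/Aug/2023:16:00:00 +0000] "GET / HTTP/1.1" 404 0
EOF
}

# Keep only requests from the 14:00 hour, take the client IP (field 1),
# then count occurrences and list the busiest client first.
count_ips_in_hour() {
    sed -n '/15\/Aug\/2023:14:/p' |
        awk '{ print $1 }' | sort | uniq -c | sort -nr
}

sample_log | count_ips_in_hour
```

With this input, 10.0.0.1 is counted twice and 10.0.0.2 once, while the 16:00 request from 10.0.0.3 is filtered out. The width of the count column printed by uniq -c varies between platforms.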
# Convert flat JSON to CSV (simplified; assumes one "key": value pair per line)
sed 's/[{}"]//g' data.json |
awk -F': ' 'NF >= 2 {gsub(/,/,"",$2); sub(/^[ \t]+/,"",$1); print $1","$2}'
Performance Tips
Stream processing to avoid loading whole files into memory.
Use GNU parallel for concurrent processing of large files.
Use extended regular expressions (sed -E, or -r on older GNU sed) to keep complex patterns readable without heavy backslash escaping.
Reduce pipeline length by combining operations in a single awk script.
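The last tip can be illustrated by folding a sed filter into the awk program that follows it. This is a minimal sketch with invented sample data; the function names are hypothetical, and the trailing sort is kept only to make the output order deterministic.

```shell
#!/bin/sh
# Invented sample: "LEVEL component" lines (hypothetical data).
sample_events() {
    printf '%s\n' 'ERROR disk' 'INFO start' 'ERROR net' 'WARN slow' 'ERROR disk'
}

# Two processes: sed filters the lines, awk counts by component.
count_errors_pipeline() {
    sed -n '/^ERROR /p' |
        awk '{ n[$2]++ } END { for (k in n) print k, n[k] }' | sort
}

# One process: the awk pattern does the filtering itself.
count_errors_single() {
    awk '/^ERROR / { n[$2]++ } END { for (k in n) print k, n[k] }' | sort
}

sample_events | count_errors_pipeline   # disk 2 / net 1
sample_events | count_errors_single     # disk 2 / net 1
```

Both functions produce the same counts; the second simply avoids starting an extra process and passing every line through an additional pipe.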
Real‑World Case: Apache Log Analysis
Top‑10 IP addresses:
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -n 10
Enhanced version with time filter:
sed -n '/15\/Aug\/2023:1[4-5]/p' access.log |
awk '{ip[$1]++} END {for(i in ip) print ip[i], i}' | sort -nr
Request‑type statistics:
awk '{gsub(/"/,"",$6); type[$6]++; size[$6]+=$10} END {for(t in type) print t, type[t], size[t]}' access.log
Error Handling & Debugging
Common pitfalls: unescaped special characters, mismatched field separators, memory exhaustion on huge files.
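Debugging sed with -n 'l', which prints each line unambiguously (non‑printing characters escaped, line end marked with $); use -n 'p;l' to see the plain and escaped forms together.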
Debugging awk by printing line numbers and field counts:
awk '{print NR, NF, $0}' file.txt
Further Resources
Books: sed & awk by Dale Dougherty and Arnold Robbins; Effective awk Programming by Arnold Robbins.
Online tools: AWK playgrounds, regex testers.
Advanced topics: GNU awk extensions (time handling, networking), sed’s label and branch control, deep integration with shell scripts.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)