Master Batch Text Processing with awk and sed: A Practical Guide for Sysadmins

This article walks through the fundamentals and advanced techniques of using awk and sed on Linux for batch text manipulation, covering field handling, custom delimiters, BEGIN/END blocks, conditional filtering, arrays, built‑in functions, real‑world Nginx log analysis, script creation, performance tips, common pitfalls, debugging tricks, and how to combine both tools for powerful pipelines.

Ops Community

May 31, 2026

Background and Applicable Scenarios

System administration frequently involves processing text files such as logs, configuration files, and CSV data. Manual line‑by‑line edits are time‑consuming and error‑prone. The GNU versions of awk and sed, which are bundled with most Linux distributions, provide a powerful way to automate the majority of batch text operations.

Extract specific fields (e.g., IP address, error code, response time) from large log files.

Perform bulk replacements across many servers (e.g., change a configuration parameter).

Convert formats, such as turning CSV into an HTML table.

Statistically analyse logs, for example counting requests per IP.

The article assumes a GNU environment; BSD/macOS versions have slight syntax differences.

awk Practical Guide

Understanding the Core Mechanics

awk reads a file line by line and, by default, splits each line into fields on whitespace. The special variables are:

$0 – the entire line.

$1, $2, … – the first, second, etc., fields.

NF – the number of fields; $NF – the last field.

# Print the first and second columns of /etc/hosts
awk '{print $1, $2}' /etc/hosts

All later awk examples build on this rule.
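As a quick check of the splitting rule, the sketch below feeds one made-up line (the sample text is an assumption, not from a real file) through awk and prints the field count together with the first and last fields.

```shell
# Hypothetical sample line; any whitespace-separated text behaves the same way.
line='alice  42   /home/alice'

# NF is the field count, $NF the last field; runs of blanks count as one separator.
fields=$(printf '%s\n' "$line" | awk '{print NF ": first=" $1 ", last=" $NF}')
echo "$fields"   # 3: first=alice, last=/home/alice
```

Note that the repeated spaces do not produce empty fields, which is the main difference from tools that treat every single space as a delimiter.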

Custom Field Separators

Use -F to specify a delimiter. Examples:

# Count calls per API in a CSV log (fields: time,api,status,response_time)
awk -F',' '{print $2}' access.log | sort | uniq -c | sort -rn

# Use a pipe as delimiter
awk -F'|' '{print $1, $3}' config.txt

# Accept either comma or semicolon
awk -F'[,;]' '{print $2}' mixed.txt

# Change output separator with OFS (set it with -v or in BEGIN, not as a pattern)
awk -F':' -v OFS='->' '{print $1, $NF}' /etc/passwd | head -5

The output field separator OFS defaults to a space.
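One detail that often surprises people: OFS is only used when awk joins fields, either in print with commas or when $0 is rebuilt after a field assignment. A minimal sketch with made-up input:

```shell
# Printing $0 unchanged ignores OFS; assigning to a field ($1=$1) forces awk
# to rebuild $0 using OFS.
unchanged=$(printf 'a:b:c\n' | awk -F':' -v OFS='-' '{print $0}')
rebuilt=$(printf 'a:b:c\n' | awk -F':' -v OFS='-' '{$1=$1; print $0}')
echo "$unchanged"   # a:b:c
echo "$rebuilt"     # a-b-c
```

The `$1=$1` idiom is the usual way to convert a whole line from one delimiter to another.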

BEGIN and END Blocks

BEGIN runs before any input is read (useful for initializing variables or printing headers). END runs after all lines are processed (ideal for summarising results).

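# Print a header, then count failed passwords per IP
# (sorting inside awk keeps the header on top; $11 is the IP in the traditional syslog format for existing users, "invalid user" lines shift it to $13)
awk 'BEGIN {print "IP\tCount"} /Failed password/ {ip[$11]++} END {for (i in ip) print i "\t" ip[i] | "sort -k2 -rn"}' /var/log/auth.log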

# Print total number of lines
awk 'END {print NR}' access.log

# Initialise accumulator and sum response times
awk 'BEGIN {sum=0} {sum+=$5} END {print "Total response time:", sum}' access.log

NR is the number of records read so far across all input files; FNR is the record number within the current file, which matters when processing multiple files.
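The difference between the two counters is easiest to see with two small files. The sketch below creates throwaway files (their names and contents are assumptions for illustration) and prints both counters for every line.

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\nd\n' > "$tmpdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each new file.
counters=$(awk '{print NR, FNR, $0}' "$tmpdir/one.txt" "$tmpdir/two.txt")
echo "$counters"
rm -rf "$tmpdir"
```

The output is four lines, `1 1 a`, `2 2 b`, `3 1 c`, `4 2 d`: the second column resets when awk moves on to the second file.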

Conditional Filtering

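Write a condition (pattern) in front of the action block; only lines that satisfy it are processed. A pattern can be a comparison, a regular expression between slashes (/…/), or a combination joined with && and ||.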

# Process only status code 500
awk '$9 == 500 {print $1, $7, $9}' access.log

# Requests with response time > 5 seconds (assuming $NF is response time)
awk '$NF > 5 && $NF ~ /^[0-9.]+$/ {print $1, $7, $NF}' access.log

# Combine conditions: status 500 and response time > 10 seconds
awk '$9 == 500 && $NF > 10 && $NF ~ /^[0-9.]+$/ {print $0}' access.log

# Regex match: URL contains /api/login
awk '/\/api\/login/ {print $1, $7}' access.log

# Field regex match: second field contains "error"
awk -F',' '$2 ~ /error/ {print $1, $3}' error.log

# Negate: second field does NOT contain "error"
awk -F',' '$2 !~ /error/ {print $1, $3}' error.log

Arrays and Loops

awk associative arrays need no prior declaration, which makes them extremely useful for counting by key.

# Count requests per IP
awk '{ip[$1]++} END {for (k in ip) print ip[k], k}' access.log | sort -rn | head -20

# Average response time per API
awk -F',' '{api[$2]++; total[$2]+=$4} END {for (k in api) {avg=total[k]/api[k]; printf "%s: calls %d, avg %.2fms\n", k, api[k], avg}}' access.log

# Two‑dimensional array: daily calls per API
# (substr($4, 2, 11) keeps only the date part of a timestamp such as "[10/Oct/2025:13:55:36")
awk '{date=substr($4,2,11); api=$7; stats[date,api]++} END {for (key in stats) {split(key, parts, SUBSEP); print parts[1], parts[2], stats[key]}}' access.log

The default two‑dimensional key separator SUBSEP is \034.
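A small sketch of how the combined key works, using three made-up "day endpoint" records: the two subscripts are joined with SUBSEP into one string key, and split() recovers them in the END block.

```shell
pairs=$(printf '%s\n' 'mon /api' 'mon /api' 'tue /login' |
    awk '{hits[$1,$2]++}
         END {
             for (key in hits) {
                 split(key, part, SUBSEP)   # recover the two components of the key
                 print part[1], part[2], hits[key]
             }
         }' | sort)
echo "$pairs"
```

The traversal order of `for (key in hits)` is unspecified, hence the trailing `sort`; the result is `mon /api 2` followed by `tue /login 1`.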

Built‑in Functions

Common string and math functions:

# Upper‑/lower‑case conversion
awk '{print toupper($1), tolower($2)}' file.txt

# Substring extraction (first 10 characters)
awk '{print substr($1,1,10)}' file.txt

# Global substitution (similar to sed's s///g)
awk '{gsub(/error/, "ERROR"); print}' error.log

# Split a field into an array
awk '{n=split($0, parts, "/"); print "Fields:", n; for(i=1;i<=n;i++) print i, parts[i]}' <<< "a/b/c/d"

# Index of a substring
awk 'BEGIN {print index("hello world", "world")}'

# Regex match with position
awk 'BEGIN {if (match("error 1234 at line 50", /[0-9]+/)) print RSTART, RLENGTH}'

# Formatted output
awk '{printf "%s\t%.2f\t%06d\n", $1, $2, $3}' data.txt

# Math functions (call srand() before rand(); otherwise gawk returns the same sequence on every run)
awk 'BEGIN {print sqrt(144); print int(3.7); srand(); print rand(); print sin(3.14159/2); print log(exp(1))}'
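match() becomes more useful when combined with substr(): RSTART and RLENGTH describe where the match sits, so the matched text itself can be cut out. A minimal sketch on a made-up message:

```shell
# match() sets RSTART/RLENGTH; substr() then extracts the first run of digits.
code=$(printf 'error 1234 at line 50\n' |
    awk 'match($0, /[0-9]+/) {print substr($0, RSTART, RLENGTH)}')
echo "$code"   # 1234
```

Using match() as the pattern means lines without any digits are skipped silently.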

Real‑World Example: Extracting Key Metrics from Nginx Access Logs

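Because awk splits on whitespace, the bracketed timestamp and the quoted request line each span several fields. For the standard combined log format the mapping is:

$1      = $remote_addr       # client IP
$2      = -                  # literal dash
$3      = $remote_user       # authenticated user
$4, $5  = [$time_local]      # $4 is "[date:time", $5 is the "+zone]" part
$6      = request method     # with a leading double quote
$7      = request URI
$8      = protocol           # with a trailing double quote
$9      = $status            # HTTP status code
$10     = $body_bytes_sent
$11     = $http_referer      # quoted
$12 ... = $http_user_agent   # quoted, usually several fields

If the log format appends $request_time, it appears as the last field ($NF).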
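To confirm which awk field holds which value before running the commands on a real log, it helps to push a single line through awk. The log line below is invented for illustration and assumes a combined format with $request_time appended at the end.

```shell
# Hypothetical combined-format line with request_time as the last field.
sample='203.0.113.7 - - [10/Oct/2025:13:55:36 +0800] "GET /api/login?user=a HTTP/1.1" 200 512 "-" "curl/8.0" 0.123'

# $4 starts with "[", so substr($4, 2) drops the bracket; $7 is the URI,
# $9 the status, $10 the body size and $NF the request time.
parsed=$(printf '%s\n' "$sample" | awk '{print $1, substr($4, 2), $7, $9, $10, $NF}')
echo "$parsed"   # 203.0.113.7 10/Oct/2025:13:55:36 /api/login?user=a 200 512 0.123
```

If a site uses a custom log_format, the same one-line test shows immediately whether the field numbers need adjusting.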

# 1. Requests per minute (peak traffic); substr($4, 2, 17) keeps "dd/Mon/yyyy:HH:MM"
awk '{print substr($4, 2, 17)}' access.log | \
    sort | uniq -c | sort -rn | head -20

# 2. HTTP status distribution, most frequent first
awk '{status[$9]++} END {for (s in status) printf "Status %s: %d (%.1f%%)\n", s, status[s], status[s]*100/NR}' access.log | sort -k3 -rn

# 3. Slow requests (>3 s), slowest first
awk 'NF>=9 && $NF ~ /^[0-9.]+$/ && $NF>3 {time=substr($4,2); url=$7; dur=$NF; ip=$1; printf "[%s] %s %s response %.1fs\n", time, ip, url, dur}' access.log | sort -k5 -rn | head -30

# 4. Per‑IP request count and bandwidth
awk '{ip[$1]++; bytes[$1]+=$10} END {for (i in ip) printf "%s: %d requests, %.2f MB\n", i, ip[i], bytes[i]/1024/1024}' access.log | sort -k2 -rn | head -20

# 5. Calls per endpoint and average response time
#    (columns: endpoint, calls, average in ms; $request_time is logged in seconds)
awk '{url=$7; sub(/\?.*/,"",url); count[url]++; if ($NF ~ /^[0-9.]+$/) resp[url]+=$NF} END {for (u in count) {avg=(u in resp)?resp[u]*1000/count[u]:0; printf "%-40s %10d %12.2f\n", u, count[u], avg}}' access.log | sort -k3 -rn | head -20

Replace access.log with the actual log path when using these commands.

awk Script Files

For complex logic, store the program in a file to simplify debugging and reuse.

#!/usr/bin/awk -f
# File: analyze_nginx.awk
# Usage: awk -f analyze_nginx.awk access.log
BEGIN {FS = "[ ]+"; print "Starting analysis..."}
# Only count lines whose ninth field looks like an HTTP status code
$9 ~ /^[0-9]{3}$/ {
    total++; ip[$1]++; status[$9]++; bytes[$1] += $10
    url = $7; sub(/\?.*/, "", url); page[url]++
    if ($NF ~ /^[0-9.]+$/) resp_sum[url] += $NF
}
END {
    print "\n========== Access Statistics =========="
    print "Total requests:", total + 0
    print "\n=== Top 10 IPs ==="
    # Let sort do the ranking; close() makes its output appear before the next heading
    cmd = "sort -rn | head -10"
    for (i in ip) printf "%d %s\n", ip[i], i | cmd
    close(cmd)
    print "\n=== Status Distribution ==="
    for (s in status) printf "  %s: %d (%.1f%%)\n", s, status[s], status[s] * 100 / total
    print "\n=== Top 10 Slow Endpoints (Avg ms) ==="
    # $request_time is in seconds, so multiply by 1000 for milliseconds
    for (u in page) {
        if (u in resp_sum && resp_sum[u] > 0) {
            printf "%.2f %s (%d calls)\n", resp_sum[u] * 1000 / page[u], u, page[u] | cmd
        }
    }
    close(cmd)
}

Make the script executable (chmod +x analyze_nginx.awk) and run it with the log file as its argument.

Performance Optimisation Tips

When processing large files, consider:

Initialising variables in a BEGIN block to avoid per‑line overhead.

Using next to skip irrelevant lines early.

Avoiding repeated string concatenation inside loops; use arrays instead.

Splitting massive files into chunks and processing them in parallel (e.g., split -l 100000 + parallel).

# Initialise in BEGIN
awk 'BEGIN {FS=","} $3>100 {count++} END {print count}' large.csv

# Skip header line
awk 'NR==1 {next} $3>100 {count++} END {print count}' large.csv

# Use array instead of repeated concatenation
awk '{arr[NR]=$1}' bigfile

# Parallel processing example: one background awk per chunk, then merge the partial results
split -l 100000 big.log part_
for f in part_*; do awk '...' "$f" > "$f.out" & done; wait
cat part_*.out | awk '...'
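The effect of next can be sketched on a few made-up CSV lines: comment and blank lines are discarded by the first rule, so the comparison in the second rule never runs for them.

```shell
# Skip comments and blank lines early, then count rows whose second column exceeds 100.
total=$(printf '%s\n' '# header' '' 'a,150' 'b,50' 'c,200' |
    awk -F',' '/^#/ || NF == 0 {next} $2 > 100 {n++} END {print n + 0}')
echo "$total"   # 2
```

The `n + 0` in the END block prints 0 instead of an empty string when nothing matched.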

sed Practical Guide

How It Works

sed is a stream editor: it reads one line into the pattern space, applies the commands, then prints the result and moves on to the next line. Core concepts:

Pattern space : the current line being processed.

Hold space : a secondary buffer for multi‑line operations.

Address : selects which lines to operate on (line numbers, regex, or conditions).

# Basic syntax
sed [options] 'command' file

# Common options
-n   # suppress automatic printing (use with p)
-i   # edit file in‑place (dangerous, use -i.bak for backup)
-e   # execute multiple commands
-f   # read commands from a file
-r   # enable extended regular expressions (-E is the portable spelling)
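The hold space is easiest to understand through a classic one-liner that reverses the order of lines. This is a sketch on three made-up lines rather than something needed often in practice: G appends the hold space to the pattern space, h copies the pattern space back to the hold space, and -n with $p prints only once, at the last line.

```shell
# Line 1: nothing to append yet, just save it. Every later line gets the
# accumulated (already reversed) text appended, then the result is saved again.
reversed=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')
echo "$reversed"
```

The output is `3`, `2`, `1` on separate lines. Because the whole input ends up in the hold space, this pattern is only suitable for inputs that fit in memory.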

Substitution Command s

Syntax: s/pattern/replacement/flags.

# Replace first occurrence of "error" on each line
sed 's/error/ERROR/' error.log

# Global replacement
sed 's/error/ERROR/g' error.log

# Case‑insensitive replacement (I flag, a GNU extension; unrelated to the -i option)
sed 's/error/ERROR/gI' error.log

# Replace only on line 3
sed '3s/error/ERROR/' error.log

# Replace lines 3‑7
sed '3,7s/error/ERROR/g' error.log

# Replace only lines matching "error"
sed '/error/s/error/ERROR/g' error.log

# Delete matching lines
sed '/error/d' error.log

# Preview replacement without modifying file
sed -n 's/old/new/p' file.txt

# Backup before in‑place edit
sed -i.bak 's/old/new/g' file.txt

# Use alternative delimiter to avoid escaping slashes
sed 's#/etc/nginx#/opt/nginx#g' config.conf

# Extended regex: collapse multiple spaces
sed -r 's/ +/ /g' file.txt

# Capture group example: mask IP address
sed -E 's/(192\.168\.1\.)[0-9]+/\1XXX/' config.txt

# Multiple replacements in one command
sed -e 's/old1/new1/g' -e 's/old2/new2/g' file.txt

# Apply commands from a file
sed -f replace.txt error.log

Addresses and Ranges

Specify which lines to act on:

# Single line
sed '5s/old/new/' file.txt

# Range of lines
sed '1,10s/old/new/g' file.txt

# From line 5 to end
sed '5,$s/old/new/g' file.txt

# Regex‑matched lines
sed '/error/s/old/new/g' file.txt

# From "start" to "end"
sed '/start/,/end/s/old/new/g' file.txt

# Inverse address (exclamation mark)
sed '5!d' file.txt   # keep only line 5
sed '/error/!s/old/new/g' file.txt   # replace only non‑error lines

# Step address (GNU extension, first~step): lines 1, 6, 11, …
sed '1~5s/old/new/g' file.txt

Delete Command d

Deletion is risky; preview with -n and p first.

# Delete empty lines
sed '/^$/d' file.txt

# Delete lines containing only whitespace
sed '/^[[:space:]]*$/d' file.txt

# Delete comment lines starting with '#'
sed '/^#/d' config.conf

# Trim leading spaces/tabs
sed 's/^[ \t]*//' file.txt

# Trim trailing spaces/tabs
sed 's/[ \t]*$//' file.txt

# Delete a specific line
sed '1d' file.txt

# Delete the last line
sed '$d' file.txt

# Delete a range of lines
sed '1,10d' file.txt

# Delete a line and the next two lines after a match
sed '/error/,+2d' file.txt
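The preview advice can be sketched as a two-step routine: first print what the address selects, then run the deletion with the same address. The temporary file and its contents below are assumptions for illustration.

```shell
tmpfile=$(mktemp)
printf '%s\n' 'keep 1' '# comment' 'keep 2' > "$tmpfile"

# Step 1: preview. -n plus p prints exactly the lines the address selects.
would_delete=$(sed -n '/^#/p' "$tmpfile")
# Step 2: the deletion itself, still writing to stdout only (no -i yet).
remaining=$(sed '/^#/d' "$tmpfile")

echo "$would_delete"   # # comment
echo "$remaining"
rm -f "$tmpfile"
```

Only once both outputs look right does it make sense to repeat step 2 with `-i.bak`.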

Insert, Append, and Change

# Insert before line 10
sed '10i\new line content' file.txt

# Append after line 10
sed '10a\new line content' file.txt

# Insert before a pattern
sed '/pattern/i\new line before pattern' file.txt

# Append after a pattern
sed '/pattern/a\new line after pattern' file.txt

# Replace entire line
sed '10c\new entire line content' file.txt

# Insert at file start
sed '1i\Header line' file.txt

# Append at file end
sed '$a\Footer line' file.txt

# Insert a blank line after each line
sed 'G' file.txt

# Insert a blank line only after matching lines
sed '/pattern/G' file.txt

Multi‑Line Processing

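By default sed works line by line; the N command appends the next input line to the pattern space so that one command can see more than one line at a time.

# Delete from "error start" to "error end" inclusive (a range address is enough here)
sed '/error start/,/error end/d' file.txt

# Collapse consecutive empty lines into a single empty line
# (D removes only the first line of the pattern space and restarts the cycle)
sed '/^$/N;/^\n$/D' file.txt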

# Convert Unix line endings to Windows
sed 's/$/\r/' unix.txt > windows.txt

# Convert Windows line endings to Unix
sed -i 's/\r$//' file.txt
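A typical use of N is joining continuation lines, where a trailing backslash means "the statement continues on the next line". The sketch below uses made-up input and a small loop built from a label (:a) and a conditional branch (ta).

```shell
# If the line ends with a backslash, pull in the next line (N), remove the
# backslash-newline pair, and loop in case the joined line continues again.
joined=$(printf 'a \\\nb\nc\n' | sed -e ':a' -e '/\\$/N; s/\\\n//; ta')
echo "$joined"
```

The input lines `a \`, `b`, `c` come out as `a b` and `c`. The label is passed in its own -e so the script does not depend on how a particular sed parses text after a label.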

File I/O and Pipelines

# Insert file content after line 5
sed '5r /etc/hosts' file.txt

# Write lines 5‑10 to a new file
sed '5,10w /tmp/extracted.txt' file.txt

# Write a section delimited by markers to a file
sed '/START/,/END/w /tmp/section.txt' file.txt

# Pipe input through sed
cat file.txt | sed 's/old/new/g'

# Process multiple files and redirect output
sed 's/old/new/g' file1.txt file2.txt > output.txt

# In‑place edit of multiple files
sed -i 's/old/new/g' file1.txt file2.txt file3.txt

Real‑World Example: Bulk Configuration Modification

Common sysadmin task: modify Nginx configuration across many servers.

# Change all "listen 80;" to "listen 8080;"
sed -i 's/listen\s*80;/listen 8080;/g' /etc/nginx/conf.d/*.conf

# Replace old domain with new domain (dots escaped so they match literally)
sed -i 's/server_name\s*old-domain\.com;/server_name new-domain.com;/g' /etc/nginx/conf.d/*.conf

# Add a custom header at the beginning of each server block
sed -i '/server {/a\    add_header X-Server "nginx-1.24" always;' /etc/nginx/nginx.conf

# Delete all empty lines
sed -i '/^$/d' /etc/nginx/nginx.conf

# Append a timeout directive after line 25
sed -i '25a\proxy_read_timeout 300;' /etc/nginx/nginx.conf

Similar patterns apply to MySQL, application property files, Docker configuration, etc.
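Before running any of these with -i on a real server, a dry run that shows the exact difference is cheap insurance. The following is a sketch on a throwaway file (the path and contents are assumptions); it pipes the sed output into diff instead of writing it back.

```shell
tmpconf=$(mktemp)
printf '%s\n' 'server {' '    listen 80;' '    server_name example.com;' '}' > "$tmpconf"

# diff exits with status 1 when the inputs differ, so "|| true" keeps
# scripts that use "set -e" from aborting here.
changes=$(sed 's/listen[[:space:]]*80;/listen 8080;/' "$tmpconf" | diff "$tmpconf" - || true)
echo "$changes"
rm -f "$tmpconf"
```

An empty result means the expression matched nothing, which is worth knowing before touching a whole fleet of servers.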

sed Script Files

For complex replacements, store commands in a script file.

#!/bin/bash
# batch_modify.sh – bulk Nginx config changes
OLD_DOMAIN="old.example.com"
NEW_DOMAIN="new.example.com"
CONFIG_DIR="/etc/nginx/conf.d"
BACKUP_DIR="/tmp/nginx_backup_$(date +%Y%m%d%H%M%S)"

mkdir -p "$BACKUP_DIR"
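for conf in "$CONFIG_DIR"/*.conf; do
    [ -f "$conf" ] || continue
    cp "$conf" "$BACKUP_DIR/" && echo "Backed up: $conf"
done

# Domain replacement (escape the dots so they match literally)
OLD_DOMAIN_RE=$(printf '%s' "$OLD_DOMAIN" | sed 's/\./\\./g')
sed -i "s/server_name\s*$OLD_DOMAIN_RE;/server_name $NEW_DOMAIN;/g" "$CONFIG_DIR"/*.conf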

# Add security headers
sed -i '/server {/a\    add_header X-Content-Type-Options "nosniff" always;' "$CONFIG_DIR"/*.conf
sed -i '/server {/a\    add_header X-Frame-Options "SAMEORIGIN" always;' "$CONFIG_DIR"/*.conf
sed -i '/server {/a\    add_header X-XSS-Protection "1; mode=block" always;' "$CONFIG_DIR"/*.conf

# Verify syntax
nginx -t && echo "Configuration updated successfully" || { echo "Nginx test failed, restoring backup"; cp "$BACKUP_DIR"/*.conf "$CONFIG_DIR"; exit 1; }

Combining awk and sed

Using both tools in a pipeline enables sophisticated processing.

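# Pre‑process with sed, then analyse with awk
# (the sed step maps Chinese log keywords to English: 错误 = error, 警告 = warning, 成功 = success)
sed 's/错误/error/g; s/警告/warning/g; s/成功/success/g' app.log | \
    awk '/error/ {count++} END {print "Total errors:", count+0}'

# Extract key‑value pairs with awk, format as a table with sed
# (insert the column separators first, then add the outer borders)
awk -F'=' '{print $1, $2}' config.txt | \
    sed 's/  */ | /g; s/^/| /; s/$/ |/'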

# Use sed to normalise delimiters, then awk for statistics
sed 's/|/,/g' app.log | \
    awk -F',' '{status[$3]++; module[$4]++} END {print "=== By Status ==="; for (s in status) print s, status[s]; print "=== By Module ==="; for (m in module) print m, module[m]}'

# Extract URLs from HTML with awk
awk -F'href="' '{n=split($0, parts, "href=\""); for(i=2;i<=n;i++){split(parts[i], url, "\""); print url[1]}}' page.html | grep -v '^$'

# Flexible field handling: print last and second‑last fields
awk '{print $NF; n=NF; print $(n-1)}' file.txt

# Join two files (like SQL JOIN) using awk
awk -F',' 'NR==FNR {user[$1]=$2; next} $2 in user {print user[$2], $1, $3}' users.txt orders.txt
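The join idiom relies on NR==FNR being true only while the first file is read. A self-contained sketch with two invented files (column layouts are assumptions: id,name and order,user_id,amount):

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '1,alice' '2,bob' > "$tmpdir/users.txt"
printf '%s\n' 'o100,1,9.99' 'o101,3,5.00' 'o102,2,20.00' > "$tmpdir/orders.txt"

# First pass fills the lookup table; second pass prints only orders whose
# user_id exists in it (order o101 references an unknown user and is dropped).
joined=$(awk -F',' 'NR==FNR {user[$1]=$2; next} $2 in user {print user[$2], $1, $3}' \
    "$tmpdir/users.txt" "$tmpdir/orders.txt")
echo "$joined"
rm -rf "$tmpdir"
```

This prints `alice o100 9.99` and `bob o102 20.00`. One caveat: if the first file is empty, NR==FNR also holds for the second file, so the idiom quietly misbehaves.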

Common Pitfalls and Troubleshooting

awk and sed Traps

# Trap 1: -i modifies files directly – always backup first
sed -i.bak 's/old/new/g' file.txt

# Trap 2: Escape special characters in regex
sed 's/192\.168\.1\.1/192.168.1.100/g' file.txt

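# Trap 3: Variables inside single quotes are not expanded
NEW_VALUE="new"
sed 's/old/$NEW_VALUE/g' file.txt    # wrong: inserts the literal text $NEW_VALUE
sed "s/old/$NEW_VALUE/g" file.txt    # correct: double quotes let the shell expand it
# sed has no -v option; use double quotes for sed and -v for awk

# Trap 4: $0, $1, … inside an awk program are awk fields, not shell variables;
# pass shell values in with -v
VAR=100
awk -v var="$VAR" '$1 == var' file.txt

# Trap 5: awk streams its input, but arrays that keep every line or key grow with the file;
# split very large inputs if the aggregation does not fit in memory
split -l 100000 bigfile chunk_
for f in chunk_*; do awk '...' "$f"; done > result.txt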

# Trap 6: Hidden spaces/tabs – use cat -A to visualise
cat -A file.txt

# Trap 7: Locale issues with Unicode – set LC_ALL=C for pure byte processing
LC_ALL=C awk '{print $1}' chinese.txt

# Trap 8: Multi‑line records – adjust RS (an empty RS means blank‑line‑separated paragraphs)
awk 'BEGIN{RS=""; FS="\n"} {for(i=1;i<=NF;i++) print $i}' app.log

Debugging Techniques

# View intermediate sed output
sed 's/old/new/g' file.txt | head

# Simulate changes without writing back
sed 's/old/new/g' file.txt > new_file.txt

# Print debugging info in awk
awk '{print "Processing line:", NR; print "First field:", $1; print "Last field:", $NF}' file.txt

# Send debug messages to stderr
awk '{if (DEBUG) print "DEBUG:", $0 > "/dev/stderr"}' DEBUG=1 file.txt

# Re‑run the analysis whenever the log file is modified (requires inotify-tools);
# note that the whole file is re‑read on every event
inotifywait -m -e modify /var/log/app.log | \
    while read -r _; do awk '...' /var/log/app.log; done
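The point of sending debug messages to stderr is that they never contaminate the data flowing down the pipeline. A minimal sketch with made-up input, where stderr is discarded to show that the captured output contains data only:

```shell
# Debug lines go to /dev/stderr; only the real result reaches stdout.
out=$(printf 'a\nb\n' |
    awk '{print "DEBUG: " $0 > "/dev/stderr"; print toupper($0)}' 2>/dev/null)
echo "$out"
```

The captured result is `A` and `B`. Dropping the `2>/dev/null` shows the DEBUG lines on the terminal while downstream commands still receive clean input.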

Comprehensive Real‑World Script: Nginx Log Analyzer

A complete bash script that combines awk and sed to generate a detailed report.

#!/bin/bash
# nginx_log_analyzer.sh – generate Nginx access log report
set -e
LOG_FILE="${1:-/var/log/nginx/access.log}"
REPORT_FILE="${2:-nginx_report_$(date +%Y%m%d_%H%M%S).txt}"

# Validate input file
if [ ! -f "$LOG_FILE" ]; then echo "Error: $LOG_FILE does not exist"; exit 1; fi
if [ ! -s "$LOG_FILE" ]; then echo "Warning: $LOG_FILE is empty"; exit 0; fi

# Header
echo "========================================" | tee "$REPORT_FILE"
echo "Nginx Access Log Analysis Report" | tee -a "$REPORT_FILE"
echo "Log file: $LOG_FILE" | tee -a "$REPORT_FILE"
echo "Generated at: $(date '+%Y-%m-%d %H:%M:%S')" | tee -a "$REPORT_FILE"
echo "========================================" | tee -a "$REPORT_FILE"

# Total requests
TOTAL_REQ=$(awk 'END {print NR}' "$LOG_FILE")
echo "[Total Requests] $TOTAL_REQ" | tee -a "$REPORT_FILE"

# HTTP status distribution
echo "[HTTP Status Distribution]" | tee -a "$REPORT_FILE"
awk '{status[$9]++} END {for (s in status) {pct=status[s]*100/NR; printf "  %s: %d (%.2f%%)\n", s, status[s], pct}}' "$LOG_FILE" | sort | tee -a "$REPORT_FILE"

# Top 20 IPs
echo "[Top 20 IPs]" | tee -a "$REPORT_FILE"
awk '{ip[$1]++} END {for (i in ip) print ip[i], i | "sort -rn | head -20"}' "$LOG_FILE" | while read -r count ip; do printf "  %s: %d requests\n" "$ip" "$count"; done | tee -a "$REPORT_FILE"

# Top 20 slow requests (>1 s)
echo "[Top 20 Slow Requests (>1 s)]" | tee -a "$REPORT_FILE"
awk 'NF>=9 && $NF ~ /^[0-9.]+$/ && $NF>1 {printf "  %s %s response %.2fs\n", $4, $7, $NF}' "$LOG_FILE" | sort -k4 -rn | head -20 | tee -a "$REPORT_FILE"

# Top 20 endpoints by call count and average response time
echo "[Top 20 Endpoints (by calls)]" | tee -a "$REPORT_FILE"
awk '{url=$7; sub(/\?.*/,"",url); count[url]++; if ($NF ~ /^[0-9.]+$/) resp[url]+=$NF} END {for (u in count) {avg=(u in resp)?resp[u]*1000/count[u]:0; print count[u], avg, u | "sort -rn | head -20"}}' "$LOG_FILE" | while read -r cnt avg url; do printf "  %d: %s (avg %.2f ms)\n" "$cnt" "$url" "$avg"; done | tee -a "$REPORT_FILE"

# Bandwidth consumption (field $10 is body_bytes_sent)
echo "[Bandwidth Consumption]" | tee -a "$REPORT_FILE"
awk '{bytes+=$10} END {mb=bytes/1024/1024; gb=mb/1024; printf "  Total traffic: %.2f MB (%.4f GB)\n", mb, gb}' "$LOG_FILE" | tee -a "$REPORT_FILE"

# Hourly request distribution
echo "[Hourly Request Distribution]" | tee -a "$REPORT_FILE"
awk '{hour=substr($4,14,2); hourly[hour]++} END {for (h=0; h<=23; h++) {hh=sprintf("%02d",h); printf "  %s:00-%s:59: %d requests\n", hh, hh, hourly[hh]+0}}' "$LOG_FILE" | tee -a "$REPORT_FILE"

# Top 10 User‑Agent statistics
echo "[Top 10 User‑Agents]" | tee -a "$REPORT_FILE"
awk -F'"' '{ua=$6; if (ua!="") {sub(/^ */,"",ua); count[ua]++}} END {for (u in count) print count[u], u | "sort -rn | head -10"}' "$LOG_FILE" | while read -r cnt ua; do printf "  %d: %s\n" "$cnt" "$ua"; done | tee -a "$REPORT_FILE"

# Error request statistics
echo "[Error Request Statistics]" | tee -a "$REPORT_FILE"
awk '{if ($9>=500) e5xx++; if ($9==404) e404++; if ($9==403) e403++; if ($9==400) e400++} END {printf "  5xx errors: %d\n", e5xx+0; printf "  404 errors: %d\n", e404+0; printf "  403 forbidden: %d\n", e403+0; printf "  400 errors: %d\n", e400+0}' "$LOG_FILE" | tee -a "$REPORT_FILE"

# Footer
echo "========================================" | tee -a "$REPORT_FILE"
echo "Report generated: $REPORT_FILE" | tee -a "$REPORT_FILE"
echo "========================================" | tee -a "$REPORT_FILE"

Performance Comparison and Tool Selection

awk vs sed vs grep vs cut

Field extraction : awk, cut – awk offers richer functionality.

Simple substitution : sed – most concise syntax.

Complex statistics : awk – powerful arrays and functions.

Line filtering : grep, sed – grep is straightforward; sed can modify files directly.

Formatted output : awk – strong printf capabilities.

Conditional processing : awk – natural condition syntax.

Cross‑line handling : awk – native RS support.

Large file handling : awk, sed, grep – all three stream their input; memory only grows when awk arrays or the sed hold space accumulate data.
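One practical difference behind the "field extraction" row: cut treats every single delimiter character as a separator, while awk's default splitting collapses runs of whitespace. A short sketch on a made-up line with two spaces between the columns:

```shell
# With two spaces, cut sees an empty second field; awk sees "b".
with_cut=$(printf 'a  b\n' | cut -d' ' -f2)
with_awk=$(printf 'a  b\n' | awk '{print $2}')
echo "cut: [$with_cut]"   # cut: []
echo "awk: [$with_awk]"   # awk: [b]
```

For column-aligned output such as `ps` or `df`, this is usually the deciding reason to reach for awk even though cut is faster.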

Performance Benchmarks (100 MB log file)

# Field extraction (cut vs awk)
time cut -d' ' -f1 access.log | head -100000 > /dev/null   # ~0.5 s
time awk '{print $1}' access.log | head -100000 > /dev/null   # ~0.8 s

# Global substitution
time sed 's/error/ERROR/g' access.log > /dev/null            # ~1.2 s
time awk '{gsub(/error/,"ERROR"); print}' access.log > /dev/null  # ~2.5 s

# Count unique IPs
time awk '{print $1}' access.log | sort | uniq -c | sort -rn > /dev/null   # ~8 s

Conclusion

awk and sed are the Swiss‑army knives for sysadmins handling text. Use awk for structured data analysis, aggregations, and formatted reports; use sed for straightforward line‑oriented edits and bulk configuration changes. In practice, combine them in pipelines (grep | awk | sed) to leverage each tool's strengths.

