Unlock Fast Log Analysis: 10 Essential awk Commands Every Sysadmin Should Know
This tutorial shows how awk, the standard Unix/Linux text-processing tool, can dramatically speed up log analysis and routine data extraction. It walks through the core concepts, common patterns, and real-world examples (Nginx access logs, system monitoring logs, MySQL slow-query logs), and finishes with a ready-to-run analysis script.
1. awk Basics: Get Started in 3 Minutes
awk is a streaming text‑processing utility that reads input line by line, splits each line into fields using a delimiter, and optionally performs actions such as printing or aggregating.
Read by line – processes one line at a time without loading the whole file into memory.
Field separator – the -F option defines how a line is split into fields (e.g., space, colon).
Condition & action – a pattern (condition) selects lines, and an action (e.g., print) is executed on those lines.
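Put together, a one-liner reads lines, splits them on a separator, tests a condition, and runs an action. A minimal illustration with made-up colon-separated data fed from printf:

```shell
# Three colon-separated records; keep only those whose 2nd field
# is greater than 10, and print their 1st field.
printf 'a:5\nb:15\nc:20\n' | awk -F ':' '$2>10{print $1}'
# prints:
# b
# c
```

Everything in the sections below is a variation on this shape: a separator (-F), a condition ($2>10), and an action ({print $1}).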
1.1 Core syntax you must remember
awk [options] 'condition{action}' filename
Options: most common is -F to set the field separator.
Condition: an expression that must be true for the action to run; omitted means all lines.
Action: commands such as print, arithmetic, or variable assignment.
1.2 First practice with /etc/passwd
Print all usernames (field 1):
awk -F ':' '{print $1}' /etc/passwd
Print only the root entry:
awk -F ':' '$1=="root"{print $0}' /etc/passwd
Print users whose UID ($3) is less than 100:
awk -F ':' '$3<100{print $1, $3}' /etc/passwd
1.3 Built-in variables you should know
Some handy awk variables:
$0 – the entire current line.
$n – the nth field (e.g., $1, $2).
NR – current record (line) number.
NF – number of fields in the current line.
FS – input field separator (equivalent to -F).
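A quick way to see NR, NF, and the "last field" idiom $NF together (the two input lines are invented for illustration):

```shell
# For each line: print its line number, its field count,
# and its last field ($NF is "the field at index NF").
printf 'one two\nthree four five\n' | awk '{print NR, NF, $NF}'
# prints:
# 1 2 two
# 2 3 five
```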
2. Real‑World Operations Scenarios
Scenario 1: Nginx access‑log analysis (most common)
Typical Nginx log line (space‑separated):
192.168.1.100 - - [07/Jan/2026:12:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024
Field mapping (default space delimiter):
$1 – client IP
$2, $3 – placeholders
$4 – timestamp
$7 – request URL
$9 – HTTP status code
$10 – response size
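You can sanity-check that numbering against the sample line itself:

```shell
# Split the sample Nginx line on whitespace and label the key fields.
line='192.168.1.100 - - [07/Jan/2026:12:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024'
echo "$line" | awk '{print "ip="$1, "url="$7, "status="$9, "bytes="$10}'
# prints: ip=192.168.1.100 url=/index.html status=200 bytes=1024
```

Note that the timestamp occupies two fields ($4 and $5, because of the space before the timezone), which is why the URL lands at $7 rather than $6.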
Requirement 1: Top 10 IPs by request count
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10
Requirement 2: List all 404 requests (IP + URL)
awk '$9=="404"{print $1, $7}' access.log
Requirement 3: Top 5 URLs by hit count
awk '{print $7}' access.log | sort | uniq -c | sort -nr | head -5
Requirement 4: Show lines 100-200 (useful for slicing large logs)
awk 'NR>=100 && NR<=200{print $0}' access.log
Scenario 2: System monitoring log
Sample log (space + equals as delimiter):
2026-01-07 12:00:00 host1 cpu=85 mem=60 disk=40
2026-01-07 12:01:00 host1 cpu=95 mem=70 disk=45
2026-01-07 12:02:00 host2 cpu=70 mem=50 disk=30
Extract records where CPU > 90%. With the '[ =]+' split, the fields are date, time, host, the literal "cpu", then its value, so the CPU number is field 5:
awk -F '[ =]+' '$5>90{print $1, $2, $3, $5}' sys_monitor.log
Scenario 3: MySQL slow-query log
Each entry has a header line like "# Query_time: 10.5 Lock_time: ...", so the duration is the third field ($1 is "#", $2 is "Query_time:"). To list queries that took more than 5 seconds (getline pulls in the line that follows the header; in real slow logs that is often a SET timestamp statement rather than the query itself, so you may need to read a few more lines):
awk '/Query_time/ {time=$3; getline; if(time>5) print time, $0}' slow_query.log
3. Advanced awk tricks
3.1 Custom output separator
Set OFS to a vertical bar for tidy CSV‑style output:
awk -F ':' 'BEGIN{OFS="|"}{print $1, $3, $7}' /etc/passwd
3.2 Summation and average
Calculate average CPU usage from the monitoring log:
awk -F '[ =]+' '{sum+=$5; count++} END{print "Average CPU:", sum/count}' sys_monitor.log
(As in Scenario 2, the CPU value is field 5 under the '[ =]+' split; field 4 is the literal word "cpu".)
3.3 Combine with other commands
Count occurrences of each HTTP status code:
awk '{print $9}' access.log | sort | uniq -c
4. One-click Nginx log analysis script
Save the following as nginx_log_analysis.sh and make it executable (chmod +x nginx_log_analysis.sh). Run it with the path to your access log to generate a concise report.
#!/bin/bash
# Nginx log analysis script
# Usage: ./nginx_log_analysis.sh /var/log/nginx/access.log
LOG_FILE=$1
if [ -z "$LOG_FILE" ]; then
    echo "❌ Usage error! Provide the log file path"
    echo "✅ Correct usage: ./nginx_log_analysis.sh /var/log/nginx/access.log"
    exit 1
fi
if [ ! -f "$LOG_FILE" ]; then
    echo "❌ Error: file $LOG_FILE does not exist!"
    exit 1
fi
echo -e "\n========== Nginx Log Analysis Report =========="
echo "📌 Top 10 IPs"
awk '{print $1}' "$LOG_FILE" | sort | uniq -c | sort -nr | head -10
echo -e "\n📌 Top 20 404 requests"
awk '$9=="404"{print $1, $7}' "$LOG_FILE" | head -20
echo -e "\n📌 Top 10 URLs"
awk '{print $7}' "$LOG_FILE" | sort | uniq -c | sort -nr | head -10
echo -e "\n📌 Status code distribution"
awk '{print $9}' "$LOG_FILE" | sort | uniq -c | sort -nr
echo -e "\n========== Analysis Complete =========="
5. Common pitfalls and how to avoid them
Wrong field separator – a single-space FS (-F ' ') is actually awk's default and already splits on runs of spaces and tabs. The real trap is a regex like -F '[ ]', which splits on every single space and produces empty fields on multi-space logs. When in doubt, omit -F and let the default whitespace splitting do the work.
Missing quotes in string comparisons – $1=="root" is required; writing $1=root assigns the (empty, uninitialized) variable root to field 1 instead of comparing, so the pattern silently matches nothing.
Performance on huge logs – filter first, then aggregate. Example: awk '$9=="404"{print $1}' access.log | sort | uniq -c is faster than processing the whole file repeatedly.
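The second pitfall is easy to demonstrate on a two-line /etc/passwd-style sample (invented here):

```shell
# "==" compares field 1 against the string "root":
printf 'root:x:0\nuser:x:1000\n' | awk -F ':' '$1=="root"{print $0}'
# prints: root:x:0

# "=" ASSIGNS: root is an empty uninitialized variable, the assignment's
# value is "" (false), so the pattern matches nothing -- silently.
printf 'root:x:0\nuser:x:1000\n' | awk -F ':' '$1=root{print $0}'
# prints nothing
```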
6. Conclusion
For sysadmins, mastering the four-element pattern – separator + field + condition + action – lets awk handle about 80% of everyday text-processing tasks, from quick log triage to building fully automated reporting scripts.
Xiao Liu Lab
An operations lab passionate about server tinkering 🔬 Sharing automation scripts, high-availability architecture, alert optimization, and incident reviews. Using technology to reduce overtime and experience to avoid major pitfalls. Follow me for easier, more reliable operations!
