
Unlock Fast Log Analysis: 10 Essential awk Commands Every Sysadmin Should Know

This tutorial shows how awk, the text-processing tool built into Linux, can dramatically speed up log analysis and routine data extraction. It explains the core concepts and common patterns, walks through real-world examples (Nginx access logs, system monitoring data, and MySQL slow-query logs), and finishes with a ready-to-run analysis script.

Xiao Liu Lab

1. awk Basics: Get Started in 3 Minutes

awk is a streaming text‑processing utility that reads input line by line, splits each line into fields using a delimiter, and optionally performs actions such as printing or aggregating.

Read by line – processes one line at a time without loading the whole file into memory.

Field separator – the -F option defines how a line is split into fields (e.g., space, colon).

Condition & action – a pattern (condition) selects lines, and an action (e.g., print) is executed on those lines.

1.1 Core syntax you must remember

awk [options] 'condition{action}' filename

Options: the most common is -F, which sets the field separator.

Condition: an expression that must be true for the action to run; if omitted, the action runs on every line.

Action: commands such as print, arithmetic, or variable assignment.

1.2 First practice with /etc/passwd

Print all usernames (field 1):

awk -F ':' '{print $1}' /etc/passwd

Print only the root entry:

awk -F ':' '$1=="root"{print $0}' /etc/passwd

Print users whose UID ($3) is less than 100:

awk -F ':' '$3<100{print $1, $3}' /etc/passwd

1.3 Built‑in variables you should know

Some handy awk variables:

$0 – the entire current line.
$n – the nth field (e.g., $1, $2).
NR – current record (line) number.
NF – number of fields in the current line.
FS – input field separator (equivalent to -F).
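A quick way to see these variables in action on sample colon-separated input:

```shell
# Show NR (line number), NF (field count) and $NF (last field)
# on two sample passwd-style lines piped in via printf
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n' |
  awk -F ':' '{print "line", NR, "has", NF, "fields, last =", $NF}'
# -> line 1 has 7 fields, last = /bin/bash
#    line 2 has 7 fields, last = /sbin/nologin
```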

2. Real‑World Operations Scenarios

Scenario 1: Nginx access‑log analysis (most common)

Typical Nginx log line (space‑separated):

192.168.1.100 - - [07/Jan/2026:12:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024

Field mapping (default space delimiter):

$1 – client IP
$2 – placeholder
$3 – placeholder
$4 – timestamp
$7 – request URL
$9 – HTTP status code
$10 – response size

Requirement 1: Top 10 IPs by request count

awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10

Requirement 2: List all 404 requests (IP + URL)

awk '$9=="404"{print $1, $7}' access.log

Requirement 3: Top 5 URLs by hit count

awk '{print $7}' access.log | sort | uniq -c | sort -nr | head -5

Requirement 4: Show lines 100‑200 (useful for slicing large logs)

awk 'NR>=100 && NR<=200{print $0}' access.log
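If you need a time window rather than a line range, you can filter on the timestamp field instead. A sketch assuming the bracketed $4 format shown above; because the date prefix is fixed-width, plain string comparison follows time order:

```shell
# Select entries from 12:00:00 up to (but not including) 12:05:00.
# String comparison works because $4 has the fixed
# "[07/Jan/2026:HH:MM:SS" shape, so lexical order matches time order.
printf '%s\n' \
  '1.1.1.1 - - [07/Jan/2026:11:59:59 +0800] "GET /a HTTP/1.1" 200 10' \
  '2.2.2.2 - - [07/Jan/2026:12:03:00 +0800] "GET /b HTTP/1.1" 200 10' |
  awk '$4 >= "[07/Jan/2026:12:00:00" && $4 < "[07/Jan/2026:12:05:00"'
# -> only the 12:03:00 line is printed
```

Replace the printf sample with your real access.log on the command line.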

Scenario 2: System monitoring log

Sample log (space + equals as delimiter):

2026-01-07 12:00:00 host1 cpu=85 mem=60 disk=40
2026-01-07 12:01:00 host1 cpu=95 mem=70 disk=45
2026-01-07 12:02:00 host2 cpu=70 mem=50 disk=30

Extract records where CPU > 90%:

awk -F '[ =]+' '$5>90{print $1, $2, $3, $5}' sys_monitor.log

(With the '[ =]+' separator, $4 is the literal label "cpu" and $5 is its value.)
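When a regex separator like '[ =]+' is in play, it is easy to miscount fields; printing each field with its index confirms the numbering:

```shell
# Print each field with its index to verify numbering under
# the '[ =]+' separator (the cpu value lands in $5, not $4)
printf '2026-01-07 12:00:00 host1 cpu=85 mem=60 disk=40\n' |
  awk -F '[ =]+' '{for(i=1;i<=NF;i++) print i, $i}'
# -> ... 4 cpu
#        5 85 ...
```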

Scenario 3: MySQL slow‑query log

Each entry contains a Query_time field. To list queries that took more than 5 seconds:

awk '/^# Query_time:/{time=$3; getline; if(time>5) print time, $0}' slow_query.log

(In the header line # Query_time: 2.5 ... the value is the third field; getline then pulls in the following line. In real slow logs that next line may be a SET timestamp line rather than the query itself, so adjust to your log's layout.)

3. Advanced awk tricks

3.1 Custom output separator

Set OFS to a vertical bar for tidy CSV‑style output:

awk -F ':' 'BEGIN{OFS="|"}{print $1, $3, $7}' /etc/passwd
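BEGIN blocks run before any input is read, which also makes them the place to emit a header row. A small sketch on one sample passwd-style line:

```shell
# BEGIN runs before any input, so it can print a CSV header;
# OFS="," then joins the selected fields on output
printf 'root:x:0:0:root:/root:/bin/bash\n' |
  awk -F ':' 'BEGIN{OFS=","; print "user,uid,shell"} {print $1, $3, $7}'
# -> user,uid,shell
#    root,0,/bin/bash
```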

3.2 Summation and average

Calculate average CPU usage from the monitoring log:

awk -F '[ =]+' '{sum+=$5; count++} END{print "Average CPU:", sum/count}' sys_monitor.log

(Again, $5 is the cpu value under this separator; $4 is the label "cpu".)
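Going one step further, awk's associative arrays give per-host averages in the same pass. A sketch using the sample monitoring lines above ($3 is the host, $5 the cpu value under the '[ =]+' split):

```shell
# Per-host CPU average with awk associative arrays
printf '%s\n' \
  '2026-01-07 12:00:00 host1 cpu=85 mem=60 disk=40' \
  '2026-01-07 12:01:00 host1 cpu=95 mem=70 disk=45' \
  '2026-01-07 12:02:00 host2 cpu=70 mem=50 disk=30' |
  awk -F '[ =]+' '{sum[$3]+=$5; cnt[$3]++}
    END{for(h in sum) printf "%s avg_cpu=%.1f\n", h, sum[h]/cnt[h]}' | sort
# -> host1 avg_cpu=90.0
#    host2 avg_cpu=70.0
```

The trailing sort is only there because for (h in sum) iterates in unspecified order.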

3.3 Combine with other commands

Count occurrences of each HTTP status code:

awk '{print $9}' access.log | sort | uniq -c
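The same count can also be done in a single awk pass, with percentages added, so the full log stream never goes through sort | uniq. A sketch over a few sample lines:

```shell
# Status-code counts with percentages in one awk pass;
# $9 is the status code in the default access-log split
printf '%s\n' \
  '1.1.1.1 - - [07/Jan/2026:12:00:00 +0800] "GET / HTTP/1.1" 200 1' \
  '2.2.2.2 - - [07/Jan/2026:12:00:01 +0800] "GET / HTTP/1.1" 200 1' \
  '3.3.3.3 - - [07/Jan/2026:12:00:02 +0800] "GET /x HTTP/1.1" 404 1' \
  '4.4.4.4 - - [07/Jan/2026:12:00:03 +0800] "GET / HTTP/1.1" 200 1' |
  awk '{c[$9]++; n++} END{for(s in c) printf "%s %d %.1f%%\n", s, c[s], 100*c[s]/n}' |
  sort -k2 -nr
# -> 200 3 75.0%
#    404 1 25.0%
```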

4. One‑click Nginx log analysis script

Save the following as nginx_log_analysis.sh and make it executable (chmod +x nginx_log_analysis.sh). Run it with the path to your access log to generate a concise report.

#!/bin/bash
# Nginx log analysis script
# Usage: ./nginx_log_analysis.sh /var/log/nginx/access.log

LOG_FILE=$1
if [ -z "$LOG_FILE" ]; then
  echo "❌ Usage error! Provide the log file path"
  echo "✅ Correct usage: ./nginx_log_analysis.sh /var/log/nginx/access.log"
  exit 1
fi
if [ ! -f "$LOG_FILE" ]; then
  echo "❌ Error: file $LOG_FILE does not exist!"
  exit 1
fi

echo -e "\n========== Nginx Log Analysis Report =========="

echo "📌 Top 10 IPs"
awk '{print $1}' "$LOG_FILE" | sort | uniq -c | sort -nr | head -10

echo -e "\n📌 Top 20 404 requests"
awk '$9=="404"{print $1, $7}' "$LOG_FILE" | head -20

echo -e "\n📌 Top 10 URLs"
awk '{print $7}' "$LOG_FILE" | sort | uniq -c | sort -nr | head -10

echo -e "\n📌 Status code distribution"
awk '{print $9}' "$LOG_FILE" | sort | uniq -c | sort -nr

echo -e "\n========== Analysis Complete =========="

5. Common pitfalls and how to avoid them

Wrong field separator – awk's default splitting (and an explicit -F ' ') already collapses runs of spaces and tabs, so multi-space logs usually work with no option at all. Problems arise when you force a literal single-character separator such as -F '[ ]', which treats every space as a boundary and produces empty fields; use -F '[ ]+' or simply omit -F instead.

Missing quotes in string comparisons – $1=="root" is required; writing $1=root treats root as a variable.

Performance on huge logs – filter first, then aggregate. Example: awk '$9=="404"{print $1}' access.log | sort | uniq -c is faster than processing the whole file repeatedly.
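The same idea taken further in pure awk: filter and count in one pass, so only the small aggregated result reaches sort. A sketch over sample lines:

```shell
# Count 404s per IP inside awk; only the final per-IP counts
# (a handful of lines) are piped to sort, not the whole log
printf '%s\n' \
  '1.1.1.1 - - [07/Jan/2026:12:00:00 +0800] "GET /a HTTP/1.1" 404 1' \
  '1.1.1.1 - - [07/Jan/2026:12:00:01 +0800] "GET /b HTTP/1.1" 404 1' \
  '2.2.2.2 - - [07/Jan/2026:12:00:02 +0800] "GET /c HTTP/1.1" 404 1' \
  '3.3.3.3 - - [07/Jan/2026:12:00:03 +0800] "GET /d HTTP/1.1" 200 1' |
  awk '$9=="404"{c[$1]++} END{for(ip in c) print c[ip], ip}' | sort -nr
# -> 2 1.1.1.1
#    1 2.2.2.2
```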

6. Conclusion

For sysadmins, mastering the four-element pattern (separator + field + condition + action) lets awk handle about 80% of everyday text-processing tasks, from quick log triage to fully automated reporting scripts.

Tags: Linux, log analysis, text processing, shell scripting, awk
Written by Xiao Liu Lab
An operations lab passionate about server tinkering 🔬 Sharing automation scripts, high-availability architecture, alert optimization, and incident reviews. Using technology to reduce overtime and experience to avoid major pitfalls. Follow me for easier, more reliable operations!
