
Essential Apache & Nginx Log Analysis Commands for Linux Ops

This guide collects practical Linux shell commands for analyzing Apache and Nginx access logs, covering IP frequency, page request counts, status-code distribution, traffic volume, crawler detection, subnet aggregation, and time-based request rates, so administrators can monitor web service health efficiently.

Raymond Ops

Apache Log Statistics

<code># List top 20 IPs by request count
awk '{print $1}' access_log | sort | uniq -c | sort -rn | head -20

# Count unique IPs
awk '{print $1}' access_log | sort | uniq | wc -l

# Count total accesses of a specific page
grep -c 'index\.php' access_log

# Show how many pages each IP accessed
awk '{++S[$1]} END {for (a in S) print a,S[a]}' access_log

# Sort IPs by number of pages accessed (ascending)
awk '{++S[$1]} END {for (a in S) print S[a],a}' access_log | sort -n

# List pages accessed by a specific IP
grep '^192\.168\.1\.2' access_log | awk '{print $1,$7}'

# Count unique IPs with a browser-like user agent (roughly excludes bots)
awk '{print $12,$1}' access_log | grep '^Mozilla' | awk '{print $2}' | sort | uniq | wc -l

# Count unique IPs within a specific hour
awk '{print $4,$1}' access_log | grep '21/Nov/2019:03' | awk '{print $2}' | sort | uniq | wc -l</code>
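To actually restrict the stats to the current day, the date can be filtered first. A minimal sketch, assuming the standard bracketed `%d/%b/%Y` timestamp of the common/combined log formats; the synthetic log written here is only for illustration, and `access_log` elsewhere in this guide is a placeholder path:

```shell
# Build a tiny sample log: two entries for today, one from an old date.
log=$(mktemp)
today=$(date +%d/%b/%Y)   # e.g. 21/Nov/2019, matching the bracketed timestamp field
printf '1.2.3.4 - - [%s:10:00:00 +0000] "GET / HTTP/1.1" 200 100\n' "$today" >> "$log"
printf '1.2.3.4 - - [%s:10:00:01 +0000] "GET /a HTTP/1.1" 200 100\n' "$today" >> "$log"
printf '5.6.7.8 - - [01/Jan/2000:10:00:00 +0000] "GET / HTTP/1.1" 200 100\n' >> "$log"

# Top 20 IPs by request count, today's entries only
grep "$today" "$log" | awk '{print $1}' | sort | uniq -c | sort -rn | head -20
```

The same `grep "$(date +%d/%b/%Y)"` prefix works in front of any of the pipelines above.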

Nginx Log Statistics

<code># List all unique IPs
awk '{print $1}' access_log | sort | uniq

# Top 100 most frequent IPs
awk '{print $1}' access_log | sort | uniq -c | sort -rn | head -n 100

# IPs with more than 100 requests
awk '{print $1}' access_log | sort | uniq -c | awk '$1 > 100' | sort -rn

# Detailed request list for a specific IP, sorted by frequency
grep '192.168.1.2' access_log | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 100

# Top 100 most requested pages
awk '{print $7}' access_log | sort | uniq -c | sort -rn | head -n 100

# Top 100 pages excluding PHP and Python scripts
grep -E -v '\.php|\.py' access_log | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 100

# Pages with more than 100 hits
cat access_log | cut -d ' ' -f 7 | sort | uniq -c | awk '{if ($1 > 100) print $0}'

# Most requested pages in the last 1000 lines
tail -1000 access_log | awk '{print $7}' | sort | uniq -c | sort -nr

# Top 100 one-second intervals by request count
awk '{print $4}' access_log | cut -c14-21 | sort | uniq -c | sort -nr | head -n 100

# Top 100 one-minute intervals by request count
awk '{print $4}' access_log | cut -c14-18 | sort | uniq -c | sort -nr | head -n 100

# Top 100 one-hour intervals by request count
awk '{print $4}' access_log | cut -c14-15 | sort | uniq -c | sort -nr | head -n 100</code>
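The three `cut -c` pipelines above can be collapsed into a single awk pass, which avoids re-reading the log once per granularity. A sketch, assuming the timestamp is field 4 in `[DD/Mon/YYYY:HH:MM:SS` form; the inline sample log is only for demonstration:

```shell
# Sample log with three requests spread over two minutes.
log=$(mktemp)
printf '%s\n' \
  '1.2.3.4 - - [21/Nov/2019:03:40:26 +0000] "GET / HTTP/1.1" 200 100' \
  '1.2.3.4 - - [21/Nov/2019:03:40:59 +0000] "GET /a HTTP/1.1" 200 100' \
  '1.2.3.4 - - [21/Nov/2019:03:41:00 +0000] "GET /b HTTP/1.1" 200 100' >> "$log"

# Requests per minute, busiest first: substr($4, 14, 5) is the HH:MM
# portion of "[21/Nov/2019:03:40:26" (change to 14,8 for seconds, 14,2 for hours).
awk '{c[substr($4, 14, 5)]++} END {for (m in c) print c[m], m}' "$log" | sort -rn | head -n 100
```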

Web Service Status Statistics

<code># List crawlers (Googlebot, Baiduspider)
grep -E 'Googlebot|Baiduspider' access_log | awk '{print $1}' | sort | uniq

# Requests from uncommon user agents (major browsers excluded; the UA is the
# sixth quote-delimited field in the combined log format)
awk -F'"' '{print $6}' access_log | grep -v -E 'MSIE|Firefox|Chrome|Opera|Safari|Gecko|Maxthon' | sort | uniq -c | sort -rn | head -n 100

# Subnet aggregation
cat access_log | awk '{print $1}' | awk -F'.' '{print $1"."$2"."$3".0"}' | sort | uniq -c | sort -r -n | head -n 200

# Second log field (identd/domain; usually "-" unless hostname logging is enabled)
cat access_log | awk '{print $2}' | sort | uniq -c | sort -rn

# HTTP status code distribution
cat access_log | awk '{print $9}' | sort | uniq -c | sort -rn

# URL request count
cat access_log | awk '{print $7}' | sort | uniq -c | sort -rn

# Request counts for URLs that carry query strings
cat access_log | awk '{print $7}' | grep '[?&]' | sort | uniq -c | sort -rn

# File traffic (bytes transferred per URL)
cat access_log | awk '{sum[$7]+=$10} END {for(i in sum) print sum[i], i}' | sort -rn</code>
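The status-code distribution above extends naturally to a share-of-total view. A sketch, again assuming `$9` is the status code as in the common/combined log format; the sample log is synthetic:

```shell
# Sample log: two 200s and one 404.
log=$(mktemp)
printf '%s\n' \
  '1.2.3.4 - - [21/Nov/2019:03:40:26 +0000] "GET / HTTP/1.1" 200 100' \
  '1.2.3.4 - - [21/Nov/2019:03:40:27 +0000] "GET /a HTTP/1.1" 200 100' \
  '5.6.7.8 - - [21/Nov/2019:03:40:28 +0000] "GET /x HTTP/1.1" 404 0' >> "$log"

# Count, percentage of total, and code for each status
awk '{code[$9]++; total++} END {for (c in code) printf "%d %.1f%% %s\n", code[c], 100*code[c]/total, c}' "$log" | sort -rn
```

A sudden jump in the 5xx share is usually a faster signal than raw counts, since it is insensitive to overall traffic volume.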

Combined Statistics and Count Examples

<code># Count accesses of a specific page
grep -c '/index\.php' log_file

# Pages per IP (sorted ascending)
awk '{++S[$1]} END {for (a in S) print S[a],a}' log_file | sort -n

# Pages accessed by a specific IP
grep '^111\.111\.111\.111' log_file | awk '{print $1,$7}'

# Count unique IPs with a browser-like user agent (roughly excludes bots)
awk '{print $12,$1}' log_file | grep '^Mozilla' | awk '{print $2}' | sort | uniq | wc -l

# IP count within a specific hour (e.g., 21/Jun/2018:14)
awk '{print $4,$1}' log_file | grep 21/Jun/2018:14 | awk '{print $2}' | sort | uniq | wc -l

# Identify slowest scripts (assumes response time is logged as the last field;
# lines ending in 0 are dropped first)
grep -v '0$' log_file | awk -F '" ' '{print $4" " $1}' | awk '{print $1" "$8}' | sort -n -k 1 -r | uniq > /tmp/slow_url.txt

# Extract IP and URL pairs in real time
tail -f log_file | grep '/test.html' | awk '{print $1" "$7}'</code>
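The per-URL byte sums above also extend to a whole-log transfer total. A sketch, assuming `$10` is the response size in bytes (awk's numeric coercion treats the `-` of a bodyless response as 0); the sample log is synthetic:

```shell
# Sample log: two 1 MiB responses and one bodyless 304.
log=$(mktemp)
printf '%s\n' \
  '1.2.3.4 - - [21/Nov/2019:03:40:26 +0000] "GET / HTTP/1.1" 200 1048576' \
  '1.2.3.4 - - [21/Nov/2019:03:40:27 +0000] "GET /a HTTP/1.1" 200 1048576' \
  '5.6.7.8 - - [21/Nov/2019:03:40:28 +0000] "GET /x HTTP/1.1" 304 -' >> "$log"

# Total bytes served across the whole log, reported in MB
awk '{sum += $10} END {printf "%.2f MB\n", sum/1024/1024}' "$log"
```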
Tags: operations, Linux, Nginx, shell, Apache, Log Analysis
Written by Raymond Ops: Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.
