Master AWK for Log Analysis: A Quick Beginner’s Guide
This tutorial walks beginners through essential AWK commands and techniques for parsing and filtering log files, covering field extraction, separators, arithmetic on string fields, BEGIN/END blocks, conditional filters, external parameters, and common functions with practical examples.
Preface
Over the past two days I finally rolled up my sleeves to process some logs and got started with AWK. The basics can be learned in half a day; until now I had lazily relied on colleagues.
This article is for beginners; experienced operations engineers can skip it.
Below is a sample of the log being processed. It is not very standard, but non‑standard logs are the typical case.
[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]
Basic Statements
The most basic statement splits fields by spaces and extracts needed columns:
awk '{print $0,$1,$2,$(NF-1),$NF,$NF-$(NF-1)}' access.log
1. Input
AWK processes each line of a file or pipe, so you can also read from a pipe:
grep "xxx" access.log | awk '{print $1}'
Piping cat into AWK, however, is a classic Linux joke, because AWK reads files itself and does not need cat:
cat access.log | awk '{print $1}'
2. Statement Definition
You can quickly write all statements on one line using single quotes. You can also use -f to specify a file, allowing line breaks for readability and reuse. All executable statements are enclosed in {}, with optional filters outside the braces.
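As a minimal sketch of the -f form, the snippet below writes a small program to a temporary file and runs it against the sample log line from the preface. The temporary file and the /MyService/ filter are illustrative choices, not part of the original workflow.

```shell
# Sample line copied from the log format shown in the preface.
line='[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]'

# Write the AWK program to a temporary file (path chosen by mktemp).
script=$(mktemp)
cat > "$script" <<'EOF'
# The filter sits outside the braces: only matching lines run the action.
/MyService/ {
    print $1, $2
}
EOF

# Run the program file against the sample line.
result=$(printf '%s\n' "$line" | awk -f "$script")
printf '%s\n' "$result"
rm -f "$script"
```

Keeping the program in a file lets you add comments and line breaks that would be awkward inside a one-line quoted string.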
3. Column References
$0 represents the entire line and $1 the first column (for once, not zero‑based). NF is the total number of fields, so $NF is the last column and $(NF-1) the second‑to‑last. You can also perform arithmetic, e.g., $NF-$(NF-1) subtracts the second‑to‑last column from the last.
Writing just print is shorthand for print $0, printing the whole line.
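To make the column references concrete, here is a small sketch that feeds the sample line from the preface through AWK and prints the field count, the first field, the last two fields, and their difference. The shell variable names are only for illustration.

```shell
# Sample line copied from the log format shown in the preface.
line='[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]'

# One value per output line: NF, $1, $(NF-1), $NF, and the difference.
fields=$(printf '%s\n' "$line" |
    awk '{print NF; print $1; print $(NF-1); print $NF; print $NF-$(NF-1)}')
printf '%s\n' "$fields"
```

With whitespace splitting this line has 10 fields, and the subtraction yields 20 because AWK uses only the leading digits of 110ms] and 90ms.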
4. Input Field Separator
By default fields are split on runs of whitespace (spaces and tabs), but you can change the separator, for example to a colon:
awk -F ':' '{print $1,$2}' access.log
You can also define multiple separators with a regular expression, e.g., hyphen and colon:
awk -F '[-:]' '{print $1,$2}' access.log
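A quick way to see what a separator does is to try it on a single line. This sketch splits the sample line from the preface first on colons, then on either hyphens or colons; the variable names are illustrative.

```shell
# Sample line copied from the log format shown in the preface.
line='[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]'

# Split on colons only: the second field is the minutes.
by_colon=$(printf '%s\n' "$line" | awk -F ':' '{print $2}')

# Split on hyphens or colons: year, month, then "day hour" in one field.
by_either=$(printf '%s\n' "$line" | awk -F '[-:]' '{print $1, $2, $3}')

printf '%s\n' "$by_colon" "$by_either"
```

Note that once -F is set, spaces no longer split fields, which is why the day and hour end up together in $3.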
5. Output Field Separator
In print $1,$2 the comma inserts the output field separator, a space by default, between columns. You can also concatenate any string yourself, for example a tab and a dash:
awk '{print $1 "\t" $2 " - " $3$4xxxxx$5}' access.log
You can also omit the separator entirely, as with $3$4. An unquoted word such as xxxxx is read as an uninitialized variable, which evaluates to an empty string, so it also produces no separator.
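As a complementary sketch, the built-in OFS variable changes what the comma in print inserts, so you do not have to repeat "\t" between every column. OFS is standard AWK but is not used elsewhere in this article; the variable names are illustrative.

```shell
# Sample line copied from the log format shown in the preface.
line='[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]'

# Set the output field separator once; every comma in print now emits a tab.
tabbed=$(printf '%s\n' "$line" | awk 'BEGIN {OFS="\t"} {print $1, $2, $NF}')

# No comma at all: the two fields are concatenated with nothing between them.
glued=$(printf '%s\n' "$line" | awk '{print $1 $2}')

printf '%s\n' "$tabbed" "$glued"
```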
Numeric and String Types
Even though the last two columns in the example end in "ms", AWK converts each one to its leading numeric part (110ms] becomes 110) when you perform arithmetic, such as summing:
awk '{sum+=$NF} END {print sum, sum/NR}' access.log
To compare a string column numerically, first coerce it to a number; otherwise a value such as 90ms is compared with 100 as a string:
awk '$NF*1>100 {print}' access.log
or
awk 'int($NF)>100 {print}' access.log
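The difference is easy to miss, so here is a small sketch on two made-up lines (the labels a and b and their costs are hypothetical). Without coercion, 90ms is compared as text and sorts after 100, so it wrongly passes the filter.

```shell
# Two hypothetical records: a label and a cost ending in "ms".
data='a 90ms
b 110ms'

# String comparison: "90ms" sorts after "100", so both lines pass.
as_string=$(printf '%s\n' "$data" | awk '$NF > 100 {print $1}')

# Numeric comparison: multiplying by 1 keeps only the leading digits.
as_number=$(printf '%s\n' "$data" | awk '$NF*1 > 100 {print $1}')

printf 'string compare: %s\n' "$as_string"
printf 'numeric compare: %s\n' "$as_number"
```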
BEGIN and END Statements
Statements after BEGIN run before any input is processed; statements after END run after all input has been processed.
1. Calculate Totals and Averages
awk '{sum+=$NF} END {print sum, sum/NR}' access.log
2. Print Header
awk 'BEGIN{print "Date\t\tTime\t\tCost"} {print $1 "\t"$2 "\t" $NF}' access.log
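The two blocks combine naturally. The sketch below prints a header in BEGIN, accumulates the last column per line, and reports the total and average in END. The three input lines and the header text are made up for illustration.

```shell
# Three hypothetical records: a request label and its cost.
data='req1 90ms
req2 110ms
req3 130ms'

report=$(printf '%s\n' "$data" | awk '
    BEGIN { print "total avg" }      # runs once, before any input
    { sum += $NF }                   # runs for every line
    END { print sum, sum / NR }      # runs once, after the last line
')
printf '%s\n' "$report"
```

NR holds the number of lines read so far, so in END it is the total line count.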
Filtering Rows
1. Simple Pattern Matching
You can filter with grep first, or embed a regular expression directly in AWK:
awk '/192\.168\.0\.4[1-5]/ {print $1}' access.log
Equivalent to:
grep '192\.168\.0\.4[1-5]' access.log | awk '{print $1}'
2. Column‑Specific Pattern Matching
Match the fourth column with ~ (or !~ for non‑match):
awk '$4 ~ /192\.168\.0\.4[1-5]/ {print}' access.log
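Here is a short sketch of both operators on two made-up lines in the same layout as the sample log; the addresses and the svcA/svcB names are hypothetical.

```shell
# Two hypothetical records; the fourth whitespace-separated field holds the addresses.
data='[2015-08-20 10:00:55.600] - [192.168.0.41/192.168.0.75:1080 svcA 90ms 110ms]
[2015-08-20 10:00:56.100] - [192.168.0.73/192.168.0.75:1080 svcB 40ms 50ms]'

# ~ keeps lines whose fourth field matches the pattern.
matched=$(printf '%s\n' "$data" | awk '$4 ~ /192\.168\.0\.4[1-5]/ {print $5}')

# !~ keeps lines whose fourth field does not match.
unmatched=$(printf '%s\n' "$data" | awk '$4 !~ /192\.168\.0\.4[1-5]/ {print $5}')

printf '%s\n' "$matched" "$unmatched"
```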
3. Numeric Filtering
AWK supports the comparison operators ==, !=, <, >, <=, and >=:
awk '$(NF-1)*1==100 {print}' access.log
awk '$NF-$(NF-1)>100 {print}' access.log
4. Multiple Conditions
awk '($(NF-1)*1>150 || $NF*1>250) {print}' access.log
5. Using if Statements
awk '{ if ($(NF-1)*1>100) print}' access.log
Other Features
1. External Parameters
You can pass a threshold value from the command line:
awk '{if($(NF)*1>threshold) print}' threshold=20 access.log
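An alternative worth knowing is the standard -v option, sketched below on made-up data. Unlike an assignment placed among the file arguments, a variable set with -v is already available inside BEGIN. The input lines and variable names are hypothetical.

```shell
# Three hypothetical records: a label and a cost.
data='a 90ms
b 110ms
c 15ms'

# -v assigns the variable before the program starts.
over=$(printf '%s\n' "$data" | awk -v threshold=20 '$NF*1 > threshold {print $1}')

# Because -v runs before BEGIN, the value is visible there too.
in_begin=$(awk -v threshold=20 'BEGIN {print threshold}')

printf '%s\n' "$over" "$in_begin"
```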
2. Common Functions
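The most useful functions are gsub, sub, match, index, etc. gsub replaces every match of a pattern, optionally limited to a specific column. Adding 0 afterwards forces a numeric comparison, since the edited field may otherwise be compared as a string:
awk '{gsub(/ms\]/,"",$NF); if($NF+0>100) print}' access.log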
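The other functions follow the same calling style. This sketch runs index, sub, and match on small strings; the value cost=110ms is made up for illustration.

```shell
parts=$(printf '%s\n' '110ms]' | awk '{
    s = $1
    pos = index(s, "ms")            # position of the first "ms" in s
    n = sub(/ms\]$/, "", s)         # replace the first match only; returns the count
    match("cost=110ms", /[0-9]+/)   # sets RSTART and RLENGTH for the match
    print pos, n, s, RSTART, RLENGTH
}')
printf '%s\n' "$parts"
```

sub and gsub return the number of replacements made, while match reports where the match starts and how long it is through the built-in variables RSTART and RLENGTH.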
Examples
1. Extract Data Within a Time Range
Extract logs between 17:30:30 and 17:31:00 by splitting on spaces, colons, and dots, then concatenating the hour, minute, and second columns and adding 0 to turn the result into a number:
awk -F "[ :.]" '($2 $3 $4)+0>=173030 && ($2 $3 $4)+0<173100 {print}' access.log
Extract logs for a specific hour (e.g., 11 o'clock):
awk '/\[2015-08-20 11:/ {print}' access.log
Extract logs from 11:01 to 11:05:
awk '/\[2015-08-20 11:0[1-5]:/ {print}' access.log
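To check the boundaries of a time-range filter, it helps to run it on a few lines around the edges. The sketch below uses four made-up timestamps and keeps those from 17:30:30 up to, but not including, 17:31:00. The [demo] payload and the variable t are hypothetical.

```shell
# Four hypothetical records straddling the 17:30:30 to 17:31:00 window.
data='[2015-08-20 17:30:29.999] - [demo]
[2015-08-20 17:30:30.000] - [demo]
[2015-08-20 17:30:59.500] - [demo]
[2015-08-20 17:31:00.000] - [demo]'

in_range=$(printf '%s\n' "$data" | awk -F '[ :.]' '
    { t = ($2 $3 $4) + 0 }                        # HHMMSS as a number
    t >= 173030 && t < 173100 { print $2 ":" $3 ":" $4 }
')
printf '%s\n' "$in_range"
```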
2. Find When Timeouts Occurred
First filter timeout records, then strip milliseconds, group by second, and count occurrences:
awk '$(NF)*1>100 {print}' access.log | awk -F "." '{print $1}' | sort | uniq -c
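The same count can also be done in one AWK process with an associative array, which is a different technique from the pipeline above. This is a sketch on made-up lines; it assumes the second field always starts with an 8-character HH:MM:SS, as in the sample log.

```shell
# Four hypothetical records; only the first three exceed 100 ms.
data='[2015-08-20 10:00:55.600] - [demo 90ms 110ms]
[2015-08-20 10:00:55.900] - [demo 90ms 120ms]
[2015-08-20 10:00:56.100] - [demo 90ms 130ms]
[2015-08-20 10:00:56.200] - [demo 40ms 50ms]'

per_second=$(printf '%s\n' "$data" | awk '
    $NF * 1 > 100 {
        key = $1 " " substr($2, 1, 8)   # date plus HH:MM:SS, milliseconds dropped
        count[key]++
    }
    END { for (key in count) print count[key], key }
' | sort -k2)
printf '%s\n' "$per_second"
```

The order of a for (key in count) loop is unspecified, so the final sort is what puts the seconds in chronological order.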
MaGe Linux Operations
