Master AWK for Log Analysis: A Quick Beginner’s Guide

This tutorial walks beginners through essential AWK commands and techniques for parsing and filtering log files, covering field extraction, separators, arithmetic on string fields, BEGIN/END blocks, conditional filters, external parameters, and common functions with practical examples.

MaGe Linux Operations

Preface

Over the past two days I rolled up my sleeves to process some logs myself and finally got started with AWK. The basics can be learned in half a day; until now I had been lazy and relied on colleagues.

This article is for beginners; experienced operations engineers can skip it.

Below is a sample of the log being processed. It is not very standard, but non‑standard logs are the typical case.

[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]

Basic Statements

The most basic statement splits each line into fields on whitespace and prints the columns you need:

awk '{print $0,$1,$2,$(NF-1),$NF,$NF-$(NF-1)}' access.log

1. Input

AWK processes each line of a file or pipe, so you can also read from a pipe:

grep "xxx" access.log | awk '{print $1}'

Piping a file through cat, as below, is a classic Linux joke, because AWK can read the file itself:

cat access.log | awk '{print $1}'

Pass the file name as an argument instead:

awk '{print $1}' access.log

2. Statement Definition

You can quickly write all statements on one line using single quotes. You can also use -f to specify a file, allowing line breaks for readability and reuse. All executable statements are enclosed in {}, with optional filters outside the braces.
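As a rough sketch of the -f form, the snippet below writes a small program to a temporary file and runs it against the sample line from the preface. The script contents and the cutoff of 100 are illustrative assumptions, not part of the original log job.

```shell
# Sample line copied from the preface.
line='[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]'

# Hypothetical script file: the filter sits outside the braces, the action inside.
script=$(mktemp)
cat > "$script" <<'EOF'
# Keep lines whose last column exceeds 100, then print date, time and cost.
$NF + 0 > 100 {
    print $1, $2, $NF
}
EOF

out=$(printf '%s\n' "$line" | awk -f "$script")
printf '%s\n' "$out"
rm -f "$script"
```

For the sample line this prints the date, the time, and the last column, since 110 is above the cutoff.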

3. Column References

$0 is the entire line and $1 is the first column (for once, not zero‑based). NF is the number of fields in the current line, so $NF is the last column and $(NF-1) the second‑to‑last. Field values can also be used in arithmetic: $NF-$(NF-1) subtracts the second‑to‑last column from the last.

Writing just print is shorthand for print $0, printing the whole line.
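To see how the numbering falls on the sample line from the preface, this small sketch prints every field with its index and then the difference of the last two columns. Only the loop and the variable names are additions for illustration.

```shell
# Sample line copied from the preface.
line='[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]'

# Print each field prefixed by its index, 1 through NF.
fields=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"

# Arithmetic on the last two columns: 110 - 90.
diff=$(printf '%s\n' "$line" | awk '{ print $NF - $(NF-1) }')
printf '%s\n' "$diff"
```

The line has ten fields: the address pair is $4, the two timings are $9 and $10, and the difference printed at the end is 20.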

4. Input Field Separator

By default fields are split on runs of whitespace (spaces and tabs), but you can change the separator, for example to a colon:

awk -F ':' '{print $1,$2}' access.log

You can also define multiple separators with a regular expression, e.g., hyphen and colon:

awk -F '[-:]' '{print $1,$2}' access.log
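A quick way to check what a separator does is to run it on one line. The sketch below applies both separators from this section to the sample line from the preface; the variable names are only for illustration.

```shell
# Sample line copied from the preface.
line='[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]'

# Split on colons only: $1 runs up to the first colon.
by_colon=$(printf '%s\n' "$line" | awk -F ':' '{ print $1, $2 }')
printf '%s\n' "$by_colon"

# Split on hyphens and colons: the date itself is broken apart.
by_both=$(printf '%s\n' "$line" | awk -F '[-:]' '{ print $1, $2 }')
printf '%s\n' "$by_both"
```

With the colon alone, the first column still contains the date and the hour; with both characters, the first two columns are the year and the month.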

5. Output Field Separator

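In print $1,$2 the comma inserts the output field separator (OFS), which defaults to a space. To use something else, concatenate string literals between the columns, for example a tab and a dash:

awk '{print $1 "\t" $2 " - " $3$4xxxxx$5}' access.log

Columns placed side by side, as in $3$4, are joined with no separator. An unquoted word such as xxxxx is treated as an uninitialized variable, which is an empty string, so it adds nothing either.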
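If every column should be joined by the same character, setting the built-in OFS variable once may be tidier than repeating the literal. A minimal sketch on the sample line from the preface:

```shell
# Sample line copied from the preface.
line='[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]'

# OFS is what each comma in print expands to; set it before any line is read.
out=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "\t" } { print $1, $2, $NF }')
printf '%s\n' "$out"
```

The three columns come out separated by tabs instead of spaces.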

Numeric and String Types

Even though the last two columns in the example end in "ms" (and the last one in "ms]"), AWK converts them to numbers when you do arithmetic. It uses the leading digits and ignores the rest, so "110ms]" counts as 110. Summing therefore works directly:

awk '{sum+=$NF} END {print sum, sum/NR}' access.log

Comparisons are different: a column such as "90ms" is compared as a string unless you first coerce it to a number, e.g.:

awk '$NF*1>100 {print}' access.log

or

awk 'int($NF)>100 {print}' access.log
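The difference is easy to miss, so here is a small sketch that compares three made-up values both ways. The input values are assumptions chosen to show the effect.

```shell
# Three hypothetical timing values, one per line.
out=$(printf '%s\n' 90ms 110ms 1000ms | awk '{
    as_string = ($1 > 100) ? "yes" : "no"       # no coercion: compared as text
    as_number = ($1 + 0 > 100) ? "yes" : "no"   # coerced: compared as a number
    print $1, as_string, as_number
}')
printf '%s\n' "$out"
```

Without coercion, 90ms is reported as greater than 100, because as text "9" sorts after "1". With +0 it is correctly reported as smaller.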

BEGIN and END Statements

Statements after BEGIN run before any input is processed; statements after END run after all input has been processed.

1. Calculate Totals and Averages

awk '{sum+=$NF} END {print sum, sum/NR}' access.log

2. Print Header

awk 'BEGIN{print "Date\t\tTime\t\tCost"} {print $1 "\t"$2 "\t" $NF}' access.log
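The two blocks combine naturally into a small report. The sketch below runs against a temporary file of three lines: the first is the sample from the preface, the other two are made-up variations of it.

```shell
# Hypothetical three-line log modeled on the sample in the preface.
log=$(mktemp)
cat > "$log" <<'EOF'
[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]
[2015-08-20 10:00:55.900] - [192.168.0.41/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106047 100347 120ms 250ms]
[2015-08-20 11:03:10.100] - [192.168.0.44/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106048 100348 95ms 99ms]
EOF

out=$(awk '
BEGIN { print "Time\tCost" }                           # once, before the first line
      { sum += $NF; print $2 "\t" ($NF + 0) }          # for every line
END   { if (NR > 0) printf "avg\t%.1f\n", sum / NR }   # once, after the last line
' "$log")
printf '%s\n' "$out"
rm -f "$log"
```

The NR > 0 guard avoids a division by zero on an empty file. For these three lines the costs are 110, 250, and 99, so the last row reads avg 153.0.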

Filtering Rows

1. Simple Pattern Matching

You can filter with grep first, or embed a regular expression directly in AWK:

awk '/192\.168\.0\.4[1-5]/ {print $1}' access.log

Equivalent to:

grep '192\.168\.0\.4[1-5]' access.log | awk '{print $1}'

2. Column‑Specific Pattern Matching

Match the fourth column with ~ (or !~ for non‑match):

awk '$4 ~ /192\.168\.0\.4[1-5]/ {print}' access.log

3. Numeric Filtering

Supports ==, !=, <, >, <=, >=:

awk '$(NF-1)*1==100 {print}' access.log
awk '$NF-$(NF-1)>100 {print}' access.log

4. Multiple Conditions

awk '($(NF-1)*1>150 || $NF*1>250) {print}' access.log

5. Using if Statements

awk '{ if ($(NF-1)*1>100) print}' access.log
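Pattern matches and numeric tests can be mixed in one filter. The sketch below tries this on a temporary three-line log; only the first line is the sample from the preface, the rest are made up.

```shell
# Hypothetical three-line log modeled on the sample in the preface.
log=$(mktemp)
cat > "$log" <<'EOF'
[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]
[2015-08-20 10:00:55.900] - [192.168.0.41/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106047 100347 120ms 250ms]
[2015-08-20 11:03:10.100] - [192.168.0.44/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106048 100348 95ms 99ms]
EOF

# Column 4 matches the address range AND the last column is above 100.
hits=$(awk '$4 ~ /192\.168\.0\.4[1-5]/ && $NF + 0 > 100 { print $2, $NF }' "$log")
printf '%s\n' "$hits"

# Column 4 does NOT match the address range.
others=$(awk '$4 !~ /192\.168\.0\.4[1-5]/ { print $2 }' "$log")
printf '%s\n' "$others"
rm -f "$log"
```

Only the second line passes both tests, and only the first line falls outside the address range.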

Other Features

1. External Parameters

You can pass a value such as a threshold from the command line by writing a name=value assignment before the file name:

awk '{if($(NF)*1>threshold) print}' threshold=20 access.log
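The standard -v option is another way to pass a value. One difference worth knowing: a -v assignment is already set inside BEGIN, while a name=value operand is only applied when AWK reaches it in the argument list, after BEGIN has run. A small sketch, with made-up input values:

```shell
# -v sets the variable before the program starts, so BEGIN can see it.
early=$(awk -v threshold=100 'BEGIN { print "threshold=" threshold }')
printf '%s\n' "$early"

# An operand assignment is not yet applied while BEGIN runs.
late=$(awk 'BEGIN { print "threshold=" threshold }' threshold=100 /dev/null)
printf '%s\n' "$late"

# For filtering ordinary lines, -v behaves like the operand form.
hits=$(printf '%s\n' 90ms 110ms 250ms | awk -v threshold=100 '$1 + 0 > threshold { print }')
printf '%s\n' "$hits"
```

The first command prints the value, the second prints it empty, and the filter keeps 110ms and 250ms.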

2. Common Functions

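The most useful functions are gsub, sub, match, and index. gsub replaces every match of a pattern, either in the whole line or, with a third argument, in a single column. The modified column is a string, so coerce it with +0 before comparing:

awk '{gsub(/ms\]/,"",$NF); if($NF+0>100) print}' access.log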
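For the other functions, here is a minimal sketch that pulls three values out of the sample line from the preface. The variable names (cost, src, ver) are illustrative, and reading the trailing "_2.0" as a version number is an assumption about the log format.

```shell
# Sample line copied from the preface.
line='[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]'

out=$(printf '%s\n' "$line" | awk '{
    cost = $NF
    sub(/ms\]$/, "", cost)            # sub: replace the first match only, on a copy
    pos = index($4, "/")              # index: position of the first slash in column 4
    src = substr($4, 2, pos - 2)      # text between the opening bracket and the slash
    ver = ""
    if (match($5, /_[0-9.]+$/))       # match: sets RSTART and RLENGTH on success
        ver = substr($5, RSTART + 1, RLENGTH - 1)
    print src, ver, cost + 0
}')
printf '%s\n' "$out"
```

Working on a copy in cost leaves $NF and $0 untouched, which matters if the original line is printed later.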

Examples

1. Extract Data Within a Time Range

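Extract logs between 17:30:30 and 17:31:00 by splitting on spaces, colons, and dots, so that hour, minute, and second become $2, $3, and $4, and then concatenating them into a six‑digit value. The concatenation is a string, but because every value is zero‑padded to the same width, it sorts in the same order as the numbers would: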

awk -F "[ :. ]" '$2$3$4>=173030 && $2$3$4<173100 {print}' access.log

Extract logs for a specific hour (e.g., 11 o'clock):

awk '/\[2015-08-20 11:/ {print}' access.log

Extract logs from 11:01 to 11:05:

awk '/\[2015-08-20 11:0[1-5]:/ {print}' access.log
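When the range does not line up with a neat regular expression, one possible alternative is to compare the time column as a string against start and end values passed in with -v. This is a sketch, not the method used above: the names start and end, and all log lines except the first, are assumptions. It relies on the timestamps being zero-padded and covers a single day only.

```shell
# Hypothetical three-line log modeled on the sample in the preface.
log=$(mktemp)
cat > "$log" <<'EOF'
[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]
[2015-08-20 10:00:55.900] - [192.168.0.41/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106047 100347 120ms 250ms]
[2015-08-20 11:03:10.100] - [192.168.0.44/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106048 100348 95ms 99ms]
EOF

# $2 looks like 10:00:55.600] and is compared as text, character by character.
out=$(awk -v start='10:00:55.700' -v end='11:00:00' '$2 >= start && $2 < end { print $2 }' "$log")
printf '%s\n' "$out"
rm -f "$log"
```

Only the second line falls inside the window.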

2. Find When Timeouts Occurred

First filter timeout records, then strip milliseconds, group by second, and count occurrences:

awk '$(NF)*1>100 {print}' access.log | awk -F "." '{print $1}' | sort | uniq -c
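The same count can also be kept inside AWK with an associative array, which saves the second awk and the uniq step. A sketch under assumptions: the four-line log is made up apart from its first line, and the key is built from the date plus the first eight characters of the time column.

```shell
# Hypothetical four-line log modeled on the sample in the preface.
log=$(mktemp)
cat > "$log" <<'EOF'
[2015-08-20 10:00:55.600] - [192.168.0.73/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106046 100346 90ms 110ms]
[2015-08-20 10:00:55.900] - [192.168.0.41/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106047 100347 120ms 250ms]
[2015-08-20 11:03:10.100] - [192.168.0.44/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106048 100348 95ms 99ms]
[2015-08-20 11:03:10.500] - [192.168.0.44/192.168.0.75:1080 com.vip.xxx.MyService_2.0 0 106049 100349 95ms 300ms]
EOF

# Count timeouts per second; the array key is the date plus HH:MM:SS.
out=$(awk '
$NF + 0 > 100 { count[$1 " " substr($2, 1, 8)]++ }
END           { for (k in count) print count[k], k }
' "$log" | sort -k2)
printf '%s\n' "$out"
rm -f "$log"
```

The order of for (k in count) is unspecified, hence the final sort. Here the second 10:00:55 has two timeouts and 11:03:10 has one.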