Mastering Awk: Powerful Text Processing for Linux with Real‑World Examples
This tutorial introduces awk as a versatile Linux text‑analysis tool, explains its execution model (BEGIN, body, END), demonstrates practical commands for reporting, filtering, formatting, and advanced scripting, and provides numerous code snippets and visual examples to help readers quickly apply awk in real‑world scenarios.
What Awk Can Do
Awk is a powerful text‑analysis utility. It excels at generating reports, parsing system logs, counting data such as website visits, and aggregating system information, and it supports loops, conditionals, and arrays for more complex data processing.
Awk Execution Model
Awk processes input in three stages:
BEGIN – runs commands before any input is read.
body – executed for each line (record) of input.
END – runs after all input has been processed.
Each line is split into fields (columns) using a delimiter (default whitespace). The record separator is \n.
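The three stages can be seen in a minimal sketch. The three-line input here is made up purely for illustration:

```shell
# Hypothetical three-line input, piped straight into awk.
printf 'alpha\nbeta\ngamma\n' | awk '
BEGIN { print "start" }         # runs once, before any input is read
      { print NR ": " $0 }      # runs once for every input record (line)
END   { print "lines: " NR }    # runs once, after the last record
'
```

This prints "start", then each line prefixed with its record number, and finally "lines: 3".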
Basic Command Syntax
The basic awk command format is:
awk [options] 'pattern { action }' file
Awk scripts are enclosed in single quotes.
$1..$N refers to specific columns, while $0 represents the entire line.
Practical – Beginner
Save sample data to file.txt and run a simple awk command to print columns 1, 4, and 8.
Fields are accessed with $1, $4, etc. Awk’s printf supports C‑style formatting (e.g., %s for strings, %-4s for a left‑aligned string of width 4).
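As a rough sketch of the commands described above: the original sample data is not reproduced in this text, so the contents of file.txt below are an assumption (an ls-style listing without the group column, chosen so that columns 1, 4, and 8 are permissions, size, and name).

```shell
# Hypothetical sample data: perms, links, owner, size, month, day, time, name.
cat > file.txt <<'EOF'
-rw-r--r-- 1 root  1024 Aug 10 09:15 main.c
-rw-r--r-- 1 alice 2048 Dec 10 11:30 util.h
-rwxr-xr-x 1 root   512 Aug 21 14:02 run.sh
-rw-r--r-- 1 bob   4096 Dec  3 08:45 notes.txt
EOF

# Print columns 1, 4, and 8; the commas make print join them with a space.
awk '{ print $1, $4, $8 }' file.txt

# The same columns with printf: left-aligned strings in fixed-width columns.
awk '{ printf "%-12s %-6s %s\n", $1, $4, $8 }' file.txt
```

Unlike print, printf does not append a newline, so the format string ends with \n.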
Practical – Intermediate
Filtering Records – output only lines where column 3 equals root and column 6 equals 10.
Awk supports the comparison operators ==, !=, >, <, >=, and <=. $0 denotes the whole line.
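A filter of this kind might look like the sketch below. The data is hypothetical (owner in column 3, day of month in column 6, size in column 4) and stands in for the original sample file.

```shell
# Hypothetical sample data: owner in column 3, size in column 4, day in column 6.
cat > file.txt <<'EOF'
-rw-r--r-- 1 root  1024 Aug 10 09:15 main.c
-rw-r--r-- 1 alice 2048 Dec 10 11:30 util.h
-rwxr-xr-x 1 root   512 Aug 21 14:02 run.sh
-rw-r--r-- 1 bob   4096 Dec  3 08:45 notes.txt
EOF

# A pattern with no action prints the whole matching line ($0).
awk '$3 == "root" && $6 == 10' file.txt

# Numeric comparison on the size column, printing name and size only.
awk '$4 > 1000 { print $8, $4 }' file.txt
```

With this data the first command prints only the main.c line, and the second lists main.c, util.h, and notes.txt with their sizes.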
Built‑in Variables – NR (current record number) and NF (field count) are useful for tracking line numbers and column counts.
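A small illustration of NR and NF, using a made-up input whose lines have different numbers of fields:

```shell
# Print the record number, the field count, and the last field ($NF) of each line.
printf 'a b c\nd e\nf\n' | awk '{ print NR, NF, $NF }'
```

Because NF holds the number of fields, $NF always refers to the last field of the current line.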
Specifying Delimiters – change the input field separator with FS or the -F option, and set the output field separator with OFS.
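Both ways of setting the separator can be sketched as follows. The file name passwd.sample and its two records are assumptions, modelled on the colon-delimited layout of /etc/passwd.

```shell
# Hypothetical colon-delimited records in the style of /etc/passwd.
printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000:Alice:/home/alice:/bin/sh\n' > passwd.sample

# -F sets the input field separator on the command line.
awk -F: '{ print $1, $7 }' passwd.sample

# The same split via FS in BEGIN; OFS controls how print joins its arguments.
awk 'BEGIN { FS = ":"; OFS = "\t" } { print $1, $3, $7 }' passwd.sample
```

OFS only takes effect between arguments separated by commas in print; fields concatenated without commas are joined with nothing in between.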
Practical – Advanced
Conditional Matching – list all files owned by root by comparing a field, or match lines containing root with a regular expression (/root/); alternation matches any of several patterns (/Aug|Dec/).
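The matching styles above could be written along these lines. The listing is hypothetical, with the owner assumed to be in column 3 and the month in column 5.

```shell
# Hypothetical sample data: owner in column 3, month in column 5, name in column 8.
cat > file.txt <<'EOF'
-rw-r--r-- 1 root  1024 Aug 10 09:15 main.c
-rw-r--r-- 1 alice 2048 Dec 10 11:30 util.h
-rwxr-xr-x 1 root   512 Aug 21 14:02 run.sh
-rw-r--r-- 1 bob   4096 Dec  3 08:45 notes.txt
EOF

# Exact field comparison: names of files whose owner is root.
awk '$3 == "root" { print $8 }' file.txt

# Regular expression against the whole line: any line containing "root".
awk '/root/' file.txt

# Alternation: lines that mention Aug or Dec.
awk '/Aug|Dec/ { print $5, $8 }' file.txt

# The ~ operator restricts the regular expression to a single field.
awk '$5 ~ /Aug/ { print $8 }' file.txt
```

The field comparison is stricter than /root/, which would also match a line where "root" appears in a file name or path.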
Splitting Files – redirect output to separate files based on a field (e.g., month in column 5) using the > operator.
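One possible form of such a split is sketched here. The data and the output names Aug.txt and Dec.txt are assumptions that follow from the hypothetical listing, which has the month in column 5.

```shell
# Hypothetical sample data with the month in column 5.
cat > file.txt <<'EOF'
-rw-r--r-- 1 root  1024 Aug 10 09:15 main.c
-rw-r--r-- 1 alice 2048 Dec 10 11:30 util.h
-rwxr-xr-x 1 root   512 Aug 21 14:02 run.sh
-rw-r--r-- 1 bob   4096 Dec  3 08:45 notes.txt
EOF

# Write each line to a file named after its month (Aug.txt, Dec.txt).
# The parentheses keep the file-name expression unambiguous across awk versions.
awk '{ print > ($5 ".txt") }' file.txt

cat Aug.txt
```

Inside awk, > truncates a file only the first time it is opened during a run; later writes to the same name append to it, so each output file ends up with every matching line.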
If Statements – more complex conditions can be written as if statements; remember that if must appear inside the {} action block.
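A minimal sketch of an if/else chain inside an action block follows. The size thresholds and labels are invented for illustration, as is the sample data.

```shell
# Hypothetical sample data: size in column 4, name in column 8.
cat > file.txt <<'EOF'
-rw-r--r-- 1 root  1024 Aug 10 09:15 main.c
-rw-r--r-- 1 alice 2048 Dec 10 11:30 util.h
-rwxr-xr-x 1 root   512 Aug 21 14:02 run.sh
-rw-r--r-- 1 bob   4096 Dec  3 08:45 notes.txt
EOF

# Classify each file by size; the if statement lives inside the braces.
awk '{
    if ($4 >= 2048)      size = "large"
    else if ($4 >= 1024) size = "medium"
    else                 size = "small"
    print $8, size
}' file.txt
```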
Statistics – sum file sizes of *.c and *.h files, or compute per‑user memory usage from the RSS column using arrays and for loops.
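Both statistics can be sketched on small hypothetical inputs. The file names file.txt and ps.txt and all values are assumptions; ps.txt imitates the column layout of ps aux, where RSS is column 6.

```shell
# Hypothetical listing: size in column 4, file name in column 8.
cat > file.txt <<'EOF'
-rw-r--r-- 1 root  1024 Aug 10 09:15 main.c
-rw-r--r-- 1 alice 2048 Dec 10 11:30 util.h
-rwxr-xr-x 1 root   512 Aug 21 14:02 run.sh
-rw-r--r-- 1 bob   4096 Dec  3 08:45 notes.txt
EOF

# Sum the sizes of the .c and .h files only.
awk '$8 ~ /\.(c|h)$/ { sum += $4 } END { print "total bytes:", sum }' file.txt

# Hypothetical ps-style data: user in column 1, RSS in KB in column 6.
cat > ps.txt <<'EOF'
USER  PID %CPU %MEM  VSZ  RSS COMMAND
root    1  0.0  0.1 1000  300 init
alice 200  0.5  1.0 5000 1200 bash
root   50  0.1  0.2 2000  700 sshd
alice 210  1.0  2.0 8000  800 vim
EOF

# Accumulate RSS per user in an associative array, then loop over its keys.
# NR > 1 skips the header line; sort fixes the otherwise unspecified key order.
awk 'NR > 1 { rss[$1] += $6 } END { for (u in rss) print u, rss[u], "KB" }' ps.txt | sort
```

The order in which for (u in rss) visits keys is not defined by awk, which is why the output is piped through sort.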
Comprehensive Example – Student Grades
A full‑featured awk script (cal.awk) processes a grade file, using BEGIN to print headers, the body to accumulate scores, and END to output totals and averages.
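The original cal.awk is not reproduced in this text, so the following is only a sketch of what such a script might look like. The file score.txt, its layout (name, student number, three subject scores), and the subject names are all assumptions.

```shell
# Hypothetical grade file: name, student number, then three scores.
cat > score.txt <<'EOF'
Ann 2143 78 84 77
Bob 2321 66 78 45
Cid 2122 48 77 71
EOF

# Sketch of cal.awk (not the original script).
cat > cal.awk <<'EOF'
BEGIN {
    # Initialise the running totals and print the header row.
    math = english = computer = 0
    printf "%-6s %-6s %4s %7s %8s %5s\n", "NAME", "NO.", "MATH", "ENGLISH", "COMPUTER", "TOTAL"
}
{
    # Accumulate per-subject totals and print one row per student.
    math += $3; english += $4; computer += $5
    printf "%-6s %-6s %4d %7d %8d %5d\n", $1, $2, $3, $4, $5, $3 + $4 + $5
}
END {
    if (NR == 0) exit    # avoid dividing by zero on empty input
    printf "%-13s %4d %7d %8d\n", "TOTAL:", math, english, computer
    printf "%-13s %4.1f %7.1f %8.1f\n", "AVERAGE:", math / NR, english / NR, computer / NR
}
EOF

awk -f cal.awk score.txt
```

Keeping the program in a file and running it with -f avoids shell quoting issues once a script grows beyond a one-liner.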
Key Built‑in Variables
NR: current line number.
NF: number of fields in the current line.
RS: record separator (default newline).
FS: field separator (default space/tab).
OFS: output field separator (default space).
ORS: output record separator (default newline).
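RS and ORS are less commonly changed than FS and OFS, so a small sketch may help. The comma-separated input is made up, and the " | " output separator is an arbitrary choice.

```shell
# Treat commas as record separators on input and join output records with " | ".
printf 'a,b,c' | awk 'BEGIN { RS = ","; ORS = " | " } { print NR ":" $0 }'
```

Each of the three comma-separated items becomes its own record, and print ends every record with ORS instead of a newline.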
Formatting Output
Use printf with familiar C format specifiers (%d, %u, %f, %s, %c, %e, %x, %g) and escape sequences (\n, \t).
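A few of these specifiers in one call, as an illustrative sketch with arbitrary values:

```shell
# %d integer, %5.2f float in width 5 with 2 decimals, %-6s left-aligned string,
# %c first character of a string, %x lowercase hexadecimal.
awk 'BEGIN { printf "%d|%5.2f|%-6s|%c|%x\n", 42, 3.14159, "awk", "A", 255 }'
```

The vertical bars are only there to make the padding visible in the output.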
Programming Constructs
Conditional statements (if/else).
Loops (while, for).
Arrays (associative, similar to maps).
Functions (built‑in and user‑defined).
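The constructs listed above can be combined in a short sketch. The function name fact is an illustrative choice, not something taken from the original article.

```shell
awk '
# User-defined function: factorial computed with a while loop.
# The extra parameter r is the usual awk idiom for a local variable.
function fact(n,    r) {
    r = 1
    while (n > 1) { r *= n; n-- }
    return r
}
BEGIN {
    # A for loop calling the function; no input file is needed in BEGIN.
    for (i = 1; i <= 5; i++)
        print i, fact(i)
}'
```

Scalar arguments are passed by value, so decrementing n inside the function does not affect the caller's i.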
Common String Functions
index(s, t): position of substring t in s.
length(s): length of s.
split(s, a, sep): split s into array a using sep.
substr(s, p, n): substring of s starting at p with length n.
tolower(s) / toupper(s): case conversion.
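The functions above can be tried out in a BEGIN block, which needs no input file. The strings are arbitrary examples.

```shell
awk 'BEGIN {
    s = "Hello, Awk"
    print index(s, "Awk")            # position of "Awk" in s (1-based): 8
    print length(s)                  # number of characters: 10
    print substr(s, 1, 5)            # first five characters: Hello
    print tolower(s), toupper(s)     # hello, awk HELLO, AWK
    n = split("a:b:c", parts, ":")   # returns the number of pieces
    print n, parts[1], parts[3]      # 3 a c
}'
```

String positions in awk start at 1, and split fills the array starting at index 1 as well.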
This guide provides a concise yet comprehensive overview of awk’s core concepts, syntax, and practical usage, enabling readers to harness awk for efficient text processing and data analysis on Linux.