Master awk: From Basics to Advanced Text Processing on Linux
This guide explains what awk is, its pattern‑action syntax, built‑in variables such as NR, NF, FS, RS, OFS and ORS, and provides step‑by‑step examples—including generating test data, column manipulation, custom delimiters, and solving a word‑frequency interview question—so readers can efficiently extract and transform text on the command line.
What is awk
awk is a command-line utility for pattern-action text processing. It originated on Unix and is standard on Linux, where the GNU version (gawk) is the most commonly used implementation.
How to learn awk
Start by experimenting with simple one‑liner commands, then explore patterns, actions, and built‑in variables to handle more complex data extraction tasks.
Syntax format
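An awk program is a sequence of rules of the form pattern { action }, and either part may be omitted. A pattern, which can be a regular expression (as in sed) or a condition such as NR==1, selects which lines to process, while an action is a list of statements inside braces. With no pattern, the action runs on every line; with no action, matching lines are printed.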
Pattern and action examples
Pattern selects which lines to process, e.g. NR==1 selects the first line.
Action performs operations, e.g. {print $0} prints the whole line.
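To make the pattern and action roles concrete, here is a minimal sketch. The three input lines (alpha, beta, gamma) are made-up sample data piped in with printf, so no file is needed.

```shell
# Pattern + action: only the first record is printed.
printf 'alpha\nbeta\ngamma\n' | awk 'NR==1 {print $0}'   # alpha

# Action only: the action runs on every record.
printf 'alpha\nbeta\ngamma\n' | awk '{print $0}'         # alpha, beta, gamma

# Pattern only: the default action is to print the matching record.
printf 'alpha\nbeta\ngamma\n' | awk 'NR==1'              # alpha

# A regular expression also works as a pattern.
printf 'alpha\nbeta\ngamma\n' | awk '/^b/ {print $0}'    # beta
```

Piping a few known lines into awk like this is a quick way to check a pattern before running it on a real file.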
Generating test data
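The command below uses bash brace expansion to write cc01 through cc50, five per line, into test_awk.log (10 rows, 5 columns):
echo cc{01..50} | xargs -n 5 > test_awk.log
No pattern, only action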
When no pattern is given, awk processes every input line. Common actions include printing fields:
awk '{print $0}' test_awk.log # print whole line
awk '{print $1}' test_awk.log # first column
awk '{print $2}' test_awk.log # second column
awk '{print $1,$3}' test_awk.log # first and third columns
Row variables (NR) and range syntax
NR is the record number (line number). It can be used to select specific lines:
awk 'NR==2{print $0}' test_awk.log # second line
awk 'NR>=2 && NR<=5{print $0}' test_awk.log # lines 2-5
Column variables (NF) and field references
NF holds the number of fields in the current record. $1, $2 … $NF refer to individual columns; $0 is the whole line.
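A small sketch can show NR and NF side by side. The input here is made-up sample data with a different number of fields on each line, which makes the changing value of NF (and therefore of $NF) easy to see.

```shell
# Print the record number, the field count, the first field and the last field.
printf 'a b c\nd e\nf\n' | awk '{print NR, NF, $1, $NF}'
# 1 3 a c
# 2 2 d e
# 3 1 f f

# Combine NR comparisons to select a range of records (here records 2 to 3).
printf 'a b c\nd e\nf\n' | awk 'NR>=2 && NR<=3 {print NR ": " $0}'
# 2: d e
# 3: f
```

On the third line $1 and $NF are the same field, because that record has only one column.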
awk '{print $0,NF}' test_awk.log # show line and field count
awk '{print $NF}' test_awk.log # last column
Other built-in variables
FS – input field separator (default: a single space, which makes awk split on runs of whitespace such as spaces and tabs)
RS – input record separator (default newline)
OFS – output field separator (default space)
ORS – output record separator (default newline)
FILENAME – current file name
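The effect of these variables is easiest to see by overriding them one at a time. The sketch below uses a temporary file created with mktemp and made-up sample data; neither comes from the examples in this guide.

```shell
# Hypothetical sample file, created only for this sketch.
demo=$(mktemp)
printf 'a b c\nd e f\n' > "$demo"

# Defaults: fields split on whitespace, output joined by a space, one record per line.
awk '{print $1, $3}' "$demo"
# a c
# d f

# OFS joins the printed fields, ORS ends each printed record.
awk -v OFS='-' -v ORS=';' '{print $1, $3}' "$demo"
echo    # ORS replaced the newline, so finish the line by hand
# a-c;d-f;

# FS changes how the input line is split into fields.
printf 'a,b,c\n' | awk -v FS=',' '{print $2}'
# b

# FILENAME holds the name of the file currently being read.
awk 'NR==1 {print FILENAME}' "$demo"

rm -f "$demo"
```

Note that OFS only appears between items separated by commas in print. Writing print $1 $3 without the comma concatenates the fields with nothing in between.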
Modifying separators
Changing RS or FS lets awk treat different characters as line or column delimiters. Example: treat space as a record separator to make each word a separate line.
awk -v RS=' ' '{print $0}' english.log
Interview question – word frequency
Count the most frequent words in a text and show the top five.
# Using sed, sort, uniq
sed -r 's#[^a-zA-Z]+#\n#g' english.log | sort | uniq -c | sort -r -n | head -5
# Using tr
cat english.log | tr ' ' '\n' | sort | uniq -c | sort -r -n | head -5
# Using awk
awk -v RS='[^a-zA-Z]+' '{print $0}' english.log | sort | uniq -c | sort -r -n | head -5
Column operations and custom delimiters
FS can be set to a single character or a regular expression. For example, to split /etc/passwd on ‘:’ and print the username and the last field:
awk -v FS=':' '{print $1,$NF}' /etc/passwd
OFS can be changed to format the output, for example to join the printed fields with ‘#’ instead of spaces:
awk -v OFS='#' '{print $1,$2,$3,$4,$5}' test_awk.log
Summary
Awk’s core concepts revolve around records (lines) and fields (columns). The built‑in variables NR, NF, RS, FS, ORS and OFS give fine‑grained control over how input is parsed and how output is formatted, making awk a versatile tool for quick data extraction, transformation and reporting on the command line.
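As a compact recap, the following sketch combines FS, OFS, NR and NF in a single command. The two colon-separated records are made-up sample data in the style of /etc/passwd, piped in so the result does not depend on the local system.

```shell
# Split on ":", join output fields with "#", and print
# record number, first field, field count and last field.
printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n' |
  awk -v FS=':' -v OFS='#' '{print NR, $1, NF, $NF}'
# 1#root#7#/bin/bash
# 2#daemon#7#/usr/sbin/nologin
```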
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
