
Master awk: From Basics to Advanced Text Processing on Linux

This guide explains what awk is, its pattern‑action syntax, built‑in variables such as NR, NF, FS, RS, OFS and ORS, and provides step‑by‑step examples—including generating test data, column manipulation, custom delimiters, and solving a word‑frequency interview question—so readers can efficiently extract and transform text on the command line.

Liangxu Linux

What is awk

awk is a powerful Linux command‑line utility for pattern‑action text processing, originally implemented on Unix. The GNU version (gawk) is the most commonly used implementation.

How to learn awk

Start by experimenting with simple one‑liner commands, then explore patterns, actions, and built‑in variables to handle more complex data extraction tasks.

Syntax format

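An awk program is a sequence of rules, each written as pattern { action }. Either part may be omitted. A pattern (comparable to a regular‑expression address in sed) selects which lines to process, while an action is a list of statements inside braces. With no pattern, the action runs on every line; with no action, awk prints each matching line.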

Pattern and action examples

Pattern selects which lines to process, e.g. NR==1 selects the first line.

Action performs operations, e.g. {print $0} prints the whole line.
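The three rule shapes can be compared side by side on a small inline sample. This is only a sketch: the sample lines and variable names are made up for illustration.

```shell
# Hypothetical three-line sample used only for this illustration.
sample='alpha 1
beta 2
gamma 3'

# Pattern and action: print the first field of line 2 only.
both=$(printf '%s\n' "$sample" | awk 'NR==2 {print $1}')

# Action only: the action runs for every line.
action_only=$(printf '%s\n' "$sample" | awk '{print $1}')

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$sample" | awk '$2 >= 2')

printf '%s\n' "$both" "$action_only" "$pattern_only"
```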

Generating test data

echo cc{01..50} | xargs -n 5 > test_awk.log   # 10 lines of 5 fields: cc01 ... cc50
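Zero‑padded brace expansion such as {01..50} relies on Bash 4 or later (or zsh). Where that is not available, the following sketch produces the same layout, assuming a seq that supports -f (as the GNU and BSD versions do). It keeps the data in a variable instead of a file so that it can be inspected first.

```shell
# Build 50 zero-padded names (cc01 ... cc50), five per line.
# Assumption: seq supports -f with a printf-style format.
data=$(seq -f 'cc%02g' 1 50 | xargs -n 5)

# Show the first two lines of the generated data.
printf '%s\n' "$data" | head -n 2

# Confirm the layout: total line count and the field count of line 1.
shape=$(printf '%s\n' "$data" | awk 'NR == 1 {n = NF} END {print NR, n}')
printf '%s\n' "$shape"
```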

No pattern, only action

When no pattern is given, awk processes every input line. Common actions include printing fields:

awk '{print $0}' test_awk.log          # print whole line
awk '{print $1}' test_awk.log          # first column
awk '{print $2}' test_awk.log          # second column
awk '{print $1,$3}' test_awk.log       # first and third columns

Row variables (NR) and range syntax

NR is the record number (line number). It can be used to select specific lines:

awk 'NR==2{print $0}' test_awk.log          # second line
awk 'NR>=2 && NR<=5{print $0}' test_awk.log   # lines 2‑5
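NR can also be used in arithmetic patterns and in the END block. The sketch below uses a made‑up six‑line sample to show three uses: selecting even lines, printing a range and stopping early, and counting lines.

```shell
# Hypothetical sample: six numbered lines.
lines=$(printf 'line%s\n' 1 2 3 4 5 6)

# Even-numbered records only (pattern without action prints the line).
even=$(printf '%s\n' "$lines" | awk 'NR % 2 == 0')

# Print lines 2-3, then exit so the rest of the input is not scanned.
range=$(printf '%s\n' "$lines" | awk 'NR > 3 {exit} NR >= 2')

# In the END block, NR holds the total number of records read.
total=$(printf '%s\n' "$lines" | awk 'END {print NR}')

printf '%s\n' "$even" "$range" "$total"
```

Exiting early matters mostly on large files, where a plain range pattern would still read every remaining line.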

Column variables (NF) and field references

NF holds the number of fields in the current record. $1, $2 ... $NF refer to individual columns; $0 is the whole line.

awk '{print $0,NF}' test_awk.log   # show line and field count
awk '{print $NF}' test_awk.log    # last column
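Because NF is an ordinary number, it can be used in expressions and as a pattern. A minimal sketch on a made‑up sample with a blank line:

```shell
# Hypothetical sample with a varying number of fields and one blank line.
sample='a b c

d e'

# $(NF-1) is the second-to-last field; NF >= 2 keeps the index valid.
penultimate=$(printf '%s\n' "$sample" | awk 'NF >= 2 {print $(NF-1)}')

# A bare NF pattern is true only when the record has at least one field,
# so blank lines are skipped.
non_blank=$(printf '%s\n' "$sample" | awk 'NF {print NR ": " NF " fields"}')

printf '%s\n' "$penultimate" "$non_blank"
```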

Other built‑in variables

FS – input field separator (default: a single space, which splits on runs of spaces and tabs and ignores leading and trailing whitespace)

RS – input record separator (default newline)

OFS – output field separator (default space)

ORS – output record separator (default newline)

FILENAME – current file name
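The variables listed above can be seen working together in the sketch below. The file contents and the ' -> ' and ';' separators are arbitrary choices for illustration.

```shell
# Write a small colon-separated sample to a temporary file (contents are made up).
tmp=$(mktemp)
printf 'root:x:0\nalice:x:1000\n' > "$tmp"

# FS splits input on ':', OFS joins the printed fields with ' -> ',
# and ORS ends each output record with ';' followed by a newline.
formatted=$(awk -v FS=':' -v OFS=' -> ' -v ORS=';\n' '{print $1, $3}' "$tmp")

# FILENAME is set while a file is being read; NR == 1 prints it once.
name=$(awk 'NR == 1 {print FILENAME}' "$tmp")

printf '%s\n' "$formatted" "$name"
rm -f "$tmp"
```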

Modifying separators

Changing RS or FS lets awk treat different characters as line or column delimiters. Example: treat space as a record separator to make each word a separate line.

awk -v RS=' ' '{print $0}' english.log
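With RS=' ', two side effects are worth knowing: consecutive spaces produce empty records, and the newline at the end of the input stays inside the last record. The sketch below works around both for single‑line input by filtering on NF and printing $1. The sample text is made up.

```shell
# Hypothetical one-line input; note the doubled space before "brown".
text='the quick  brown fox'

# NF skips the empty record created by the doubled space.
# Printing $1 instead of $0 drops the newline kept in the last record,
# because default field splitting treats newlines as whitespace.
clean=$(printf '%s\n' "$text" | awk -v RS=' ' 'NF {print $1}')

printf '%s\n' "$clean"
```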

Interview question – word frequency

Count the most frequent words in a text and show the top five.

# Using sed, sort, uniq (GNU sed: -r and \n in the replacement)
sed -r 's#[^a-zA-Z]+#\n#g' english.log | sort | uniq -c | sort -r -n | head -5

# Using tr
tr ' ' '\n' < english.log | sort | uniq -c | sort -r -n | head -5

# Using awk (a regular-expression RS needs gawk or mawk; POSIX awk only defines a single-character RS)
awk -v RS='[^a-zA-Z]+' '{print $0}' english.log | sort | uniq -c | sort -r -n | head -5
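The counting step can also be done inside awk with an associative array, leaving only the ranking to sort. This is an alternative sketch rather than the article's method. It lowercases words so that "The" and "the" count together, and the sample text is made up.

```shell
# Hypothetical sample text.
text='The cat and the dog. The end, and the cat!'

top=$(printf '%s\n' "$text" |
  awk '{
         # Lowercase the line, then turn every run of non-letters into one space.
         line = tolower($0)
         gsub(/[^a-z]+/, " ", line)
         # Split on whitespace and count each word in an associative array.
         n = split(line, words, " ")
         for (i = 1; i <= n; i++) count[words[i]]++
       }
       END { for (w in count) print count[w], w }' |
  sort -k1,1nr -k2,2 | head -n 5)

printf '%s\n' "$top"
```

Sorting on the word as a second key makes the order of ties deterministic, which the uniq -c pipelines do not guarantee.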

Column operations and custom delimiters

FS can be set to a character or regular expression, e.g. to split /etc/passwd on ‘:’ and print the username and the last field:

awk -v FS=':' '{print $1,$NF}' /etc/passwd

OFS can be changed to format output, for example to join the printed fields with ‘#’ instead of a space:

awk -v OFS='#' '{print $1,$2,$3,$4,$5}' test_awk.log
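OFS only takes effect between comma‑separated print arguments or when awk rebuilds $0 after a field assignment. A minimal sketch on a made‑up line shows the difference:

```shell
# Hypothetical single-line sample.
line='cc01 cc02 cc03'

# OFS is inserted between comma-separated print arguments.
joined=$(printf '%s\n' "$line" | awk -v OFS='#' '{print $1, $2, $3}')

# $0 is printed unchanged while no field has been assigned.
unchanged=$(printf '%s\n' "$line" | awk -v OFS='#' '{print $0}')

# Assigning any field (even to itself) rebuilds $0 using OFS.
rebuilt=$(printf '%s\n' "$line" | awk -v OFS='#' '{$1 = $1; print $0}')

printf '%s\n' "$joined" "$unchanged" "$rebuilt"
```

The $1 = $1 idiom is useful when the number of columns is not known in advance, since it avoids listing every field in the print statement.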

Summary

Awk’s core concepts revolve around records (lines) and fields (columns). The built‑in variables NR, NF, RS, FS, ORS and OFS give fine‑grained control over how input is parsed and how output is formatted, making awk a versatile tool for quick data extraction, transformation and reporting on the command line.
