
Master awk: From Basics to Advanced Text Processing in Linux

This guide introduces awk as a powerful Linux command‑line tool, explains its pattern‑action syntax, built‑in variables such as NR, NF, FS, RS, OFS, and ORS, and provides step‑by‑step examples for generating test data, extracting fields, modifying separators, and solving common interview tasks.

MaGe Linux Operations

What is awk

awk is a powerful Linux text-processing command. It reads input line by line, splits each line into fields, and can turn raw text into neatly formatted, table-like reports. It originated on Unix; the version commonly used today is gawk (GNU awk).

How to learn awk

awk syntax format

An awk program consists of a pattern, an action, or a combination of both.

A pattern works like sed's address matching. It can be an expression, e.g., NR==1, or a regular expression between two slashes, e.g., /root/.

An action is one or more statements inside curly braces, separated by semicolons. The general form is awk 'pattern {action}' file.

Pattern and action

Pattern specifies which lines to operate on.

Action defines what to do with those lines.
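
As a minimal sketch of the two parts working together, the command below pipes three made-up lines into awk. The pattern NR==2 selects the second line and the action prints its first field. The sample data is invented purely for illustration.

```shell
# Hypothetical three-line input; NR==2 is the pattern, {print $1} is the action.
out=$(printf 'a1 a2\nb1 b2\nc1 c2\n' | awk 'NR==2 {print $1}')
echo "$out"   # b1
```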

# Generate test data: 10 lines of 5 fields each, cc01 to cc50 (zero-padded brace expansion needs bash 4+)
echo cc{01..50} | xargs -n 5 > test_awk.log

No pattern, only action

When no pattern is given, awk processes every line by default. The action {print $0} prints the whole line; $0 represents the entire record, while $1, $2, etc., represent individual fields.

# Print all lines
awk '{print $0}' test_awk.log
# Print only the first column
awk '{print $1}' test_awk.log
# Print the second column
awk '{print $2}' test_awk.log
# Print first and third columns
awk '{print $1,$3}' test_awk.log
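
One detail worth trying by hand is the comma in print. The sketch below assumes a single line shaped like the test data and compares printing with and without the comma: with it, fields are joined by the output field separator (a space by default); without it, awk simply concatenates the strings.

```shell
line='cc01 cc02 cc03 cc04 cc05'   # one line in the same shape as the test data
with_comma=$(echo "$line" | awk '{print $1,$3}')   # comma: fields joined by a space
no_comma=$(echo "$line" | awk '{print $1 $3}')     # no comma: plain concatenation
echo "$with_comma"   # cc01 cc03
echo "$no_comma"     # cc01cc03
```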

Row variable NR and range syntax

NR is the built‑in variable that holds the current record (line) number.

# Print line number and line content
awk '{print NR,$0}' test_awk.log
# Print only the second line
awk 'NR==2{print $0}' test_awk.log
# Print second line's first and fourth columns
awk 'NR==2{print $1,$4}' test_awk.log
# Print lines 2 to 5
awk 'NR>=2 && NR<=5{print $0}' test_awk.log
# Print lines 2‑5, first three columns only
awk 'NR>=2 && NR<=5{print $1,$2,$3}' test_awk.log
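
The examples above select a range with two comparisons joined by &&. Standard awk also has a comma range pattern, start,end, which matches from the first line where start is true through the next line where end is true. The sketch below uses six numbered lines as stand-in input (an assumption for illustration) and shows that both forms select the same lines.

```shell
# Six numbered lines stand in for a real file.
and_form=$(printf '%s\n' 1 2 3 4 5 6 | awk 'NR>=2 && NR<=5')
range_form=$(printf '%s\n' 1 2 3 4 5 6 | awk 'NR==2,NR==5')
echo "$range_form"   # prints 2, 3, 4, 5 on separate lines
```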

Column variable NF and field handling

NF (Number of Fields) stores the number of columns in the current record.

# Show each line with its field count
awk '{print $0,NF}' test_awk.log
# Access specific fields
$0  # whole line
$1  # first column
$2  # second column
$NF # last column
$(NF-1) # second‑last column
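
A quick way to check these field references is to run them against one known line. This sketch assumes a five-field line shaped like the test data.

```shell
line='cc01 cc02 cc03 cc04 cc05'
# NF is the field count; $NF is the last field; $(NF-1) is the one before it.
out=$(echo "$line" | awk '{print NF, $NF, $(NF-1)}')
echo "$out"   # 5 cc05 cc04
```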

Specifying line (pattern) and printing action

awk 'pattern {action}' file

Example: extract the second line using NR==2{print $0}.

# Short form: when the action is omitted, awk prints $0 for every matching line
awk 'NR==2' test_awk.log
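
The same shortcut works when the pattern is a regular expression between slashes. The sketch below runs on three invented lines and keeps only those containing cc1 followed by a digit; no action is written, so matching lines are printed whole.

```shell
# Hypothetical input; /cc1[0-9]/ is a regex pattern with the default print action.
out=$(printf 'cc06 cc07\ncc11 cc12\ncc16 cc17\n' | awk '/cc1[0-9]/')
echo "$out"
# cc11 cc12
# cc16 cc17
```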

Multiple patterns and actions (explain NR, NF)

Pattern NR==4 selects the fourth record.

Action must be inside braces.

# Print line, field count, and line number
awk '{print $0,NF,NR}' test_awk.log

$0 – entire line.

NF – number of fields.

NR – record number.
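
One awk program can hold several pattern-action pairs; each input line is tested against every pattern in order. The following is a small sketch on invented input, with one pair keyed on NR and another on NF.

```shell
# Hypothetical input: lines with 3, 2, and 4 fields.
out=$(printf 'a b c\nd e\nf g h i\n' |
  awk 'NR==1 {print "first:", $0} NF==4 {print "four fields on line", NR}')
echo "$out"
# first: a b c
# four fields on line 3
```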

Quick awk recap

Enclose the whole program (pattern and action together) in single quotes so that the shell does not interpret characters such as $ and the braces.

If no pattern is given, awk processes the whole input file line by line.

Actions must be enclosed in curly braces; anything outside the braces is treated as a pattern.

Input is usually a file, but data can also be piped.
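
To illustrate the last point, the sketch below reads the same data once from a file and once from a pipe. The temporary file and its contents are assumptions made only for this demonstration.

```shell
tmp=$(mktemp)                    # temporary file standing in for a log file
printf 'x y\nz w\n' > "$tmp"
from_file=$(awk '{print $2}' "$tmp")          # awk reads the named file
from_pipe=$(cat "$tmp" | awk '{print $2}')    # awk reads standard input
rm -f "$tmp"
echo "$from_file"   # y and w on separate lines, same as $from_pipe
```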

Interview question – word frequency

Goal: count word occurrences in a text and list the top five.

# Using sed (GNU sed: \n in the replacement inserts a newline)
sed -r 's#[^a-zA-Z]+#\n#g' english.log | sort | uniq -c | sort -r -n | head -5
# Using tr
tr ' ' '\n' < english.log | sort | uniq -c | sort -r -n | head -5
# Using grep (-o prints each match on its own line)
grep -oE '[a-zA-Z]+' english.log | sort | uniq -c | sort -r -n | head -5
# Using awk (simple: one record per space-separated word)
awk -v RS=' ' '{print $0}' english.log | sort | uniq -c | sort -r -n | head -5
# Using awk (regex record separator, a gawk extension)
awk -v RS='[^a-zA-Z]+' '{print $0}' english.log | sort | uniq -c | sort -r -n | head -5
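
As an alternative sketch, the counting itself can be done inside awk, leaving only the ranking to sort. This relies on two awk features the article does not otherwise cover, associative arrays and the END block, and the sample sentence is invented for illustration.

```shell
text='the cat and the dog and the bird'   # hypothetical sample text
# count[word]++ tallies every field; END prints "count word" after all input is read.
out=$(echo "$text" |
  awk '{for (i = 1; i <= NF; i++) count[$i]++} END {for (w in count) print count[w], w}' |
  sort -rn | head -2)
echo "$out"
# 3 the
# 2 and
```

Because the order of for (w in count) is unspecified, the final sort is still needed to rank the words.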

awk built‑in variables overview

NR – record (line) number.

NF – number of fields (columns) in the current line.

FS – input field separator (default: a single space, which makes awk split on runs of spaces and tabs).

RS – input record separator (default newline).

OFS – output field separator (default space).

ORS – output record separator (default newline).

FILENAME – name of the current file.
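
Several of these variables can be printed side by side to see how they change per line. The sketch below assumes a throwaway two-line file named demo.log, created in a temporary directory just for the demonstration.

```shell
dir=$(mktemp -d)                          # scratch directory for the demo file
printf 'a b c\nd e\n' > "$dir/demo.log"
# For each line: file name, line number, field count.
out=$(cd "$dir" && awk '{print FILENAME, NR, NF}' demo.log)
rm -rf "$dir"
echo "$out"
# demo.log 1 3
# demo.log 2 2
```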

Modifying RS and ORS

Changing RS alters how awk splits input records; changing ORS changes the line terminator in the output.

# Use space as record separator
awk -v RS=' ' '{print $0}' test_awk.log
# Change output separator to '@@'
awk -v ORS='@@' '{print $0}' test_awk.log
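
The effect is easier to see on tiny inputs. In this sketch (inline sample data, assumed for illustration), RS=' ' turns each space-separated word into its own record, and ORS='@@' replaces the newline that print normally appends.

```shell
# Each word becomes a record, so print emits one word per line.
rs_out=$(printf 'a b c' | awk -v RS=' ' '{print $0}')
# Records are still lines, but each is terminated by @@ instead of a newline.
ors_out=$(printf 'a\nb\n' | awk -v ORS='@@' '{print $0}')
echo "$rs_out"    # a, b, c on separate lines
echo "$ors_out"   # a@@b@@
```

With a real file the last record under RS=' ' also contains the file's trailing newline, which is why the word-frequency pipelines above may count the final word separately.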

Modifying FS and OFS

FS defines how input fields are split; OFS defines how fields are joined in the output.

# Split /etc/passwd by ':' and print username and shell
awk -v FS=':' '{print $1,$NF}' /etc/passwd
# Change output separator to '#'
awk -v OFS='#' '{print $1,$2,$3,$4,$5}' test_awk.log
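
A point that often surprises newcomers: OFS is only applied when print receives comma-separated fields or when the record is rebuilt. The sketch below uses one passwd-style line (sample data, assumed for illustration) to show three cases; assigning $1=$1 is a common idiom to force awk to rebuild $0 with the new separator.

```shell
line='root:x:0:0:root:/root:/bin/bash'
picked=$(echo "$line" | awk -v FS=':' -v OFS='#' '{print $1,$NF}')       # OFS joins the two fields
untouched=$(echo "$line" | awk -v FS=':' -v OFS='#' '{print $0}')        # $0 is printed as read
rebuilt=$(echo "$line" | awk -v FS=':' -v OFS='#' '{$1=$1; print $0}')   # $0 rebuilt with OFS
echo "$picked"      # root#/bin/bash
echo "$untouched"   # root:x:0:0:root:/root:/bin/bash
echo "$rebuilt"     # root#x#0#0#root#/root#/bin/bash
```

The option -F: is a shorthand for -v FS=':'.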

Summary of rows and columns

RS/ORS control line (record) separators.

FS/OFS control column (field) separators.

NR gives the current line number.

NF gives the number of columns in the current line.

$1, $2 … access specific columns; $NF accesses the last column.

By adjusting these variables, awk can handle a wide range of text‑processing tasks efficiently.

Link: https://www.cnblogs.com/btcm409181423/p/18024202

(© Original author, please delete if infringing)
