Master awk: From Basics to Advanced Text Processing
This comprehensive guide explains what awk is, how to learn it, its pattern‑action syntax, built‑in variables like NR, NF, FS, RS, and OFS, and provides numerous practical examples for extracting, formatting, and manipulating text data on Linux systems.
What is awk
awk is a text-processing tool on Linux that excels at extracting and formatting column-oriented data, much like turning raw data into a neatly laid-out Excel-style table. The original awk was implemented on Unix; the version used here is gawk (GNU awk).
How to learn awk
Awk syntax
An awk program consists of a pattern, an action, or a combination of both.
The pattern works like pattern matching in sed. It can be an expression or a regular expression between slashes; for example, NR==1 matches the first line.
The action is one or more statements inside curly braces, separated by semicolons.
Pattern and action
Pattern specifies which lines to operate on.
Action defines what to do with the selected lines.
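The three forms described above (pattern only, action only, pattern plus action) can be sketched as follows. The sample data and the variable names are made up for illustration; only standard awk behavior is assumed.

```shell
# Made-up sample input: three lines, two columns each.
data=$(printf 'alpha 1\nbeta 2\ngamma 3\n')

# Pattern only: with no action, awk prints the whole matching line.
pattern_only=$(printf '%s\n' "$data" | awk 'NR==2')

# Action only: with no pattern, the action runs on every line.
action_only=$(printf '%s\n' "$data" | awk '{print $1}')

# Regular-expression pattern plus action: act only on lines containing "am".
regex_match=$(printf '%s\n' "$data" | awk '/am/{print $2}')

echo "$pattern_only"
echo "$action_only"
echo "$regex_match"
```

Only the line "gamma 3" contains "am", so the last command prints its second column.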
# generate test data (bash brace expansion): 10 lines of 5 columns each
echo cc{01..50} | xargs -n 5 > test_awk.log
No pattern, only action
awk '{print $0}' test_awk.log
Here $0 prints the entire line, while $1, $2, etc., refer to individual columns.
Row variables NR and range syntax
NR is the built‑in variable representing the current record number (line number).
# print line number and content
awk '{print NR,$0}' test_awk.log
Examples:
# print only the second line
awk 'NR==2{print $0}' test_awk.log
# print columns 1 and 3 of the second line
awk 'NR==2{print $1,$3}' test_awk.log
Column variables NF and field count
NF holds the number of fields in the current line.
# show each line with its field count
awk '{print $0,NF}' test_awk.log
$NF prints the last field, and $(NF-1) prints the second-last field.
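As a small sketch of how NF, $NF, and $(NF-1) relate, the following uses one made-up line with five fields, shaped like a line of the test file. The variable names are hypothetical.

```shell
# One made-up line with five space-separated fields.
line='cc01 cc02 cc03 cc04 cc05'

# NF is the field count; $NF is the field whose index equals that count.
field_count=$(echo "$line" | awk '{print NF}')
last_field=$(echo "$line" | awk '{print $NF}')

# Parentheses are needed: $(NF-1) is field number NF-1, while $NF-1 would
# mean "the value of the last field, minus 1".
second_last=$(echo "$line" | awk '{print $(NF-1)}')

echo "$field_count $last_field $second_last"
```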
Specifying lines (pattern) and printing actions
# print lines 2 to 5
awk 'NR>=2 && NR<=5{print $0}' test_awk.log
# print lines 2 to 5, only first three columns
awk 'NR>=2 && NR<=5{print $1,$2,$3}' test_awk.log
Multiple patterns and actions (explain NR, NF)
# print line number, field count, and content for first four lines
awk 'NR<=4{print NR,NF,$0}' test_awk.log
$0 – entire line. NF – number of fields (columns). NR – record (line) number.
Awk quick‑start summary
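The whole awk program (pattern and action together) should be wrapped in single quotes so that the shell does not expand $0, $1, and similar before awk sees them.
If no pattern is given, the action runs on every line. If no action is given, awk prints each matching line.
Actions must be inside curly braces.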
awk 'pattern {action}'
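One consequence of single-quoting the program is that shell variables are not expanded inside it. A common way to pass a shell value in is the -v option, sketched below; `target` and `n` are hypothetical names chosen for this example.

```shell
# Hypothetical shell variable holding the line number we want.
target=2

# -v copies the shell value into the awk variable n. The program itself stays
# inside single quotes, so $1 is awk's first field, not a shell parameter.
picked=$(printf 'cc01 cc02\ncc06 cc07\ncc11 cc12\n' | awk -v n="$target" 'NR==n{print $1}')

echo "$picked"
```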
Built-in variables
# Common built‑in variables
NR # record number (line number)
NF # number of fields in the current line
FS # input field separator (default: whitespace, i.e. runs of spaces and tabs)
RS # input record separator (default newline)
OFS # output field separator (default space)
ORS # output record separator (default newline)
FILENAME # current file name
OFMT # output format for numbers (default "%.6g")
Modifying RS/ORS
Changing RS alters how awk splits input records; changing ORS changes the line terminator in output.
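A minimal sketch of both sides, using made-up input: RS set to ';' so that semicolon-delimited chunks become records, and ORS set to ',' so that output records are joined on one line.

```shell
# Made-up input whose records are separated by ';' instead of newlines.
# With RS=';' each ';'-delimited chunk becomes one record, numbered by NR.
records=$(printf 'a 1;b 2;c 3' | awk -v RS=';' '{print NR": "$0}')

# ORS replaces the newline that print normally appends after each record.
joined=$(printf 'a\nb\nc\n' | awk -v ORS=',' '{print $0}')

echo "$records"
echo "$joined"
```

Note that ORS is appended after every record, including the last, so the joined output ends with a trailing comma.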
# use '@@' as output line separator
awk -v ORS='@@' '{print $0}' test_awk.log
Changing field separators (FS/OFS)
FS defines how input fields are split; OFS defines the separator used when printing fields.
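One detail worth seeing in practice is when OFS actually takes effect. The sketch below assumes a sample line in /etc/passwd format (the content is illustrative, not read from a real system) and compares three cases.

```shell
# Assumed sample line in /etc/passwd format.
line='root:x:0:0:root:/root:/bin/bash'

# Comma between fields: print joins them with OFS.
with_comma=$(echo "$line" | awk -v FS=':' -v OFS='---' '{print $1,$NF}')

# Printing $0 untouched: the original ':' separators are kept.
untouched=$(echo "$line" | awk -v FS=':' -v OFS='---' '{print $0}')

# Assigning to any field makes awk rebuild $0 using OFS.
rebuilt=$(echo "$line" | awk -v FS=':' -v OFS='-' '{$1=$1; print $0}')

echo "$with_comma"
echo "$untouched"
echo "$rebuilt"
```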
# print the username, home directory, and shell of the first /etc/passwd entry, using ':' as FS and '---' as OFS
awk -v FS=':' -v OFS='---' 'NR==1{print $1,$(NF-1),$NF}' /etc/passwd
Practical interview question – word frequency
Goal: count word occurrences in a text and list the top five.
# use awk to put each space-separated word on its own line, then count and rank
awk -v RS=' ' '{print $0}' english.log | sort | uniq -c | sort -r -n | head -5
The same task can also be solved with sed, tr, or grep.
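One possible tr-based variant is sketched below. It is not necessarily the form used in the original article, and the sample text is made up to stand in for english.log. The trailing awk step only normalizes the padding that uniq -c puts in front of the counts.

```shell
# Made-up sample text standing in for english.log.
text='the cat and the  dog and the bird'

# tr -s ' ' '\n' turns each run of spaces into a single newline, so every
# word lands on its own line before sorting and counting.
top=$(echo "$text" | tr -s ' ' '\n' | sort | uniq -c | sort -r -n | head -2 | awk '{print $1,$2}')

echo "$top"
```

Because tr -s squeezes repeated separators, consecutive spaces do not produce empty "words" in the count.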
Summary of rows and columns
RS and ORS control line (record) separators for input and output.
FS and OFS control column (field) separators for input and output.
NR – line number; NF – number of columns in the current line. $1, $2, …, $NF extract specific columns.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
