Master awk: From Basics to Advanced Text Processing in Linux
This guide introduces awk as a powerful Linux command‑line tool, explains its pattern‑action syntax, built‑in variables such as NR, NF, FS, RS, OFS, and ORS, and provides step‑by‑step examples for generating test data, extracting fields, modifying separators, and solving common interview tasks.
What is awk
awk is a powerful Linux command-line tool for text processing. It reads input line by line, splits each line into fields, and can turn raw text into neatly formatted, table-like output of rows and columns, much like a spreadsheet. It originated on Unix; a widely used implementation on Linux today is gawk (GNU awk).
How to learn awk
awk syntax format
An awk program consists of a pattern, an action, or both, written as awk 'pattern {action}' file.
The pattern works much like an address in sed: it selects which lines to process and can be an expression such as NR==1 or a regular expression between two slashes, such as /root/.
The action is one or more statements inside curly braces, separated by semicolons, for example awk 'NR==1 {print $1; print NF}' file.
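A minimal sketch of the two kinds of pattern and of a multi-statement action, run against a throwaway two-line file; the file name sample.txt is made up for this illustration:

```shell
# Throwaway two-line sample in the same shape as the test data used later
printf 'cc01 cc02 cc03 cc04 cc05\ncc06 cc07 cc08 cc09 cc10\n' > sample.txt

# Expression pattern: only record number 1
awk 'NR==1 {print $1}' sample.txt

# Regular-expression pattern: only lines containing "cc07"
awk '/cc07/ {print $1}' sample.txt

# One action holding two statements separated by a semicolon
awk '/cc07/ {print "line", NR; print "last field:", $NF}' sample.txt
```

The three commands print cc01, then cc06, then "line 2" followed by "last field: cc10".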
Pattern and action
Pattern specifies which lines to operate on.
Action defines what to do with those lines.
# Generate test data (bash brace expansion): cc01 to cc50, five per line, giving 10 lines of 5 columns
echo cc{01..50} | xargs -n 5 > test_awk.log
No pattern, only action
When no pattern is given, awk processes every line by default. The action {print $0} prints the whole line; $0 represents the entire record, while $1, $2, etc., represent individual fields.
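One detail the examples rely on: inside print, a comma inserts the output field separator (a single space by default), while items written next to each other are simply concatenated. A small sketch on a single made-up line:

```shell
line='cc01 cc02 cc03 cc04 cc05'

# Comma: fields joined by OFS (a space by default) -> cc01 cc03
echo "$line" | awk '{print $1, $3}'

# No comma: string concatenation, no separator -> cc01cc03
echo "$line" | awk '{print $1 $3}'

# Literal text can be concatenated in between -> cc01-cc03
echo "$line" | awk '{print $1 "-" $3}'
```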
# Print all lines
awk '{print $0}' test_awk.log
# Print only the first column
awk '{print $1}' test_awk.log
# Print the second column
awk '{print $2}' test_awk.log
# Print first and third columns
awk '{print $1,$3}' test_awk.log
Row variable NR and range syntax
NR is the built‑in variable that holds the current record (line) number.
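Besides comparisons such as NR>=2 && NR<=5, awk accepts a range pattern: two patterns separated by a comma select every line from the first match through the next match of the second. A sketch on a made-up ten-line file (nr_demo.txt is a hypothetical name):

```shell
# Ten lines: line1 to line10
printf 'line%s\n' 1 2 3 4 5 6 7 8 9 10 > nr_demo.txt

# Comparison pattern: lines 2 through 5
awk 'NR>=2 && NR<=5' nr_demo.txt

# Range pattern (start, end): also lines 2 through 5
awk 'NR==2, NR==5' nr_demo.txt

# In the END block, NR still holds the number of the last record, i.e. the line count
awk 'END {print NR}' nr_demo.txt
```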
# Print line number and line content
awk '{print NR,$0}' test_awk.log
# Print only the second line
awk 'NR==2{print $0}' test_awk.log
# Print second line's first and fourth columns
awk 'NR==2{print $1,$4}' test_awk.log
# Print lines 2 to 5
awk 'NR>=2 && NR<=5{print $0}' test_awk.log
# Print lines 2‑5, first three columns only
awk 'NR>=2 && NR<=5{print $1,$2,$3}' test_awk.log
Column variable NF and field handling
NF (Number of Fields) stores the number of columns in the current record.
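Because NF is recomputed for every record, it can differ from line to line, and $NF always refers to the last field of the current line. A sketch with made-up input; note the parentheses in $(NF-1), since $NF-1 means the value of the last field minus 1:

```shell
# Three lines with 3, 2 and 1 fields
printf 'a b c\nd e\nf\n' > nf_demo.txt

# NF per line -> 3, 2, 1
awk '{print NF}' nf_demo.txt

# Last and second-last field, skipping lines with fewer than two fields
# (for a one-field line, $(NF-1) would be $0) -> "c b", "e d"
awk 'NF >= 2 {print $NF, $(NF-1)}' nf_demo.txt

# Precedence pitfall: $NF-1 is (last field) - 1, not the second-last field -> 29 20
echo '10 20 30' | awk '{print $NF-1, $(NF-1)}'
```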
# Show each line with its field count
awk '{print $0,NF}' test_awk.log
# Field references inside an awk action:
#   $0       whole line
#   $1       first column
#   $2       second column
#   $NF      last column
#   $(NF-1)  second-last column
awk '{print $NF, $(NF-1)}' test_awk.log
Specifying line (pattern) and printing action
awk 'pattern {action}' file
Example: extract the second line using NR==2{print $0}.
# Short form: when the action is omitted, awk's default action is print $0
awk 'NR==2' test_awk.log
Multiple patterns and actions (explain NR, NF)
Pattern NR==4 selects the fourth record.
The action must be enclosed in curly braces.
# Print line, field count, and line number
awk '{print $0,NF,NR}' test_awk.log
$0 – entire line.
NF – number of fields.
NR – record number.
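A sketch of several pattern-action pairs in one program, using made-up data: awk tests every pair against every line, in order, so a line that matches two patterns triggers both actions.

```shell
printf 'cc01 cc02 cc03 cc04 cc05\ncc06 cc07 cc08 cc09 cc10\ncc11 cc12 cc13 cc14 cc15\ncc16 cc17 cc18 cc19 cc20\n' > multi.txt

# Two independent pairs: line 1 gets a label, line 4 gets its NF and NR appended
awk 'NR==1 {print "first:", $1} NR==4 {print $0, NF, NR}' multi.txt

# Line 2 matches both patterns, so both actions run for it
awk 'NR==2 {print "A"} NR>=2 {print "B", NR}' multi.txt
```

The first command prints "first: cc01" and "cc16 cc17 cc18 cc19 cc20 5 4"; the second prints A, B 2, B 3, B 4 on separate lines.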
Quick awk recap
Enclose the whole awk program (pattern and action) in single quotes so the shell does not expand $0, $1, and similar tokens before awk sees them.
If no pattern is given, the action is applied to every line of the input.
Actions must be enclosed in curly braces; without them awk parses the text as a pattern, so awk 'print $1' file fails with a syntax error.
Input is usually a file, but data can also be piped.
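A sketch of piped input, plus the usual way to hand a shell value to awk: because the program sits in single quotes, the shell does not expand variables inside it, so the value is passed with -v instead. The data and the variable name are made up:

```shell
# Input piped from another command instead of read from a file -> alice, bob
printf 'alice 30\nbob 25\n' | awk '{print $1}'

# Pattern on a numeric field: names whose second field exceeds 26 -> alice
printf 'alice 30\nbob 25\n' | awk '$2 > 26 {print $1}'

# Pass a shell variable in with -v; inside awk it is the awk variable n -> 25
name=bob
printf 'alice 30\nbob 25\n' | awk -v n="$name" '$1 == n {print $2}'
```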
Interview question – word frequency
Goal: count word occurrences in a text and list the top five.
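The pipelines below read a file named english.log, which the article does not include. A minimal sketch with a made-up two-line sample, and a variant that is not from the original: instead of sort | uniq -c, awk itself tallies the words in an associative array. Like the regex-RS awk solution below, it assumes gawk or mawk.

```shell
# Made-up sample text standing in for english.log
printf 'the cat and the dog.\nThe dog, the end\n' > english.log

# Each run of letters is one record (regex RS: gawk/mawk);
# NF skips empty records, count[word] accumulates occurrences.
awk -v RS='[^a-zA-Z]+' 'NF {count[$0]++} END {for (w in count) print count[w], w}' english.log |
    sort -rn | head -5
```

The tally is case-sensitive (The and the are counted separately); writing count[tolower($0)]++ instead folds case. Since for (w in count) visits words in no particular order, the result is still piped through sort.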
# Using sed (GNU sed: -r enables extended regex, \n in the replacement inserts a newline)
sed -r 's#[^a-zA-Z]+#\n#g' english.log | sort | uniq -c | sort -r -n | head -5
# Using tr (splits on spaces only, so punctuation stays attached to words)
tr ' ' '\n' < english.log | sort | uniq -c | sort -r -n | head -5
# Using grep
grep -E '[a-zA-Z]+' english.log -o | sort | uniq -c | sort -r -n | head -5
# Using awk (simple)
awk -v RS=' ' '{print $0}' english.log | sort | uniq -c | sort -r -n | head -5
# Using awk (complex regex; a multi-character RS is treated as a regex by gawk and mawk, unspecified in POSIX awk)
awk -v RS='[^a-zA-Z]+' '{print $0}' english.log | sort | uniq -c | sort -r -n | head -5
awk built-in variables overview
NR – record (line) number.
NF – number of fields (columns) in the current line.
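FS – input field separator (default: a single space, meaning fields are separated by runs of spaces or tabs, and leading/trailing blanks are ignored).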
RS – input record separator (default newline).
OFS – output field separator (default space).
ORS – output record separator (default newline).
FILENAME – name of the current file.
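A quick sketch that prints several of these variables at once on a made-up file (vars_demo.txt is a hypothetical name), plus a check of the default FS: a run of spaces and tabs counts as one separator and leading blanks are skipped.

```shell
printf 'cc01 cc02 cc03\ncc04 cc05\n' > vars_demo.txt

# FILENAME, record number and field count for each line
# -> "vars_demo.txt 1 3" and "vars_demo.txt 2 2"
awk '{print FILENAME, NR, NF}' vars_demo.txt

# Default FS: leading blanks skipped, spaces/tabs collapsed -> 3 a
printf '  a\t b  c\n' | awk '{print NF, $1}'
```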
Modifying RS and ORS
Changing RS alters how awk splits input records; changing ORS changes the line terminator in the output.
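A small sketch of both variables on made-up input, with RS set to a single character, which behaves the same across awk implementations:

```shell
# RS=';' : each ';'-separated piece is one record -> "1: a", "2: b", "3: c"
# (the input has no trailing newline; otherwise it would end up in the last record)
printf 'a;b;c' | awk -v RS=';' '{print NR ": " $0}'

# ORS=',' : records are joined with ',' and the output has no final newline -> x,y,z,
printf 'x\ny\nz\n' | awk -v ORS=',' '{print $0}'
```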
# Use space as record separator
awk -v RS=' ' '{print $0}' test_awk.log
# Change the output record separator to '@@' (the output then has no trailing newline)
awk -v ORS='@@' '{print $0}' test_awk.log
Modifying FS and OFS
FS defines how input fields are split; OFS defines how fields are joined in the output.
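A sketch on two made-up passwd-style lines (passwd_sample.txt is a hypothetical file, so the real /etc/passwd is not needed). It shows -F as a shorthand for -v FS=..., and one behavior worth knowing: OFS is used between the comma-separated items of print, but print on an unmodified record keeps the original separators until a field assignment such as $1=$1 makes awk rebuild the line.

```shell
printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000::/home/alice:/bin/sh\n' > passwd_sample.txt

# -F: is shorthand for -v FS=':' -> "root /bin/bash", "alice /bin/sh"
awk -F: '{print $1, $NF}' passwd_sample.txt

# OFS goes between the comma-separated items -> "root,0,/bin/bash", ...
awk -F: -v OFS=',' '{print $1, $3, $NF}' passwd_sample.txt

# Unmodified record: printed as-is, colons kept
awk -F: -v OFS=',' '{print}' passwd_sample.txt

# $1=$1 forces awk to rebuild $0 with OFS (empty fields are preserved)
awk -F: -v OFS=',' '{$1=$1; print}' passwd_sample.txt
```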
# Split /etc/passwd by ':' and print username and shell
awk -v FS=':' '{print $1,$NF}' /etc/passwd
# Change the output field separator to '#'
awk -v OFS='#' '{print $1,$2,$3,$4,$5}' test_awk.log
Summary of rows and columns
RS/ORS control line (record) separators.
FS/OFS control column (field) separators.
NR gives the current line number.
NF gives the number of columns in the current line.
$1, $2 … access specific columns; $NF accesses the last column.
By adjusting these variables, awk can handle a wide range of text‑processing tasks efficiently.
Link: https://www.cnblogs.com/btcm409181423/p/18024202
(© Original author, please delete if infringing)
MaGe Linux Operations