Master AWK: Powerful Text Processing Techniques and Real-World Examples
This article introduces AWK—a versatile text‑analysis language—explaining its origins, core concepts, command‑line usage, built‑in variables, printing functions, programming constructs, conditionals, loops, and associative arrays, and provides practical Linux examples for extracting and summarizing data.
Introduction
AWK is a powerful text‑analysis tool that reads input line by line, splits each line into fields (by default on runs of spaces and tabs), and enables complex data processing and reporting.
Three implementations are commonly encountered: awk, nawk, and gawk, the last being the GNU implementation. The name comes from the surname initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.
Basic Usage
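The general syntax is awk 'pattern { action }' filenames, where pattern selects records (typically a regular expression between slashes, but any expression works) and action is a series of statements executed on each matching record. If the pattern is omitted, the action applies to every record; if the action is omitted, matching records are printed.
AWK processes records separated by the record separator (RS, a newline by default) and fields separated by the field separator (FS), which defaults to whitespace.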
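A minimal sketch of the record and field model described above. The two input lines are made up for illustration and are fed in with printf so the result does not depend on any file.

```shell
# Two records (lines); fields are separated by runs of spaces or tabs.
result=$(printf 'alpha  beta\tgamma\none two\n' |
  awk '{ print NR ": " NF " fields, first=" $1 ", last=" $NF }')
printf '%s\n' "$result"
# 1: 3 fields, first=alpha, last=gamma
# 2: 2 fields, first=one, last=two
```

Note that the double space and the tab on the first line each count as a single separator under the default FS.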
Invoking AWK
There are three common ways to run AWK:
Command line: awk -F ':' '{print $1}' /etc/passwd
Executable script: place the AWK program in a file whose first line is #!/bin/awk -f (the interpreter path varies by system), or write a #!/bin/sh script that calls awk.
Separate script file: awk -f script.awk inputfile
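The script-file form can be sketched as follows. The temporary file created by mktemp and the two sample records are assumptions standing in for script.awk and /etc/passwd.

```shell
# Write a small AWK program to a temporary script file.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first colon-separated field of every record.
BEGIN { FS = ":" }
{ print $1 }
EOF

# Run it with -f against inline sample data standing in for /etc/passwd.
result=$(printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n' |
  awk -f "$script")
printf '%s\n' "$result"
rm -f "$script"
```

Setting FS in a BEGIN block inside the script has the same effect as passing -F ':' on the command line, which keeps the invocation short.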
Practical Examples
Print the user names from the last five login entries:
# last -n 5 | awk '{print $1}'
List usernames from /etc/passwd:
# cat /etc/passwd | awk -F ':' '{print $1}'
Show usernames with their shells, separated by a tab:
# cat /etc/passwd | awk -F ':' '{print $1 "\t" $7}'
Generate a CSV with a header and a custom footer line:
# cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
Built‑in Variables
ARGC – number of command‑line arguments
ARGV – array of command‑line arguments
ENVIRON – environment variables
FILENAME – current file name
FNR – record number in the current file
FS – input field separator (can be set with -F)
NF – number of fields in the current record
NR – total record number across all input files
OFS – output field separator
ORS – output record separator
RS – input record separator
Special fields: $0 is the entire record, $1, $2, … are individual fields.
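To see several of these variables side by side, here is a small sketch that reads two throwaway files. The file names one.txt and two.txt and their contents are made up for the example.

```shell
dir=$(mktemp -d)
printf 'a b\nc d e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
# Setting OFS makes the comma-separated print arguments join with "|".
result=$(cd "$dir" && awk 'BEGIN { OFS = "|" } { print FILENAME, NR, FNR, NF }' one.txt two.txt)
printf '%s\n' "$result"
# one.txt|1|1|2
# one.txt|2|2|3
# two.txt|3|1|1
rm -rf "$dir"
```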
Printing Functions
AWK provides both print and printf. print joins its comma‑separated arguments with OFS and ends the line with ORS, while printf offers C‑style formatting for more complex output and adds no newline unless the format string contains one.
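The difference is easiest to see on a single record. The following sketch uses one made-up passwd-style line and an arbitrary OFS of " -> ".

```shell
result=$(printf 'root:x:0:0:root:/root:/bin/bash\n' |
  awk -F ':' 'BEGIN { OFS = " -> " }
    { print $1, $7 }                      # comma: arguments joined with OFS
    { print $1 $7 }                       # no comma: plain string concatenation
    { printf "%-8s uid=%03d\n", $1, $3 }  # printf: explicit format and newline
  ')
printf '%s\n' "$result"
# root -> /bin/bash
# root/bin/bash
# root     uid=000
```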
Programming Constructs
Variables can be built‑in or user‑defined. Example of counting lines in /etc/passwd:
awk '{count++;} END {print "user count is", count}' /etc/passwd
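Uninitialized AWK variables already behave as 0 in arithmetic, so both versions count the same. Initializing the counter in a BEGIN block makes the intent explicit and ensures that 0, rather than an empty string, is printed when the input has no records: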
awk 'BEGIN {count=0} {count++;} END {print "[end] user count is", count}' /etc/passwd
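The effect of the BEGIN initialization only shows up on empty input. As a quick check, this sketch runs both variants against /dev/null, which is used here as a stand-in for an empty file.

```shell
# No records are read, so count is never incremented.
without=$(awk '{ count++ } END { print "user count is", count }' /dev/null)
with=$(awk 'BEGIN { count = 0 } { count++ } END { print "user count is", count }' /dev/null)

# Brackets make the trailing space of the first result visible.
printf '[%s]\n' "$without" "$with"
# [user count is ]
# [user count is 0]
```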
Conditionals
AWK’s conditional syntax mirrors C:
if (expression) { statement; } else { statement; }
Loops
Supported loop constructs include while, do/while, for, as well as break and continue.
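A short sketch exercising each construct inside a BEGIN block, so no input file is needed. The numbers are arbitrary and chosen only to keep the output small.

```shell
result=$(awk 'BEGIN {
  # for loop: "continue" skips even numbers, "break" stops once i exceeds 7.
  for (i = 1; i <= 10; i++) {
    if (i % 2 == 0) continue
    if (i > 7) break
    sum += i
  }
  # while loop: compute 4 factorial.
  f = 1; n = 4
  while (n > 1) { f *= n; n-- }
  # do/while: the body runs once even though the condition is false.
  do { runs++ } while (0)
  print "sum of odd numbers up to 7: " sum
  print "4! = " f
  print "do/while body ran " runs " time(s)"
}')
printf '%s\n' "$result"
# sum of odd numbers up to 7: 16
# 4! = 24
# do/while body ran 1 time(s)
```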
Arrays
AWK arrays are associative; indices can be strings or numbers. Example to collect and print all usernames from /etc/passwd:
awk -F ':' 'BEGIN {count=0} {name[count]=$1; count++;} END {for (i=0; i<count; i++) print name[i]}' /etc/passwd
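The example above uses numeric indices. String indices work the same way, as in this sketch that counts users per login shell; the three passwd-style records are made up for illustration.

```shell
result=$(printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000::/home/alice:/bin/bash\ndaemon:x:1:1::/usr/sbin:/usr/sbin/nologin\n' |
  awk -F ':' '
    { shells[$7]++ }        # index by string: the login shell
    END {
      for (s in shells)     # iteration order is unspecified
        print shells[s], s
    }' | sort)
printf '%s\n' "$result"
# 1 /usr/sbin/nologin
# 2 /bin/bash
```

Because for (s in shells) visits keys in no guaranteed order, the output is piped through sort to make it stable.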
Additional Tips
Use printf for formatted output, e.g., to display the total size of the listed files in megabytes:
ls -l | awk 'BEGIN {size=0} {size+= $5} END {printf "[end] size is %.2fM\n", size/1024/1024}'
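Filter out directory entries with a simple if statement inside the AWK program. Directories often report a size of 4096 (the exact value depends on the filesystem), so testing whether the first field starts with d is more reliable than comparing the size.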
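One way to sketch that filter is shown below. The listing is made-up sample ls -l output, used so the result is reproducible; in practice the input would come from ls -l directly.

```shell
# Sample ls -l output (made-up entries).
listing='total 12
drwxr-xr-x 2 user user 4096 Jan  1 12:00 docs
-rw-r--r-- 1 user user 1048576 Jan  1 12:00 data.bin
-rw-r--r-- 1 user user 524288 Jan  1 12:00 notes.txt'

result=$(printf '%s\n' "$listing" |
  awk '{
    if ($1 ~ /^-/) {   # regular files only; skips the "total" line and directories
      size += $5
    }
  }
  END { printf "[end] size is %.2fM\n", size/1024/1024 }')
printf '%s\n' "$result"
# [end] size is 1.50M
```

Matching on the permission string in $1 is a design choice for this sketch; comparing $5 against 4096 would also drop the directory here but could discard a regular file that happens to be exactly that size.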
References
For a complete reference, see the GNU AWK manual at http://www.gnu.org/software/gawk/manual/gawk.html .
MaGe Linux Operations
