Master AWK: A Quick Guide to Text Processing and Scripting
This comprehensive AWK tutorial, translated and refined from an original English guide, walks readers through the language’s history, variants, typical uses, workflow, program structure, syntax, command‑line options, built‑in variables, operators, regular expressions, arrays, control flow, functions, I/O redirection, and output formatting, providing practical examples and code snippets for rapid mastery.
Overview
AWK is an interpreted programming language designed for text processing, especially line-oriented data that splits naturally into fields. Its name comes from the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. GNU AWK (gawk) is the FSF-maintained implementation commonly shipped with Linux distributions.
AWK Variants
Common variants are the original AT&T AWK, NAWK (an upgraded AT&T version), and GAWK (GNU AWK), which is fully compatible with both AWK and NAWK.
Typical Uses
Text processing
Formatted text reports
Arithmetic calculations
String manipulation
Workflow
AWK follows a simple read-execute-repeat cycle: it reads one line (record) from its input (a file, a pipe, or standard input), runs every command whose pattern matches that line, and repeats until the input is exhausted.
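The cycle can be seen with a one-line program. This is a minimal sketch, assuming a POSIX-compatible awk (gawk, mawk, or the original BWK awk) is on the PATH; the shell function name number_lines is hypothetical:

```shell
#!/bin/sh
# awk reads one record, runs the action, and repeats until input ends.
# NR holds the number of records read so far.
number_lines() {
  awk '{ print NR ": " $0 }' "$@"
}

printf 'alpha\nbeta\ngamma\n' | number_lines
```

Passing file names instead (number_lines a.txt b.txt) makes awk read the files in order, with NR continuing to count across them.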
Program Structure
AWK programs consist of three optional parts:
BEGIN { awk-commands }: executed once before any input is read; useful for initializing variables.
/pattern/ { awk-commands }: the body block, run for each input line that matches the pattern; if the pattern is omitted, it runs for every line.
END { awk-commands }: executed once after all input has been processed.
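A small sketch using all three parts, assuming a POSIX awk; sum_items is a hypothetical name and the input is made up for illustration:

```shell
#!/bin/sh
# Sum the second field of lines that start with "item".
sum_items() {
  awk '
    BEGIN { total = 0; print "start" }   # runs once, before any input
    /^item/ { total += $2 }              # runs for each matching line
    END { print "total:", total }        # runs once, after all input
  ' "$@"
}

printf 'item 3\nnote 99\nitem 4\n' | sum_items
```

The "note 99" line is read but skipped because it does not match /^item/, so the total is 7.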
Basic Syntax
AWK commands can be supplied directly on the command line inside single quotes, or placed in a script file and invoked with awk -f script.awk. The POSIX-standard option -v var=value assigns a variable before the program starts; GNU AWK adds long options such as --dump-variables[=file], --lint[=fatal], --posix, --profile[=file], --traditional, and --version.
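A sketch of both invocation styles combined with -v, assuming a POSIX awk plus the common mktemp utility; greet_inline and greet_file are hypothetical names. The GNU long options are left out because other awk implementations reject them:

```shell
#!/bin/sh
# Inline program: -v assigns the awk variable "greeting" before input is read.
greet_inline() {
  awk -v greeting="$1" '{ print greeting ", " $1 }'
}

# Same program stored in a temporary file and loaded with -f.
greet_file() {
  script=$(mktemp) || return 1
  printf '%s\n' '{ print greeting ", " $1 }' > "$script"
  awk -v greeting="$1" -f "$script"
  status=$?
  rm -f "$script"
  return $status
}

echo world | greet_inline Hello
echo world | greet_file Hello
```

Inside the single-quoted program, $1 is awk's first field; outside it, "$1" is the shell function's first argument.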
Built‑in Variables
Important built-in variables include ARGC (number of command-line arguments), ENVIRON (array of environment variables), NF (number of fields in the current record), NR (number of records read so far), OFS (output field separator), and RSTART (start position of the last match() result); $n refers to the nth field of the current record and $0 to the whole record. GNU-specific variables include ARGIND, BINMODE, ERRNO, FIELDWIDTHS, IGNORECASE, and LINT.
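The sketch below prints the record number (NR), field count (NF), and last field ($NF) with a custom OFS, then reads ENVIRON and ARGC. It assumes a POSIX awk; field_info, env_demo, and DEMO_VAR are hypothetical names, and the GNU-only variables are not used:

```shell
#!/bin/sh
# Record number, field count, and last field, joined by a custom OFS.
field_info() {
  awk 'BEGIN { OFS = " | " } { print NR, NF, $NF }'
}

# ENVIRON exposes the environment; ARGC counts ARGV entries,
# where ARGV[0] is awk itself, so one file argument gives ARGC == 2.
env_demo() {
  DEMO_VAR=hi awk 'BEGIN { print ENVIRON["DEMO_VAR"], ARGC }' /dev/null
}

printf 'a b c\nd e\n' | field_info
env_demo
```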
Operators
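AWK supports arithmetic (+ - * / %), exponentiation (^), increment and decrement (++ --), assignment (= += -= and so on), relational, logical (&& || !), ternary (?:), unary plus and minus, string concatenation (written by placing values side by side), array membership (in), and regular-expression matching (~ and !~) operators.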
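Several operator families fit in one BEGIN block. A sketch assuming a POSIX awk, where ^ is the portable exponentiation operator (gawk also accepts **); operators_demo is a hypothetical name:

```shell
#!/bin/sh
operators_demo() {
  awk 'BEGIN {
    a = 7; b = 2
    print a % b, a ^ b                      # remainder and power
    b++                                     # increment: b is now 3
    s = "foo" "bar"                         # concatenation by juxtaposition
    size = (a > b) ? "big" : "small"        # ternary
    m = ("abc123" ~ /[0-9]+$/)              # regex match yields 1 or 0
    seen["x"] = 1
    hx = ("x" in seen); hy = ("y" in seen)  # array membership
    print b, s, size, m, hx, hy
  }'
}

operators_demo
```

This prints "1 49" on the first line and "3 foobar big 1 1 0" on the second.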
Regular Expressions
Regular expressions drive AWK's pattern matching: they select lines (/regex/ or $1 ~ /regex/), substitute text (sub and gsub), and extract matches (match, which sets RSTART and RLENGTH).
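A sketch of selection, substitution, and extraction in one rule, assuming a POSIX awk; regex_demo and the sample log lines are hypothetical:

```shell
#!/bin/sh
regex_demo() {
  awk '
    $1 ~ /^err/ {                      # select: first field starts with err
      line = $0
      n = gsub(/[0-9]/, "#", line)     # substitute each digit; n counts them
      num = ""
      if (match($0, /[0-9]+/))         # extract: match sets RSTART, RLENGTH
        num = substr($0, RSTART, RLENGTH)
      print line, n, num
    }
  '
}

printf 'error code 404\ninfo ok\n' | regex_demo
```

Only the first line is selected, and the output is "error code ### 3 404". gsub edits a copy here so that $0 stays intact for match.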
Arrays
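AWK provides associative (hash) arrays with string keys. Only one-dimensional arrays are native, but multi-dimensional structures can be simulated with comma-separated subscripts such as arr[i, j], which AWK joins into a single key using the SUBSEP character.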
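A sketch of a word counter and a simulated two-dimensional array, assuming a POSIX awk and sort. The iteration order of for (key in array) is unspecified, which is why the counts are piped through sort; word_counts and grid_demo are hypothetical names:

```shell
#!/bin/sh
# Count occurrences of each word with an associative array.
word_counts() {
  awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }
    END { for (w in count) print w, count[w] }
  ' | sort
}

# Simulate a 2-D array: grid[1, 2] is stored under the key "1" SUBSEP "2".
grid_demo() {
  awk 'BEGIN {
    grid[1, 2] = "x"
    if ((1, 2) in grid) print "found", grid[1, 2]
    for (key in grid) {
      split(key, idx, SUBSEP)          # recover the row and column
      print "row", idx[1], "col", idx[2]
    }
  }'
}

printf 'a b a\nc a\n' | word_counts
grid_demo
```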
Control Flow
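Standard control structures (if/else, while, for, do...while, break, continue, and exit) work much as in C. AWK also offers for (key in array) to loop over an array's indices, and exit stops reading input but still runs the END block.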
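A sketch that exercises each listed statement inside a BEGIN block, assuming a POSIX awk; control_demo is a hypothetical name:

```shell
#!/bin/sh
control_demo() {
  awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
      if (i % 2 == 0) continue         # skip even numbers
      if (i > 7) break                 # stop the loop once i passes 7
      odd = odd " " i
    }
    print "odd:" odd

    n = 3
    while (n > 0) { countdown = countdown n; n-- }
    print "countdown:", countdown

    do { k++ } while (k < 5)           # body runs at least once
    print "k:", k
    exit                               # skip any input and run END
  }
  END { print "done" }'
}

control_demo
```

The output is "odd: 1 3 5 7", "countdown: 321", "k: 5", and finally "done" from the END block.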
Functions
Built-in functions cover mathematics (atan2, cos, exp, int, log, rand, sin, sqrt, srand) and strings (gsub, index, length, match, split, sprintf, sub, substr, tolower, toupper). GNU AWK adds array sorting (asort, asorti), strtonum, time functions (systime, mktime, strftime), and bitwise operations (and, or, xor, compl, lshift, rshift). User-defined functions follow the syntax function name(arg1, arg2) { … }.
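A sketch combining a user-defined function with a few POSIX built-ins; the helper capitalize and the wrapper functions_demo are hypothetical. It avoids gawk extensions such as strftime or and() so it also runs under mawk or BWK awk:

```shell
#!/bin/sh
functions_demo() {
  awk '
    # Capitalize the first letter of a word (hypothetical helper).
    function capitalize(word) {
      return toupper(substr(word, 1, 1)) substr(word, 2)
    }
    BEGIN {
      n = split("alpha,beta", parts, ",")      # n = number of pieces
      print n, capitalize(parts[1]), length(parts[2])
      print int(7.9), sqrt(16), sprintf("%05.1f", 3.14159)
    }
  '
}

functions_demo
```

The two output lines are "2 Alpha 4" and "7 4 003.1".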
I/O Redirection
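Output can be redirected to a file by writing > or >> after print or printf; with >, the file is truncated only when it is first opened, and later prints in the same run append to it. A pipe (|) sends output to another program's standard input, and GNU AWK's |& operator opens a bidirectional pipe (a coprocess) for interactive communication.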
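A sketch of file and pipe redirection, assuming a POSIX awk plus mktemp and sort; io_demo is a hypothetical name, and the gawk-only |& operator is not shown. close() matters here: it waits for sort to finish so its output appears before the final line, and it lets the file be reread with getline:

```shell
#!/bin/sh
io_demo() {
  out=$(mktemp) || return 1
  printf 'b\na\nc\n' | awk -v out="$out" '
    { print $0 > out }          # first > truncates, later prints append
    { print $0 | "sort" }       # feed every line to one sort process
    END {
      close("sort")             # flush the pipe and wait for sort
      close(out)                # flush the file before reading it back
      while ((getline line < out) > 0) count++
      print "lines in file:", count
    }
  '
  rm -f "$out"
}

io_demo
```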
Formatted Output
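The printf statement, modeled on C's printf, accepts format specifiers such as %c, %d, %f, and %s, with optional width and precision (for example %-10s or %.2f), for precise control over output layout. Unlike print, it does not append a newline automatically.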
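A sketch of an aligned report, assuming a POSIX awk; report and the sample data are hypothetical. %-8s left-justifies the name, %5d right-justifies the quantity, and %8.2f prints the price with two decimals:

```shell
#!/bin/sh
report() {
  awk '
    BEGIN { printf "%-8s %5s %8s\n", "ITEM", "QTY", "PRICE" }   # header row
    { printf "%-8s %5d %8.2f\n", $1, $2, $3 }                  # one row per line
  '
}

printf 'apple 3 0.5\npear 12 1.25\n' | report
```

Each format string ends with an explicit \n because printf adds no newline of its own.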
Executing Shell Commands
Shell commands can be run with the system() function, which returns the command's exit status, or through pipes: "command" | getline reads a command's output, and print cmd | "/bin/sh" sends command lines to a shell.
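A sketch of system() and of reading a command's output with getline, assuming a POSIX awk and /bin/sh; shell_demo is a hypothetical name. For a command that exits normally, system() returns its exit status, though implementations differ in how they report commands killed by a signal:

```shell
#!/bin/sh
shell_demo() {
  awk 'BEGIN {
    status = system("exit 3")          # returns the exit status, 3 here
    print "status:", status

    cmd = "echo from-shell"
    cmd | getline result               # read one line of the command output
    close(cmd)                         # close so the command can be rerun
    print "read:", result
  }'
}

shell_demo
```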
References
AWK Tutorial
The GNU Awk User’s Guide
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
