Master AWK: A Quick Guide to Text Processing and Scripting
This AWK tutorial covers the language’s origins, variants, typical use cases, workflow, program structure, syntax, command‑line options, operators, regular expressions, arrays, control flow, functions, I/O redirection, and shell integration, with examples aimed at beginners.
Overview
AWK is an interpreted programming language designed for text processing. Its name comes from the initials of its three authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. Most GNU/Linux distributions ship GNU AWK (gawk), which is maintained by the Free Software Foundation.
AWK Variants
AWK – the original version from AT&T labs.
NAWK – an upgraded version from AT&T.
GAWK – GNU AWK, compatible with AWK and NAWK and available on virtually every GNU/Linux distribution (some, such as Debian, install mawk as the default awk).
Typical Uses
Text processing
Generating formatted text reports
Performing arithmetic calculations
String manipulation
Workflow
AWK follows a simple workflow: Read, Execute, Repeat.
Read
AWK reads a line from the input stream (file, pipe, or standard input) and stores it in memory.
Execute
All AWK commands are applied to the input line. By default, commands run on every line, but patterns can restrict this behavior.
Repeat
The process repeats until the end of the input is reached.
Program Structure
AWK programs consist of three optional blocks:
BEGIN – executed once before processing starts; used for initialization.
BODY – pattern/action statements applied to each input line.
END – executed once after all input has been processed.
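The three blocks can be seen together in a minimal sketch. The input lines and the counter name n are made up for illustration.

```shell
# Feed three lines to awk; BEGIN and END each run once, the body once per line.
out=$(printf 'alpha\nbeta\ngamma\n' | awk '
BEGIN { print "start" }          # runs before any input is read
      { n++; print n ": " $0 }   # body: no pattern, so it runs on every line
END   { print "lines: " n }      # runs after the last line
')
printf '%s\n' "$out"
```

An uninitialized variable such as n starts out as zero (or the empty string), so no setup is needed in BEGIN here.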
Basic Syntax
AWK commands can be run directly on the command line or placed in a script file.
Command‑Line Example
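A minimal sketch of an inline program, assuming input arrives on standard input; the two sample lines are invented for illustration.

```shell
# The program text is the quoted argument; input comes from the pipe.
out=$(printf 'one two\nthree four\n' | awk '{ print $1 }')
printf '%s\n' "$out"
```

Single quotes keep the shell from expanding $1 before awk sees it.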
Script File Example
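The same idea can be sketched with the program stored in a file and loaded with -f. The temporary file stands in for a script you would normally keep under a name such as command.awk (a hypothetical name).

```shell
# Write a small AWK program to a file, then run it with -f.
prog=$(mktemp)
cat > "$prog" <<'EOF'
# Print each line prefixed with its record number.
{ print NR ": " $0 }
EOF
out=$(printf 'red\ngreen\n' | awk -f "$prog")
rm -f "$prog"
printf '%s\n' "$out"
```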
Standard Options
-v – assign a variable before program execution.
--dump-variables[=file] – output sorted global variables to a file (default awkvars.out).
--help – display help information.
--lint[=fatal] – warn about dubious or non‑portable constructs; with fatal, warnings are treated as errors.
--posix – enforce strict POSIX compatibility.
--profile[=file] – write a formatted version of the program to a file (default awkprof.out).
--traditional – disable all gawk extensions.
--version – print version number.
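Of the options above, -v is the one used most often in everyday scripts. A short sketch, with a hypothetical variable min and made-up input:

```shell
# -v sets min before the program starts, so it is usable in the pattern.
out=$(printf 'Amit 80\nRahul 90\n' | awk -v min=85 '$2 >= min { print $1 }')
printf '%s\n' "$out"
```

Passing values this way avoids splicing shell variables into the program text.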
Basic Usage Examples
Using a sample file marks.txt containing student names, subjects, and scores, the following examples demonstrate common tasks.
Print All Lines
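The contents of marks.txt are not reproduced here, so this sketch builds a three-line stand-in (index, name, subject, score); the rows are assumptions for illustration only.

```shell
# Hypothetical stand-in for marks.txt.
marks=$(mktemp)
printf '1) Amit Physics 80\n2) Rahul Maths 90\n3) Kedar English 85\n' > "$marks"
out=$(awk '{ print }' "$marks")    # print with no arguments prints the whole line
rm -f "$marks"
printf '%s\n' "$out"
```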
Print Specific Columns
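A sketch that selects the name and score columns, again using an assumed stand-in for marks.txt in which the name is field 2 and the score is field 4.

```shell
# Hypothetical stand-in for marks.txt.
marks=$(mktemp)
printf '1) Amit Physics 80\n2) Rahul Maths 90\n3) Kedar English 85\n' > "$marks"
out=$(awk '{ print $2, $4 }' "$marks")   # the comma inserts OFS (a space by default)
rm -f "$marks"
printf '%s\n' "$out"
```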
Print in Arbitrary Order
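Fields can be emitted in any order, not only the order in which they appear. A sketch on the same assumed stand-in data:

```shell
# Hypothetical stand-in for marks.txt.
marks=$(mktemp)
printf '1) Amit Physics 80\n2) Rahul Maths 90\n3) Kedar English 85\n' > "$marks"
out=$(awk '{ print $4, $3, $2 }' "$marks")   # score, subject, name
rm -f "$marks"
printf '%s\n' "$out"
```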
Print Lines Longer Than 18 Characters
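A pattern with no action prints every matching line. In this sketch the stand-in rows are assumptions, chosen so that only one of them exceeds 18 characters.

```shell
# Hypothetical stand-in for marks.txt; the lines are 18, 17, and 19 characters long.
marks=$(mktemp)
printf '1) Amit Physics 80\n2) Rahul Maths 90\n3) Kedar English 85\n' > "$marks"
out=$(awk 'length($0) > 18' "$marks")
rm -f "$marks"
printf '%s\n' "$out"
```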
Built‑in Variables
ARGC – number of command‑line arguments.
ENVIRON – associative array of environment variables.
NF – number of fields in the current record.
OFS – output field separator (default is a space).
RSTART – index at which the substring matched by match() starts (0 if there is no match).
$n – nth field of the current record; $0 is the whole record.
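A few of these variables in action, as a sketch on made-up input: NF counts the fields, $NF is therefore the last field, and OFS is placed between the comma-separated arguments of print.

```shell
out=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print NF, $1, $NF }')
printf '%s\n' "$out"
```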
GNU AWK Specific Variables
ARGIND – index of the current ARGV element being processed.
BINMODE – forces binary mode for file I/O on non‑POSIX systems.
ERRNO – error string for failed getline or close calls.
FIELDWIDTHS – defines fixed‑width fields when set.
IGNORECASE – makes pattern matching case‑insensitive.
LINT – dynamic control of the --lint option.
Operators
Arithmetic
Supported operators: +, -, *, /, %, and ^ (exponentiation).
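A quick sketch with two illustrative values. Division in AWK is floating point, so 7 / 2 is not truncated.

```shell
out=$(awk 'BEGIN { a = 7; b = 2; print a + b, a - b, a * b, a / b, a % b }')
printf '%s\n' "$out"
```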
Increment/Decrement
Same as C: ++, --.
Assignment
Standard assignment operators, including compound forms ( +=, -=, etc.).
Relational
Operators: ==, !=, <, <=, >, >=.
Logical
Operators: &&, ||, !.
Conditional
The ternary operator has the form condition ? expr1 : expr2.
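For instance, a sketch that classifies an illustrative number n as even or odd:

```shell
# Parentheses keep the whole conditional as a single print argument.
out=$(awk 'BEGIN { n = 4; print (n % 2 == 0 ? "even" : "odd") }')
printf '%s\n' "$out"
```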
String Concatenation
Simply place strings or variables next to each other.
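A sketch of concatenation by juxtaposition; the variable names are illustrative.

```shell
# There is no concatenation operator: adjacent values are joined.
out=$(awk 'BEGIN { a = "foo"; b = "bar"; s = a b "!"; print s }')
printf '%s\n' "$out"
```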
Array Member
Accessed via array[index].
Regular Expression
Use ~ to match and !~ to negate.
More details on regex are covered later.
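The two match operators can be sketched side by side on made-up input:

```shell
out=$(printf 'apple\nbanana\ncherry\n' | awk '
$0 ~ /an/  { print "match: " $0 }
$0 !~ /an/ { print "no match: " $0 }
')
printf '%s\n' "$out"
```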
Regular Expressions
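AWK patterns are POSIX extended regular expressions written between slashes. They support anchors (^ and $), bracket expressions ([...]), alternation (|), grouping with parentheses, and the quantifiers *, +, and ?.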
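A small sketch combining anchors, grouping, and alternation; the word list is invented for illustration.

```shell
# Match lines that are exactly "cat" or "bat".
out=$(printf 'cat\ncart\ndog\nbat\n' | awk '/^(c|b)at$/')
printf '%s\n' "$out"
```

A bare /regex/ pattern is shorthand for $0 ~ /regex/.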
Arrays
AWK supports associative arrays with string keys; numeric indices need not be consecutive. Only one‑dimensional arrays are native, but multidimensional structures can be simulated.
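A sketch of the basic array operations, with hypothetical array names fruit and grid. The subscript [1, 2] is how a two-dimensional array is usually simulated: AWK joins the parts with SUBSEP into one string key.

```shell
out=$(awk 'BEGIN {
    fruit["apple"] = 3; fruit["kiwi"] = 5
    delete fruit["kiwi"]             # remove a single element
    grid[1, 2] = "x"                 # stored under the key 1 SUBSEP 2
    if (("apple" in fruit) && !("kiwi" in fruit) && ((1, 2) in grid))
        print fruit["apple"], grid[1, 2]
}')
printf '%s\n' "$out"
```

Testing membership with in avoids creating the element as a side effect, which a plain reference such as fruit["kiwi"] would do.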
Control Flow
The if and else statements follow C‑like syntax. The switch statement is a gawk extension and is not available in other AWK implementations.
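A sketch of an if / else if / else chain that uses only portable syntax; the grade thresholds are arbitrary.

```shell
out=$(printf '95\n72\n40\n' | awk '{
    if ($1 >= 90)      grade = "A"
    else if ($1 >= 60) grade = "B"
    else               grade = "C"
    print $1, grade
}')
printf '%s\n' "$out"
```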
Loops
Supported loops: for, while, do...while, with break, continue, and exit statements.
exit terminates the script; the exit status can be retrieved via $? in the shell.
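A sketch that combines a for loop, continue, break, and an exit status read back in the shell. The status value 3 is arbitrary.

```shell
status=0
out=$(awk 'BEGIN {
    for (i = 1; i <= 5; i++) {
        if (i == 2) continue     # skip this iteration
        if (i == 4) break        # leave the loop early
        sum += i                 # adds 1 and 3
    }
    print sum
    exit 3                       # becomes the exit status of awk
}') || status=$?
printf '%s (exit status %s)\n' "$out" "$status"
```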
Functions
Built‑in Functions
Commonly used built‑in functions include:
Arithmetic functions: atan2(y, x), cos(expr), exp(expr), int(expr), log(expr), rand(), sin(expr), sqrt(expr), srand([expr]).
String functions: asort(), asorti(), gsub(), index(), length(), match(), split(), sprintf(), strtonum(), sub(), substr(), tolower(), toupper(). Of these, asort(), asorti(), and strtonum() are gawk extensions.
Time functions (extensions beyond POSIX AWK, provided by gawk): systime(), mktime(datespec), strftime([format[, timestamp[, utc‑flag]]]).
Bit‑manipulation functions (gawk extensions): and(), compl(), lshift(), rshift(), or(), xor().
User‑Defined Functions
Functions can be defined to encapsulate reusable code, improving modularity and testability.
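A minimal sketch of a user-defined function built from the portable built-ins substr(), toupper(), and tolower(). The function name initcap is hypothetical.

```shell
out=$(awk '
# Return s with its first character upper-cased and the rest lower-cased.
function initcap(s) {
    return toupper(substr(s, 1, 1)) tolower(substr(s, 2))
}
BEGIN { print initcap("aWK"), initcap("gnu") }
')
printf '%s\n' "$out"
```

Function definitions sit at the top level of the program, alongside the BEGIN, body, and END blocks.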
Output Redirection
Redirection Operator
Both print and printf can redirect output to a file, mirroring shell syntax: > truncates the file when it is first opened, and >> appends to it.
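A sketch of file redirection from inside AWK. The target is a temporary file passed in through a hypothetical variable named out.

```shell
log=$(mktemp)
awk -v out="$log" 'BEGIN {
    print "first"  > out     # opens the file, truncating it
    print "second" > out     # same open stream, so this line follows the first
    close(out)
    print "third" >> out     # reopens the file in append mode
    close(out)
}'
content=$(cat "$log")
rm -f "$log"
printf '%s\n' "$content"
```

Unlike in the shell, > truncates only when the file is first opened; later writes to the same name within one run keep adding to it.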
Pipes
Output can be piped to other programs using the | operator. For bidirectional communication, gawk provides |& which creates a two‑way pipe.
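Note: Standard error from co‑processes goes to the same place as gawk’s own error output, and buffering issues may cause deadlocks when using getline.
Closing a pipe with close() is essential when interacting with commands like sort, which produce no output until their input ends. For a gawk two‑way pipe, close(command, “to”) closes only the sending end, so the co‑process sees end of input while its output can still be read.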
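A one-way pipe to sort can be sketched with portable syntax only (the two-way |& form needs gawk and is not shown). The input words are made up.

```shell
out=$(printf 'pear\napple\nfig\n' | awk '
    { print | "sort" }                  # every line is written to one sort process
END { close("sort"); print "done" }     # close() waits for sort to finish
')
printf '%s\n' "$out"
```

Calling close() before the final print is what guarantees that the sorted lines appear ahead of "done".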
Beautifying Output
The printf statement, borrowed from C, gives precise control over output layout.
Format specifiers include %c, %d, %s, and others, with width and precision modifiers as in C.
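A sketch showing width, left alignment, and precision with illustrative values:

```shell
# %-6s left-aligns in 6 columns, %4d right-aligns in 4, %6.2f keeps 2 decimals.
out=$(awk 'BEGIN { printf "%-6s|%4d|%6.2f|%c\n", "Amit", 80, 3.14159, "x" }')
printf '%s\n' "$out"
```

Unlike print, printf does not add a newline, so the format string supplies one.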
Executing Shell Commands
system() Function
Runs an operating‑system command and returns its exit status.
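A sketch using the standard true and false commands, which exit with status 0 and 1 respectively:

```shell
out=$(awk 'BEGIN { ok = system("true"); bad = system("false"); print ok, bad }')
printf '%s\n' "$out"
```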
Using Pipes
Commands can be sent to /bin/sh via a pipe for execution.
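A sketch of this approach: the command text is printed into a pipe whose reader is the shell. The variable name cmd and the echoed message are illustrative.

```shell
out=$(awk 'BEGIN {
    cmd = "/bin/sh"
    print "echo hello from sh" | cmd    # the shell reads this line as a command
    close(cmd)                          # end of input: the shell runs it and exits
}')
printf '%s\n' "$out"
```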
MaGe Linux Operations