
Master AWK: A Quick Guide to Text Processing and Scripting

This AWK tutorial introduces the language's origins, variants, typical use cases, workflow, program structure, syntax, command-line options, operators, regular expressions, arrays, control flow, functions, I/O redirection, and shell integration, with beginners in mind.

MaGe Linux Operations

Overview

AWK is an interpreted programming language designed for text processing. Its name comes from the initials of its three authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. Many GNU/Linux distributions ship GNU AWK (gawk), maintained by the Free Software Foundation, while some default to other implementations such as mawk or BusyBox awk.

AWK Variants

AWK – the original version from AT&T labs.

NAWK – an upgraded version from AT&T.

GAWK – GNU AWK, compatible with AWK and NAWK and available on all major GNU/Linux distributions.

Typical Uses

Text processing

Generating formatted text reports

Performing arithmetic calculations

String manipulation

Workflow

AWK follows a simple workflow: Read, Execute, Repeat.

Read

AWK reads one record (by default, one line) from the input stream (file, pipe, or standard input), stores it in memory, and splits it into fields.

Execute

All AWK commands are applied to the input line. By default, commands run on every line, but patterns can restrict this behavior.

Repeat

The process repeats until the end of the input is reached.

Program Structure

AWK programs consist of three optional blocks:

BEGIN – executed once before processing starts; used for initialization.

BODY – pattern/action statements applied to each input line.

END – executed once after all input has been processed.
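As a minimal sketch of the three blocks working together, the snippet below feeds three made-up lines to awk. The variable name count and the sample text are illustrative assumptions, not part of the original article.

```shell
# Hypothetical three-line input; "count" is an illustrative variable name.
out=$(printf 'alpha\nbeta\ngamma\n' | awk '
BEGIN { print "start" }              # runs once, before any input is read
      { count++; print NR ": " $0 }  # body: no pattern, so it runs for every line
END   { print "lines: " count }      # runs once, after the last line
')
printf '%s\n' "$out"
```

The body has no pattern, so it fires for every record; NR is the running record number.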

Basic Syntax

AWK commands can be run directly on the command line or placed in a script file.

Command‑Line Example
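A command-line invocation generally has the shape awk [options] 'program' file ... The sketch below uses made-up input on standard input rather than a file.

```shell
# General form: awk [options] 'program' file ...
# With no file operand, awk reads standard input. The sample text is made up.
out=$(printf 'hello world\n' | awk '{ print $2 }')
printf '%s\n' "$out"
```

Single quotes around the program keep the shell from expanding $2 before awk sees it.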

Script File Example
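When the program lives in a file, the -f option tells awk where to find it. The following sketch writes a tiny program to a temporary file; the file contents and sample input are assumptions for illustration.

```shell
# Write a small awk program to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first field of every line.
{ print $1 }
EOF

# -f reads the program from a file instead of the command line.
out=$(printf 'one two\nthree four\n' | awk -f "$script")
printf '%s\n' "$out"
rm -f "$script"
```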

Standard Options

-v var=value – assign a variable before program execution begins. This option is standard; the long options below are specific to gawk.

--dump-variables[=file] – output sorted global variables to a file (default awkvars.out).

--help – display help information.

--lint[=fatal] – warn about dubious or non-portable constructs; with fatal, treat warnings as errors.

--posix – enforce strict POSIX compatibility.

--profile[=file] – collect profiling data and write an annotated, pretty-printed copy of the program to a file (default awkprof.out).

--traditional – disable all gawk extensions.

--version – print version number.
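Of the options above, -v is the one used most often in everyday scripts. A small sketch, assuming a hypothetical variable called name:

```shell
# -v assigns "name" before the BEGIN block runs, so BEGIN can already see it.
out=$(awk -v name=world 'BEGIN { print "hello, " name }')
printf '%s\n' "$out"
```

This is also a convenient way to hand a shell variable to awk, for example -v name="$USER".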

Basic Usage Examples

Using a sample file marks.txt containing student names, subjects, and scores, the following examples demonstrate common tasks.

Print All Lines
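The original sample file is not reproduced here, so this sketch creates a hypothetical marks.txt (serial number, name, subject, score) and prints every line of it.

```shell
# Hypothetical marks.txt: serial number, name, subject, score.
marks=$(mktemp)
cat > "$marks" <<'EOF'
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
EOF

# An action with no pattern runs for every line; a bare print outputs $0.
out=$(awk '{ print }' "$marks")
printf '%s\n' "$out"
rm -f "$marks"
```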

Print Specific Columns
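A sketch of column selection, again using a hypothetical marks.txt whose third and fourth fields are assumed to be subject and score:

```shell
# Hypothetical marks.txt: serial number, name, subject, score.
marks=$(mktemp)
cat > "$marks" <<'EOF'
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
EOF

# $3 and $4 are the third and fourth whitespace-separated fields;
# the comma makes print join them with OFS (a single space by default).
out=$(awk '{ print $3, $4 }' "$marks")
printf '%s\n' "$out"
rm -f "$marks"
```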

Print in Arbitrary Order
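Fields can be printed in any order, not just the order in which they appear. The sketch below, built on the same hypothetical marks.txt layout, puts the score before the name.

```shell
# Hypothetical marks.txt: serial number, name, subject, score.
marks=$(mktemp)
cat > "$marks" <<'EOF'
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
EOF

# Field references are ordinary expressions, so they can appear in any order.
out=$(awk '{ print $4, $2 }' "$marks")
printf '%s\n' "$out"
rm -f "$marks"
```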

Print Lines Longer Than 18 Characters
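A pattern without an action prints every matching line, which makes a length filter a one-liner. The data below is a hypothetical marks.txt in which only two lines exceed 18 characters.

```shell
# Hypothetical marks.txt: serial number, name, subject, score.
marks=$(mktemp)
cat > "$marks" <<'EOF'
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
EOF

# A pattern with no action prints the matching record.
out=$(awk 'length($0) > 18' "$marks")
printf '%s\n' "$out"
rm -f "$marks"
```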

Built‑in Variables

ARGC – number of command‑line arguments.

ENVIRON – associative array of environment variables.

NF – number of fields in the current record.

OFS – output field separator (default is a space).

RSTART – start index of the substring matched by the most recent match() call (0 if there was no match).

$n – nth field of the current record; $0 is the whole record.
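A short sketch of a few of these variables in action. The input is made up, and RLENGTH (the length of the match) is a standard companion of RSTART that the list above does not mention.

```shell
# NR is the record number, NF the field count, $NF the last field.
# OFS is placed between the comma-separated arguments of print.
out1=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print NR, NF, $1, $NF }')
printf '%s\n' "$out1"

# match() sets RSTART (start index) and RLENGTH (length of the match).
out2=$(awk 'BEGIN { match("foobar", /bar/); print RSTART, RLENGTH }')
printf '%s\n' "$out2"
```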

GNU AWK Specific Variables

ARGIND – index of the current ARGV element being processed.

BINMODE – forces binary mode for file I/O on non‑POSIX systems.

ERRNO – error string describing the failure when getline or close() fails.

FIELDWIDTHS – defines fixed‑width fields when set.

IGNORECASE – makes pattern matching case‑insensitive.

LINT – dynamic control of the --lint option.

Operators

Arithmetic

Supported operators: +, -, *, /, %, and ^ (exponentiation).
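A quick sketch of the arithmetic operators with arbitrary operands. Note that awk arithmetic is floating point, so division does not truncate.

```shell
# All awk arithmetic is floating point: 7 / 2 is 3.5, not 3.
# ^ is exponentiation.
out=$(awk 'BEGIN { a = 7; b = 2; print a + b, a - b, a * b, a / b, a % b, a ^ b }')
printf '%s\n' "$out"
```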

Increment/Decrement

Same as C: ++, --.

Assignment

Standard assignment (=) plus compound forms (+=, -=, *=, /=, %=, ^=).

Relational

Operators: ==, !=, <, <=, >, >=.

Logical

Operators: &&, ||, !.
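Relational and logical operators are most often seen in patterns. The following sketch, using made-up two-column input, keeps rows whose second field is at least 75 and whose first field is not "b".

```shell
# The pattern combines a numeric comparison and a string comparison.
# With no action, matching records are printed unchanged.
out=$(printf 'a 80\nb 90\nc 70\n' | awk '$2 >= 75 && $1 != "b"')
printf '%s\n' "$out"
```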

Conditional

The ternary operator: condition ? expr1 : expr2.

String Concatenation

Simply place strings or variables next to each other.
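Because concatenation has no operator symbol, it can be easy to miss when reading code. A minimal sketch with illustrative variable names:

```shell
# Adjacent expressions are concatenated; there is no explicit operator.
out=$(awk 'BEGIN { a = "foo"; b = "bar"; s = a b; print s; print a "-" b }')
printf '%s\n' "$out"
```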

Array Member

Accessed via array[index].

Regular Expression

Use ~ to test whether a string matches a regular expression and !~ to test that it does not.
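A small sketch of both operators against made-up input:

```shell
# ~ keeps records that match the regular expression, !~ keeps those that do not.
match_out=$(printf 'apple\nbanana\ncherry\n' | awk '$0 ~ /an/')
nomatch_out=$(printf 'apple\nbanana\ncherry\n' | awk '$0 !~ /an/')
printf '%s\n' "$match_out" "$nomatch_out"
```

The left-hand side can be any expression, so $2 ~ /^[0-9]+$/ would test only the second field.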

More details on regex are covered later.

Regular Expressions

AWK provides powerful regex capabilities for complex text processing.
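As one illustration, a bare /regex/ pattern is matched against the whole record. The sketch below combines anchors and a bracket expression; the word list is invented.

```shell
# ^ and $ anchor the match to the whole line; [ao] matches one of the two letters.
out=$(printf 'cat\ncot\ncut\ncart\n' | awk '/^c[ao]t$/')
printf '%s\n' "$out"
```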

Arrays

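AWK supports associative arrays: every index is stored as a string, so numeric indices need not be consecutive. Only one-dimensional arrays are native, but multidimensional structures can be simulated with comma-separated subscripts, which awk joins into a single key using the SUBSEP variable.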
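The following sketch counts occurrences with an associative array and then simulates a two-dimensional subscript. The array names seen and grid are illustrative.

```shell
out=$(printf 'a\nb\na\n' | awk '
{ seen[$1]++ }                        # elements are created on first use
END {
    print seen["a"], seen["b"]
    grid[1, 2] = "x"                  # stored under the single key 1 SUBSEP 2
    if ((1, 2) in grid) print grid[1, 2]
    delete seen["a"]                  # remove one element
    print ("a" in seen)               # "in" tests membership without creating the key
}')
printf '%s\n' "$out"
```

Iterating with for (key in array) is also available, but the order in which keys are visited is unspecified.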

Control Flow

Conditional statements (if, else) follow C-like syntax; switch is also available, but only as a gawk extension.
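A portable if/else sketch with an assumed pass mark of 50 and made-up scores:

```shell
# Classify each score; the threshold of 50 is an arbitrary example value.
out=$(printf '80\n45\n' | awk '{
    if ($1 >= 50) {
        print $1, "pass"
    } else {
        print $1, "fail"
    }
}')
printf '%s\n' "$out"
```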

Loops

Supported loops: for, while, do...while, with break, continue, and exit statements.

exit terminates the script; the exit status can be retrieved via $? in the shell.
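The sketch below exercises continue, break, and exit in one loop, then reads the exit status back in the shell. The loop bounds and the status value 3 are arbitrary.

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 5; i++) {
        if (i == 2) continue   # skip this iteration
        if (i == 4) break      # leave the loop entirely
        print i
    }
    exit 3                     # becomes the exit status of the awk process
}')
status=$?                      # status of the command substitution above
printf '%s\n' "$out"
printf 'exit status: %s\n' "$status"
```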

Functions

Built‑in Functions

Commonly used built-in functions include:

Arithmetic functions: atan2(y, x), cos(expr), exp(expr), int(expr), log(expr), rand(), sin(expr), sqrt(expr), srand([expr]).

String functions: asort(), asorti(), gsub(), index(), length(), match(), split(), sprintf(), strtonum(), sub(), substr(), tolower(), toupper().

Time functions: systime(), mktime(datespec), strftime([format[, timestamp[, utc-flag]]]).

Bit-manipulation functions: and(), compl(), lshift(), rshift(), or(), xor().

Of these, asort(), asorti(), strtonum(), the time functions, and the bit-manipulation functions are gawk extensions rather than part of POSIX awk.
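A sketch that sticks to functions available in any POSIX awk; the sample string is arbitrary.

```shell
out=$(awk 'BEGIN {
    s = "Hello, World"
    print length(s)                  # number of characters
    print toupper(s)
    print substr(s, 1, 5)            # 5 characters starting at position 1
    print index(s, "World")          # 1-based position, 0 if absent
    n = split("a:b:c", parts, ":")   # returns the number of pieces
    print n, parts[2]
    t = s
    gsub(/o/, "0", t)                # replace every match in t, in place
    print t
    print int(3.9), sqrt(16)         # int() truncates toward zero
}')
printf '%s\n' "$out"
```

String positions in awk start at 1, not 0, which affects substr() and index() alike.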

User‑Defined Functions

Functions can be defined to encapsulate reusable code, improving modularity and testability.
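A minimal sketch of a user-defined function. The name sum_to is hypothetical; the extra parameters after the wide gap are the conventional way to declare local variables, since awk has no other local scope.

```shell
out=$(awk '
# Extra parameters (i, total) act as local variables; callers omit them.
function sum_to(n,    i, total) {
    for (i = 1; i <= n; i++)
        total += i
    return total
}
BEGIN { print sum_to(4) }
')
printf '%s\n' "$out"
```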

Output Redirection

Redirection Operator

Both print and printf can redirect output to a file using the > operator (or >> to append), mirroring shell syntax.
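One difference from the shell is worth a sketch: within a single awk run, > truncates the file only when it is first opened, and later writes to the same name keep appending. The file name report.txt is an assumption.

```shell
dir=$(mktemp -d)
awk -v report="$dir/report.txt" 'BEGIN {
    print "first"  > report    # opens and truncates the file
    print "second" > report    # same open file, so this line is added after "first"
    print "third" >> report    # >> never truncates
    close(report)
}'
content=$(cat "$dir/report.txt")
printf '%s\n' "$content"
rm -rf "$dir"
```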

Pipes

Output can be piped to other programs using the | operator. For bidirectional communication, gawk provides |& which creates a two‑way pipe.

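Note: A co-process's standard error goes to the same place as gawk's own standard error and cannot be read separately, and buffering can cause deadlocks when reading the co-process's output with getline.

With a two-way pipe, closing just the write end with close(command, "to") is essential for commands like sort, which produce no output until they see end of input; a plain close(command) closes both ends.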
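The sketch below shows only the portable one-way pipe, since |& requires gawk. The fruit names are made up.

```shell
# Every print writes into the same "sort" process; close() ends its input
# and waits for it, so the sorted output appears before awk exits.
out=$(printf 'pear\napple\nfig\n' | awk '{ print | "sort" } END { close("sort") }')
printf '%s\n' "$out"
```

The string passed to close() must match the command string used after | exactly.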

Beautifying Output

The printf function, borrowed from C, offers advanced formatting capabilities.

Format specifiers include %c, %d, %s, etc., similar to C.
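A sketch of width, alignment, and precision with arbitrary values:

```shell
# %-8s left-aligns in 8 columns, %5d right-aligns in 5,
# %6.2f uses 6 columns with 2 decimals, %c prints one character.
out=$(awk 'BEGIN { printf "%-8s|%5d|%6.2f|%c\n", "item", 42, 3.14159, "A" }')
printf '%s\n' "$out"
```

Unlike print, printf adds no newline on its own, so the format string ends with \n.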

Executing Shell Commands

system() Function

Runs an operating‑system command and returns its exit status.
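A minimal sketch, using the shell built-in exit as a stand-in for a real command so the status is predictable:

```shell
# "exit 3" is a placeholder command chosen only for its known exit status.
out=$(awk 'BEGIN {
    rc = system("exit 3")
    print "status: " rc
}')
printf '%s\n' "$out"
```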

Using Pipes

Commands can be sent to /bin/sh via a pipe for execution.
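As a sketch of that approach, each input line below is turned into an echo command and written to a single /bin/sh process. The input words are made up, and building commands from untrusted input this way would be unsafe.

```shell
# Each print sends one command line to the same shell process.
# Only do this with input you control: fields are pasted into shell code.
out=$(printf 'one\ntwo\n' | awk '{ print "echo got " $1 | "/bin/sh" } END { close("/bin/sh") }')
printf '%s\n' "$out"
```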

References

AWK Tutorial

The GNU Awk User’s Guide
