Master AWK: A Quick Guide to Text Processing and Scripting

This comprehensive AWK tutorial, translated and refined from an original English guide, walks readers through the language’s history, variants, typical uses, workflow, program structure, syntax, command‑line options, built‑in variables, operators, regular expressions, arrays, control flow, functions, I/O redirection, and output formatting, providing practical examples and code snippets for rapid mastery.

MaGe Linux Operations

Overview

AWK is an interpreted programming language designed for powerful text processing. Its name comes from the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. GNU AWK (gawk) is the FSF‑maintained implementation commonly shipped with Linux distributions.

AWK Types

Common variants are the original AT&T AWK, NAWK (an upgraded AT&T version), and GAWK (GNU AWK), which is fully compatible with both AWK and NAWK.

Typical Uses

Text processing

Formatted text reports

Arithmetic calculations

String manipulation

Workflow

AWK follows a simple read‑execute‑repeat cycle: it reads one record (by default, one line) from input (file, pipe, or stdin), executes the specified commands (optionally filtered by patterns), and repeats until the input is exhausted.
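The cycle can be observed with a minimal sketch. The three input words are made up for illustration, and NR is the standard AWK variable holding the current record number; the action runs once per input line.

```shell
# Three lines arrive on stdin; awk reads one, runs the action, and repeats.
out=$(printf 'alpha\nbeta\ngamma\n' | awk '{ print NR ": " $0 }')
printf '%s\n' "$out"
```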

Program Structure

AWK programs consist of optional blocks:

BEGIN { awk‑commands }: executed once before any input is read, useful for initializing variables.

/pattern/ { awk‑commands }: the BODY block, run for each line that matches the pattern; if no pattern is given it runs for every line.

END { awk‑commands }: executed after all input has been processed.
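As a rough illustration of how the three blocks cooperate, the sketch below sums the second field of every line containing "error". The sample log lines and the variable name total are assumptions made for this example.

```shell
out=$(printf 'ok 3\nerror 5\nok 4\nerror 7\n' | awk '
BEGIN   { total = 0 }               # runs once, before any input is read
/error/ { total += $2 }             # runs for each line matching /error/
END     { print "total:", total }   # runs once, after the last line
')
printf '%s\n' "$out"
```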

Basic Syntax

AWK commands can be supplied directly on the command line inside single quotes or placed in a script file and invoked with awk -f script.awk. The standard option -v var=value assigns a variable before the program starts. GNU AWK adds long options such as --dump-variables[=file], --lint[=fatal], --posix, --profile[=file], --traditional, and --version.
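Both invocation styles can be compared side by side. In this sketch the script is written to a temporary file created with mktemp, and the variable name greeting is hypothetical; the same one-line program is run inline and through -f.

```shell
# Write a small AWK script to a temporary file (stands in for script.awk).
script=$(mktemp)
cat > "$script" <<'EOF'
# Prefix the first field of each line with a greeting passed via -v.
{ print greeting ", " $1 }
EOF

# Same program, once inline in single quotes and once loaded with -f.
inline=$(printf 'world\n' | awk -v greeting=Hello '{ print greeting ", " $1 }')
fromfile=$(printf 'world\n' | awk -v greeting=Hello -f "$script")
rm -f "$script"
printf '%s\n%s\n' "$inline" "$fromfile"
```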

Built‑in Variables

Important built‑in variables include ARGC (argument count), ENVIRON (environment array), NF (number of fields), OFS (output field separator), RSTART (start position of the last match() result), and $n (the nth field of the current record). GNU‑specific variables such as ARGIND, BINMODE, ERRNO, FIELDWIDTHS, IGNORECASE, and LINT are also described.
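A few of the portable variables can be exercised in a short sketch. The environment variable name AWK_DEMO and the sample input are invented for the example; RLENGTH is the standard companion of RSTART, and $NF refers to the last field of the record.

```shell
# NF, $n and OFS: print the field count, first field and last field, joined by OFS.
fields=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print NF, $1, $NF }')

# RSTART (and RLENGTH) are set by match(): "bar" starts at position 4 of "foobar".
pos=$(awk 'BEGIN { if (match("foobar", /bar/)) print RSTART, RLENGTH }')

# ENVIRON exposes the process environment as an array.
env=$(AWK_DEMO=hi awk 'BEGIN { print ENVIRON["AWK_DEMO"] }')

# ARGC counts the command name plus the remaining arguments.
argc=$(awk 'BEGIN { print ARGC }' one two)

printf '%s\n%s\n%s\n%s\n' "$fields" "$pos" "$env" "$argc"
```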

Operators

AWK supports arithmetic (+ - * / %), increment/decrement (++ --), assignment, relational, logical, ternary (?:), unary, exponentiation (^), string concatenation (written by placing two expressions side by side), array membership (in), and regular‑expression operators (~ and !~).
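Several of these operators fit in one BEGIN block. This is only a sketch; the variable names x, m, n, arr, has, and lacks are chosen for illustration.

```shell
out=$(awk 'BEGIN {
    print 7 % 3                          # remainder
    print 2 ^ 10                         # exponentiation
    print "foo" "bar"                    # concatenation by juxtaposition
    x = 5; x++                           # assignment, then increment to 6
    print (x > 5 ? "big" : "small")      # ternary; parentheses keep > from meaning redirection
    m = ("abc" ~ /^a/); n = ("abc" !~ /^a/)
    print m, n                           # regex match and non-match yield 1 or 0
    arr["k"] = 1
    has = ("k" in arr); lacks = ("z" in arr)
    print has, lacks                     # array membership test
}')
printf '%s\n' "$out"
```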

Regular Expressions

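Pattern matching is built on regular expressions: a /regex/ pattern selects input lines, the ~ and !~ operators test a single field or string, sub and gsub perform substitution, and match locates a substring for extraction.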
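Matching, substitution, and extraction can be combined in one rule. The input format (id=... user=...) is invented for this sketch, and the extraction relies on RSTART and RLENGTH, which match() sets.

```shell
out=$(printf 'id=42 user=alice\nid=7 user=bob\nnoise\n' | awk '
/^id=/ {                                   # matching: only lines that start with id=
    line = $0
    gsub(/user=/, "name=", line)           # substitution, applied to a copy of the line
    if (match($0, /[0-9]+/))               # extraction: locate the first run of digits
        print substr($0, RSTART, RLENGTH), line
}')
printf '%s\n' "$out"
```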

Arrays

AWK provides associative (hash) arrays with string keys; only one‑dimensional arrays are native, but multi‑dimensional structures can be simulated by joining several indices into a single key, as in arr[i, j].
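A word count is the usual first use of an associative array. The sketch below also stores one element under a two-part index, which AWK keeps as a single key joined with the standard SUBSEP variable; the names count, grid, and has are assumptions for the example.

```shell
out=$(printf 'apple\npear\napple\n' | awk '
{ count[$1]++ }                        # the key is the word itself
END {
    grid[1, 2] = "x"                   # stored under the single key 1 SUBSEP 2
    has = ((1, 2) in grid)             # membership test for the simulated 2-D index
    print count["apple"], count["pear"], has
}')
printf '%s\n' "$out"
```

Specific keys are printed here because the iteration order of for (key in array) is not defined by the language.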

Control Flow

Standard control structures (if, while, for, do…while, break, continue, exit) work as in C‑like languages.
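The loop and branch statements can be seen together in a short sketch. The bounds and the variable names i, s, and n are arbitrary choices for illustration.

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue             # skip even numbers
        if (i > 7) break                     # leave the loop at the first odd number above 7
        s = (s == "" ? i : s " " i)          # collect the remaining values in a string
    }
    print s

    n = 3
    do { n-- } while (n > 0)                 # the body runs before the condition is tested
    print "n=" n
}')
printf '%s\n' "$out"
```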

Functions

Built‑in functions cover mathematics (atan2, cos, exp, int, log, rand, sin, sqrt, srand) and strings (gsub, index, length, match, split, sprintf, sub, substr, tolower, toupper). GNU AWK additionally provides asort and asorti for sorting arrays, strtonum, time functions (systime, mktime, strftime), and bitwise operations (and, or, xor, compl, lshift, rshift); these are not part of POSIX AWK. User‑defined functions follow the syntax function name(arg1, arg2) { … }.
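A sketch using only portable built-ins alongside one user-defined function follows. The function clamp and its parameter names are hypothetical, written just for this example.

```shell
out=$(awk '
# User-defined function: restrict x to the range [lo, hi].
function clamp(x, lo, hi) { return x < lo ? lo : (x > hi ? hi : x) }

BEGIN {
    print length("hello"), toupper("awk"), substr("abcdef", 2, 3), index("abcdef", "cd")
    print int(7.9), sqrt(16)
    n = split("a:b:c", parts, ":")         # split returns the number of pieces
    print n, parts[3]
    print clamp(15, 0, 10), clamp(-2, 0, 10)
}')
printf '%s\n' "$out"
```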

I/O Redirection

Output can be redirected to files using > or >> after print or printf. Pipes (|) allow sending data to other programs, and the GNU AWK‑specific |& operator creates a bidirectional pipe for interactive communication.
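File redirection and an output pipe can be combined in one program. In this sketch the temporary file comes from mktemp and is passed in through a variable named out, both assumptions for the example; the gawk-only |& operator is left out so the code runs on any POSIX awk.

```shell
tmp=$(mktemp)
sorted=$(printf 'b 2\na 1\nc 3\n' | awk -v out="$tmp" '
{
    print $1 > out            # the first write truncates the file, later writes append
    print $2 | "sort -rn"     # every piped line goes to the same sort process
}')
saved=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n%s\n' "$saved" "$sorted"
```

The sort output only appears once awk exits and closes the pipe, which is why it can be captured as a whole.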

Formatted Output

The printf statement, borrowed from C, provides format specifiers such as %c, %d, %s, etc., for precise control over output layout. Unlike print, it does not append a newline automatically.
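Width, alignment, and precision modifiers can be illustrated with a single call. The values are arbitrary sample data.

```shell
# %-6s left-aligns in 6 columns, %4d right-aligns in 4, %6.2f keeps two decimals.
out=$(awk 'BEGIN { printf "%-6s|%4d|%6.2f|%c\n", "disk", 42, 3.14159, "A" }')
printf '%s\n' "$out"
```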

Executing Shell Commands

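Shell commands can be run via the system() function, which returns the command's exit status, or by opening a pipe to a command (including /bin/sh itself) and writing to it with print | "command" or reading from it with "command" | getline.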
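Both approaches can be tried in a minimal sketch. The commands exit 3 and echo hello are placeholders chosen so the result is easy to check, and the variable names status and line are assumptions.

```shell
out=$(awk 'BEGIN {
    status = system("exit 3")          # the command runs in a shell; its exit status comes back
    "echo hello" | getline line        # read one line of the command output into line
    close("echo hello")                # close the pipe once it is no longer needed
    print status, line
}')
printf '%s\n' "$out"
```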
