
Mastering Awk: From Basics to Advanced Text Processing

This comprehensive guide introduces Awk, explains its command‑line syntax, script structure, patterns, built‑in variables, arrays, functions, operators, statements, I/O handling, and practical examples, enabling readers to harness Awk for powerful text processing tasks on Unix-like systems.

MaGe Linux Operations

Table of Contents

What is Awk
Command‑line Syntax
Script Composition
Pattern
Regular Expression
Expressions
Arrays
Built‑in Variables: Delete ARGV element, Add ARGV element, ARGV and ARGC, CONVFMT and OFMT, ENVIRON, RLENGTH and RSTART
Operators
Statement
Math Functions
String Functions: sub, gsub, index, length, match, split, sprintf, substr, tolower, toupper
I/O Functions: getline, close, system

Awk, together with sed and grep, is often called the "three swords" of Linux. While all three can match text, sed and awk can also edit text, whereas grep cannot. Sed is a non‑interactive, stream‑oriented editor; awk is a pattern‑matching programming language that supports variables, functions, loops, and conditional statements, making it more powerful than simple command‑line tools.

Using Awk you can:

Treat a text file as a database of records and fields.

Use variables while processing the database.

Perform arithmetic and string operations.

Employ common programming structures such as conditionals and loops.

Format output.

Define custom functions.

Execute UNIX commands within an Awk script.

Process the output of UNIX commands.

We start with the most basic command‑line syntax and gradually explore Awk’s programming capabilities.

Command‑line Syntax

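Awk has two forms of command‑line syntax, similar to sed. In the POSIX synopsis they are:

    awk [-F sepstring] [-v assignment]... program [argument...]
    awk [-F sepstring] -f progfile [-f progfile]... [-v assignment]... [argument...]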

The program part is analogous to a sed script; it consists of a sequence of pattern { action } pairs. When a record matches a pattern, the associated action is executed. In the first form, the program must appear as the first non‑option argument.

Awk parses input into records (default separator is a newline) and fields (default separator is whitespace). The record separator can be changed with the built‑in variable RS, and the field separator with FS or the -F option. Fields are accessed as $1, $2, …, while $0 holds the entire record.

Standard command‑line options include:

-F ERE: set the field separator to an extended regular expression.

-f progfile: read an Awk script from a file (multiple -f files are concatenated in order).

-v assignment: assign a variable before processing begins (e.g., -v var=value).

Examples:

Access a variable set with -v inside the script:
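A minimal sketch. The variable name greeting is arbitrary. Because the -v assignment happens before the program starts, the value is already visible in the BEGIN block:

```shell
# -v assigns greeting before any rule runs, so BEGIN can use it
awk -v greeting='hello' 'BEGIN { print greeting ", world" }'
# prints: hello, world
```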

The BEGIN pattern runs before any input is processed; END runs after all input has been processed.
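A small sketch with made-up input piped in by printf. The BEGIN block prints a header, the main rule counts records, and END reports the total once all input has been read:

```shell
printf 'one\ntwo\nthree\n' | awk '
    BEGIN { print "start" }           # runs before the first record
          { count++ }                 # runs once per record
    END   { print "records:", count } # runs after the last record'
# prints: start, then records: 3
```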

Records and Fields

In a database, a table consists of records (rows) and fields (columns). Awk treats a text file similarly: each line is a record, split into fields by the field separator. You can change the separator, for example to colon for /etc/passwd:
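A sketch using one hypothetical line in /etc/passwd format, so the output does not depend on the machine. The first field is the login name and the seventh is the login shell:

```shell
# -F: makes ":" the field separator
printf 'alice:x:1000:1000:Alice:/home/alice:/bin/bash\n' |
    awk -F: '{ print $1, $7 }'
# prints: alice /bin/bash
# On a real system: awk -F: '{ print $1, $7 }' /etc/passwd
```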

Access fields with $1, $2, …, $NF (last field) and $(NF-1) (second‑last). The built‑in variable NF holds the number of fields in the current record.
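A one-line illustration with a made-up record of four fields. Because NF is 4 here, $NF is $4 and $(NF-1) is $3:

```shell
echo 'alpha beta gamma delta' | awk '{ print NF, $NF, $(NF-1) }'
# prints: 4 delta gamma
```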

Script Composition

An Awk script is a series of pattern { action } blocks. If the pattern is omitted, the action runs for every input line. A simple example that prints each line:
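A minimal sketch with sample input. The rule has no pattern, so it applies to every line, and print with no arguments prints $0:

```shell
printf 'first line\nsecond line\n' | awk '{ print }'
# prints both lines unchanged
```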

Functions can be defined as:
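The general form is function name(parameter-list) { statements }. Below is a sketch with a hypothetical function max. When calling a user-defined function, do not put a space between the name and the opening parenthesis:

```shell
awk '
function max(a, b) {          # returns the larger of two numbers
    return a > b ? a : b
}
{ print max($1, $2) }' <<'EOF'
3 7
9 2
EOF
# prints 7, then 9
```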

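Function parameters are local to the function. Every other variable is global, including one first assigned inside a function body. To get a local variable, declare it as an extra parameter that callers do not pass: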
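A sketch with a hypothetical function sum3. The extra parameters i and total act as locals, and the extra spaces before them are a common convention that marks them as locals rather than real arguments. The variable calls is not a parameter, so it is global:

```shell
awk '
# i and total are extra parameters, so they are local to sum3;
# calls is not a parameter, so it is global
function sum3(a, b, c,    i, total) {
    calls++
    total = a + b + c
    i = 99
    return total
}
BEGIN {
    i = 1
    print sum3(1, 2, 3)   # 6
    print i               # still 1: the function changed only its local i
    print calls           # 1: calls is global
}'
```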

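Statements can be separated by newlines or semicolons. A backslash (\) at the end of a line continues a long statement onto the next line. After a comma, an opening brace, && or ||, a newline is allowed without a backslash: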
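A sketch showing both kinds of continuation on a made-up record. The shell's single quotes pass the backslash through to awk unchanged:

```shell
printf '1 2 3\n' | awk '{
    # a backslash joins this statement across two lines
    sum = $1 + $2 + \
          $3
    # a newline after && (or after a comma, {, or ||) needs no backslash
    if (sum > 5 &&
        NF == 3) print "sum:", sum
}'
# prints: sum: 6
```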

Pattern

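Patterns determine when an action is executed. Pattern types include:

/regular expression/: the action runs for records that match the extended regular expression.

expression: the action runs when the expression is true (non‑zero or non‑empty), for example a relational expression such as $1 > 5.

BEGIN: runs before the first record is read.

END: runs after the last record has been processed.

pattern1, pattern2: an address range, similar to sed; it matches every record from one that matches pattern1 through the next one that matches pattern2.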

Example: print lines containing the digit 3:
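A sketch on three made-up records. A pattern with no action prints every matching record, so both "3" and "13" are selected:

```shell
printf 'a 1\nb 3\nc 13\n' | awk '/3/'
# prints: b 3, then c 13
```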

Negate a pattern with !:
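The same sample input with the pattern negated. Only the record that contains no "3" is printed:

```shell
printf 'a 1\nb 3\nc 13\n' | awk '!/3/'
# prints: a 1
```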

Logical AND (&&) and OR (||) can combine patterns:
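A sketch on made-up data. The first command combines a relational test and a string comparison with &&, and the second combines two regular expressions with ||:

```shell
# first field greater than 1 AND second field equal to "x"
printf '1 x\n2 x\n3 y\n' | awk '$1 > 1 && $2 == "x"'
# prints: 2 x

# record contains "1" OR "y"
printf '1 x\n2 x\n3 y\n' | awk '/1/ || /y/'
# prints: 1 x, then 3 y
```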

Match a single field against a regular expression with $n ~ /ere/ (use !~ for the opposite test):
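A sketch with made-up records. The regular expression is tested only against field 2, and the rule prints field 1 of each record that matches:

```shell
printf '1 apple\n2 banana\n3 blueberry\n' | awk '$2 ~ /^b/ { print $1 }'
# prints: 2, then 3
```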

Print only the first line:
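A sketch using NR, the record counter. The pattern NR == 1 is true only for the first record. On large input, adding exit stops awk from reading the rest:

```shell
printf 'header\nrow1\nrow2\n' | awk 'NR == 1'
# prints: header

# same result, but stops reading after the first line
printf 'header\nrow1\nrow2\n' | awk 'NR == 1 { print; exit }'
```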

Regular Expression

Awk uses POSIX extended regular expressions (EREs). For a thorough review of the syntax, see the Regular Expressions chapter of the POSIX Base Definitions or related articles.

Expressions

Expressions combine constants, variables, operators, and function calls. Variables may be user‑defined, built‑in (uppercase by convention, such as NR and FS), or field variables ($n). Variables need no declaration. An uninitialized variable acts as the empty string in a string context and as 0 in a numeric context.
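A small sketch. The variable x is never assigned, so it compares equal to both the empty string and zero, and it takes whichever value its context asks for:

```shell
awk 'BEGIN {
    # x was never assigned: it compares equal to both "" and 0
    if (x == "" && x == 0) print "uninitialized"
    print x + 5          # used as a number: 5
    print "[" x "]"      # used as a string: []
}'
```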

Arrays

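Arrays are associative: indices can be numbers or strings, and numeric indices are converted to strings. Assign an element with array[index] = value, iterate over all indices with for (item in array), and test membership with if (item in array). The in test does not create the element, whereas merely referencing array[item] does.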

Complete example:
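A sketch on made-up input. The program counts how often each word appears, using the word itself (a string) as the index, then tests membership with in. The order of for (w in count) is unspecified, so your lines may come out in a different order:

```shell
printf 'apple pear\napple fig\n' | awk '
{
    for (i = 1; i <= NF; i++)
        count[$i]++                    # index is the word (a string)
}
END {
    for (w in count)                   # iteration order is unspecified
        print w, count[w]
    if ("fig" in count) print "fig is present"
    if (!("plum" in count)) print "plum is absent"
}'
```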

Built‑in Variables

Awk provides many built‑in variables. Important ones include:

ARGC : number of command‑line arguments (size of ARGV).

ARGV : array of command‑line arguments (excluding options).

CONVFMT : format for converting numbers to strings (default "%.6g").

OFMT : format for numbers when printed (default "%.6g").

ENVIRON : associative array of environment variables.

FILENAME : name of the current input file.

NR : total number of records read so far.

FNR : number of records read from the current file.

FS : field separator (default a single space, which splits fields on runs of blanks and newlines and ignores leading and trailing blanks).

NF : number of fields in the current record.

RS : record separator (default newline).

OFS : output field separator (default a single space).

ORS : output record separator (default newline).

RLENGTH : length of the substring matched by match().

RSTART : start position of the substring matched by match().
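A sketch that contrasts NR, FNR and FILENAME, using two hypothetical files created in a temporary directory. NR keeps counting across files, while FNR restarts at 1 for each file:

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one"
printf 'c\n' > "$dir/two"
# keep only the last path component of FILENAME for readability
awk '{ n = split(FILENAME, p, "/"); print p[n], NR, FNR, $0 }' "$dir/one" "$dir/two"
# prints: one 1 1 a / one 2 2 b / two 3 1 c
rm -r "$dir"
```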

ARGV and ARGC

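ARGV and ARGC resemble the argv and argc of C’s int main(int argc, char **argv). ARGV[0] is usually the interpreter name (for example, awk). ARGV[1] through ARGV[ARGC-1] hold the operands, that is, input file names and var=value assignments. Options and the program text are not included. Example usage: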
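A sketch with hypothetical operands file1 and file2. They never have to exist, because a program made only of a BEGIN block does not read input. The -v option and the program text do not appear in ARGV, and the exact value of ARGV[0] depends on the implementation:

```shell
awk -v n=1 'BEGIN {
    for (i = 0; i < ARGC; i++)
        print i, ARGV[i]
}' file1 x=5 file2
# typically prints: 0 awk, 1 file1, 2 x=5, 3 file2
```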

You can modify ARGV to add, delete, or replace elements. Deleting an element skips the corresponding file:
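A sketch with two hypothetical files in a temporary directory. Deleting ARGV[1] in BEGIN means awk never opens the first file:

```shell
dir=$(mktemp -d)
printf 'from a\n' > "$dir/a.txt"
printf 'from b\n' > "$dir/b.txt"
# ARGV[1] is a.txt; removing it before input starts skips that file
awk 'BEGIN { delete ARGV[1] } { print }' "$dir/a.txt" "$dir/b.txt"
# prints: from b
rm -r "$dir"
```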

Adding an element:
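A sketch with hypothetical files. Only one.txt is given on the command line. BEGIN appends two.txt to ARGV and increments ARGC, so awk reads it as well. Without the ARGC++ the new element would be ignored:

```shell
dir=$(mktemp -d)
printf 'first\n' > "$dir/one.txt"
printf 'second\n' > "$dir/two.txt"
awk -v more="$dir/two.txt" 'BEGIN { ARGV[ARGC] = more; ARGC++ } { print }' "$dir/one.txt"
# prints: first, then second
rm -r "$dir"
```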

CONVFMT and OFMT

CONVFMT controls how numbers are converted to strings internally (for example, during string concatenation); the default is "%.6g". Changing it:
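A sketch. Concatenating with the empty string forces a number-to-string conversion, which uses CONVFMT. Integer values are always converted as integers, whatever CONVFMT says:

```shell
awk 'BEGIN {
    x = 22 / 7
    s = x ""            # converted with the default CONVFMT "%.6g"
    print s             # 3.14286
    CONVFMT = "%.2f"
    s = x ""
    print s             # 3.14
    n = 10
    print (n "")        # 10: integer values ignore CONVFMT
}'
```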

OFMT affects number‑to‑string conversion when print outputs a number:
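A sketch. OFMT is used only when print outputs a non-integer number directly. A concatenation is converted with CONVFMT instead:

```shell
awk 'BEGIN {
    x = 22 / 7
    print x             # default OFMT "%.6g": 3.14286
    OFMT = "%.2f"
    print x             # 3.14
    print x ""          # concatenation uses CONVFMT, not OFMT: 3.14286
}'
```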

ENVIRON

ENVIRON is an associative array of environment variables, indexed by variable name. Example:
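A minimal sketch that reads the HOME variable. If HOME is unset, the element is simply empty:

```shell
awk 'BEGIN { print "home directory:", ENVIRON["HOME"] }'
```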

You can pass values to Awk via environment variables:
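A sketch in which GREETING is a hypothetical variable name. The VAR=value prefix is ordinary shell syntax that exports the variable to this one awk process. Unlike -v values, values taken from ENVIRON are not processed for escape sequences:

```shell
GREETING='hi there' awk 'BEGIN { print ENVIRON["GREETING"] }'
# prints: hi there
```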

Iterate over ENVIRON:
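A sketch that prints NAME=value for every variable in the environment. As with any for (k in array) loop, the order is unspecified:

```shell
awk 'BEGIN { for (name in ENVIRON) print name "=" ENVIRON[name] }'
```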

RLENGTH and RSTART

Both are set by match(). RLENGTH is the length of the matched substring; RSTART is its start position (1‑based). Example:
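A sketch on a made-up string. Once match() succeeds, RSTART and RLENGTH can be passed straight to substr() to extract the matched text:

```shell
awk 'BEGIN {
    s = "order id: 4711, shipped"
    if (match(s, /[0-9]+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
}'
# prints: 11 4 4711
```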

Operators

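Awk supports arithmetic (+ - * / % ^), assignment (=, +=, -= and so on), increment and decrement (++ --), relational (< <= == != >= >), regular‑expression matching (~ !~), logical (&& || !), string concatenation (writing expressions side by side), array membership (in), and ternary (?:) operators. See the Expressions in awk section of the man page for a complete list.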
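A sketch that exercises each operator group. Inside print, comparisons must be parenthesized, because a bare > would be taken as output redirection. True comparisons yield 1 and false ones yield 0:

```shell
awk 'BEGIN {
    a = 7; b = 2
    print a + b, a - b, a * b, a / b, a % b, a ^ b     # 9 5 14 3.5 1 49
    print "cmp:", (a > b), (a == b), (a != b)           # cmp: 1 0 1
    print "logic:", (a > 5 && b > 5), (a > 5 || b > 5), !a   # logic: 0 1 0
    s = "foo" "bar"                                     # concatenation
    print s, (s ~ /^foo/), (a > b ? "a" : "b")          # foobar 1 a
    a += 3; b++
    print a, b                                          # 10 3
}'
```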

Statement

Common statements include print, printf, delete, break, continue, exit, and next. Example of printf:
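A sketch on made-up data. %-10s left-aligns a string in 10 columns, and %6.2f prints a number 6 characters wide with 2 decimals. Unlike print, printf adds no newline, so the format ends with \n:

```shell
printf 'apple 1.5\nbanana 12.25\n' | awk '{ printf "%-10s|%6.2f\n", $1, $2 }'
# prints:
# apple     |  1.50
# banana    | 12.25
```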

break exits a loop; continue skips to the next iteration. delete removes an array element. exit stops reading input and runs the END actions (an exit inside END ends the program immediately). next skips the remaining rules for the current record and reads the next one.
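A sketch that uses all five statements on made-up input. The "skip" record is dropped by next. In the "keep" record the value 2 is skipped by continue, the loop stops at 4 with break, and 1 is deleted afterwards. The "stop" record triggers exit, so the last line is never read, but END still runs:

```shell
printf 'skip me\nkeep 1 2 3 4 5\nstop\nnever read\n' | awk '
/^skip/ { next }                  # ignore this record, go to the next one
/^stop/ { exit }                  # stop reading input; END still runs
{
    for (i = 2; i <= NF; i++) {
        if ($i == 2) continue     # skip the value 2
        if ($i == 4) break        # leave the loop at 4
        seen[$i] = 1
    }
    delete seen[1]                # remove one array element
}
END {
    for (k in seen) print "kept:", k
    print "done"
}'
# prints: kept: 3, then done
```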

Output redirection examples:

Write specific columns to separate files:
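A sketch that writes to hypothetical files names.txt and scores.txt in a temporary directory. A concatenated file name should be parenthesized after >. The file is truncated when it is first opened, and later prints in the same run append to it:

```shell
dir=$(mktemp -d)
printf 'alice 90\nbob 85\n' | awk -v d="$dir" '{
    print $1 > (d "/names.txt")     # column 1
    print $2 > (d "/scores.txt")    # column 2
}'
cat "$dir/names.txt"     # alice, bob
cat "$dir/scores.txt"    # 90, 85
rm -r "$dir"
```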

Pipe output to a command (e.g., sort -n):
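A sketch on made-up data. Every print sends its output into the same sort -n process. Calling close() with the identical command string in END flushes the pipe and waits for sort to finish:

```shell
printf 'b 30\na 4\nc 100\n' | awk '{ print $2, $1 | "sort -n" }
END { close("sort -n") }'
# prints: 4 a, 30 b, 100 c
```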

Math Functions

Awk provides standard math functions: atan2(y,x), cos(x), sin(x), exp(x), log(x), sqrt(x), int(x), rand() (returns a random number in [0,1)), and srand([expr]) to set the seed.

Example of generating a random number:
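A minimal sketch. The printed value lies in [0, 1), and which value appears depends on the implementation and its default seed:

```shell
awk 'BEGIN { print rand() }'
```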

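Calling srand() with no argument seeds the generator from the time of day, so runs started at different times produce different sequences. Without srand(), many implementations (gawk, for example) repeat the same sequence on every run: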
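A sketch. srand() must run before rand() for the new seed to take effect, and it returns the previous seed, which is ignored here:

```shell
awk 'BEGIN { srand(); print rand() }'
```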

Generate an integer between 1 and n:
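A sketch in which n=6 (a die roll) is an arbitrary choice. Because rand() is in [0, 1), int(rand() * n) is in 0..n-1, and adding 1 shifts the range to 1..n:

```shell
awk -v n=6 'BEGIN { srand(); print int(rand() * n) + 1 }'
```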

String Functions

Awk includes many string manipulation functions.

sub

sub(ere, repl[, in]) replaces the first match of ere with repl in the string in (default $0) and returns the number of replacements (0 or 1).

Example:
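A sketch on a made-up record. Only the first "cat" is replaced, and the return value is 1:

```shell
echo 'cat cat cat' | awk '{ n = sub(/cat/, "dog"); print n, $0 }'
# prints: 1 dog cat cat
```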

In the replacement string, & stands for the matched text; write "\\&" to get a literal ampersand.

Example using &:
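A sketch on a made-up record. The number is wrapped in brackets by reusing the matched text through &:

```shell
echo 'price 42' | awk '{ sub(/[0-9]+/, "[&]"); print }'
# prints: price [42]
```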

gsub

gsub(ere, repl[, in]) performs a global substitution: it replaces every match, not just the first, and returns the number of replacements.
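A sketch on made-up records. The first command replaces every match in $0. The second passes a field as the third argument, so only that field changes and $0 is rebuilt with OFS:

```shell
echo 'cat cat cat' | awk '{ n = gsub(/cat/, "dog"); print n, $0 }'
# prints: 3 dog dog dog

echo 'a-b c-d' | awk '{ gsub(/-/, "+", $2); print }'
# prints: a-b c+d
```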

index

index(s, t) returns the position (1‑based) of the first occurrence of substring t in s, or 0 if it is not found.

Example:
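A sketch. "world" starts at position 7 of "hello world", and "z" does not occur in "hello":

```shell
awk 'BEGIN { print index("hello world", "world"), index("hello", "z") }'
# prints: 7 0
```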

length

length([s]) returns the length of string s; if s is omitted, $0 is used.

Example:
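A sketch showing both forms: with an explicit string, and with the argument omitted so that the current record is measured:

```shell
awk 'BEGIN { print length("awk") }'              # 3
echo 'hello world' | awk '{ print length() }'     # 11: length of $0
```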

match

match(s, ere) searches s for the regular expression ere. It returns the start position of the leftmost longest match, or 0 if there is none, and sets RSTART and RLENGTH (to 0 and -1 when there is no match).

Example:
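A sketch on two made-up records, one that matches and one that does not. The second line shows the values left in RSTART and RLENGTH after a failed match:

```shell
printf 'error at line 128\nall good\n' | awk '{
    if (match($0, /line [0-9]+/))
        print RSTART, RLENGTH
    else
        print "no match:", RSTART, RLENGTH
}'
# prints: 10 8, then no match: 0 -1
```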

split

split(s, a[, fs]) splits string s into the elements a[1], a[2], … using field separator fs (default FS) and returns the number of elements.

Example:
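A sketch that splits a made-up date on "-". The numeric for loop visits the pieces in their original order, and each piece keeps its original text, so "01" is printed as-is:

```shell
awk 'BEGIN {
    n = split("2024-01-15", parts, "-")   # n is 3
    for (i = 1; i <= n; i++)
        print i, parts[i]
}'
# prints: 1 2024, 2 01, 3 15
```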

Because for (k in a) visits indices in an unspecified order, iterate with for (i = 1; i <= n; i++) to process the pieces in their original order.

sprintf

sprintf(fmt, expr, ...) works like printf but returns the formatted string instead of printing it.

Example:
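A sketch. The string is built first and printed later. %05.1f zero-pads to width 5, %-4s left-aligns in 4 columns, and %x prints hexadecimal:

```shell
awk 'BEGIN {
    s = sprintf("%05.1f|%-4s|%x", 3.14159, "ab", 255)
    print s      # 003.1|ab  |ff
}'
```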

substr

substr(s, m[, n]) returns the substring of s that starts at position m (1‑based) and is at most n characters long. If n is omitted, the rest of the string is returned.

Example:
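A sketch on a made-up string. Position 8 is the "W", so omitting n returns everything from there to the end:

```shell
awk 'BEGIN {
    s = "Hello, World!"
    print substr(s, 8, 5)   # World
    print substr(s, 8)      # World! (n omitted: rest of the string)
    print substr(s, 1, 5)   # Hello
}'
```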

tolower / toupper

tolower(s) returns a copy of s converted to lower case; toupper(s) returns a copy converted to upper case. The argument itself is not modified.

Examples:
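A sketch on a made-up record that converts the whole record to upper case and a single field to lower case:

```shell
echo 'Mixed Case Text' | awk '{ print toupper($0); print tolower($1) }'
# prints: MIXED CASE TEXT, then mixed
```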

I/O Functions

getline

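command | getline [var] reads a line from the output of command, and getline [var] < file reads a line from the named file. If var is supplied, the line is stored there; otherwise $0 and NF are updated. getline returns 1 on success, 0 at end of input, and -1 on error.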

Example reading from a file:
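A sketch that reads a hypothetical file, created in a temporary directory, line by line into the variable line. Testing the return value with > 0 stops the loop both at end of file (0) and on an error (-1):

```shell
dir=$(mktemp -d)
printf 'line A\nline B\n' > "$dir/data.txt"
awk -v f="$dir/data.txt" 'BEGIN {
    while ((getline line < f) > 0)
        print "read:", line
    close(f)
}'
# prints: read: line A, then read: line B
rm -r "$dir"
```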

Without a variable, the line becomes the current record:
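A sketch that reads from a command. No variable is given, so $0 is replaced and re-split into fields, and NF and $2 reflect the new line:

```shell
awk 'BEGIN {
    "echo one two three" | getline
    print NF, $2
    close("echo one two three")
}'
# prints: 3 two
```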

close

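close(name) closes the file or pipe named name that was opened by getline or by an output redirection. This frees the file descriptor and lets the same command run again, or the same file be re‑read from the start. When reading in a loop, test getline with > 0 so that an error (-1) cannot cause an infinite loop.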

Example:
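A sketch that runs the same command twice. The close() between the two reads starts a fresh process. Without it, the second getline would hit end of file and leave second empty:

```shell
awk 'BEGIN {
    cmd = "echo hello"
    cmd | getline first
    close(cmd)              # finish this run of the command
    cmd | getline second    # runs the command again
    close(cmd)
    print first, second
}'
# prints: hello hello
```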

system

system("command") executes command through the shell and returns its exit status.

Example:
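A sketch. The command's own output goes straight to standard output, and awk receives only the exit status, which is 0 here:

```shell
awk 'BEGIN {
    status = system("echo run by the shell")
    print "exit status:", status
}'
# prints: run by the shell, then exit status: 0
```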

Conclusion

This article provides a concise yet comprehensive overview of Awk, covering its syntax, script structure, patterns, built‑in variables, arrays, functions, operators, statements, and I/O handling. Readers are encouraged to experiment with the examples and explore Awk’s powerful text‑processing capabilities.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
