
Master awk: Essential Commands and Advanced Techniques for Text Processing

This article provides a comprehensive introduction to awk, covering its command syntax, built‑in variables, pattern‑action processing, external variable passing, arithmetic and logical operations, advanced I/O, control flow constructs, and practical examples for effective text and data manipulation on Unix/Linux systems.

Raymond Ops

Introduction

awk is a programming language for processing text and data on Linux/Unix. It can read from stdin, files, or other commands, supports user‑defined functions and dynamic regular expressions, and works both interactively and as scripts.

awk command format

<code>awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)</code>

Common options:

-F fs – set the input field separator.

-v var=value – assign a variable before the script starts.

-f scriptfile – read the script from a file.
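
As a minimal sketch of these three options, the snippet below runs them against a few colon-separated lines. The sample data, the variable name min, and the script file users.awk are made up for illustration.

```shell
# Hypothetical sample data in the form name:uid:shell
data='root:0:/bin/bash
daemon:1:/usr/sbin/nologin
alice:1000:/bin/bash'

# -F sets the field separator; -v defines the awk variable min before the script runs.
names=$(printf '%s\n' "$data" | awk -F ':' -v min=1000 '$2 >= min { print $1 }')
echo "$names"

# -f reads the same program from a file instead of the command line.
tmpdir=$(mktemp -d)
printf '%s\n' '$2 >= min { print $1 }' > "$tmpdir/users.awk"
from_file=$(printf '%s\n' "$data" | awk -F ':' -v min=1000 -f "$tmpdir/users.awk")
echo "$from_file"
rm -rf "$tmpdir"
```

Keeping the program in a file avoids shell quoting issues once a script grows past a line or two.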

A script consists of one or more rules, each a pattern followed by an action in braces.

Pattern types

Regular expression

Relational expression

Pattern‑matching expression using ~ or !~

The special patterns BEGIN and END, which run before the first record and after the last one
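
The pattern types listed above can be tried one at a time. This is a sketch on invented name/score data, not output from a real system.

```shell
# Hypothetical input: a name and a score per line
scores='ann 91
bob 58
cat 77'

# Regular expression pattern: lines containing the letter b
re=$(printf '%s\n' "$scores" | awk '/b/ { print $1 }')
# Relational expression: second field greater than 75
rel=$(printf '%s\n' "$scores" | awk '$2 > 75 { print $1 }')
# Match operator: first field does not start with a
nomatch=$(printf '%s\n' "$scores" | awk '$1 !~ /^a/ { print $1 }')
# BEGIN runs before the first record, END after the last one
framed=$(printf '%s\n' "$scores" | awk 'BEGIN { print "start" } END { print "end", NR }')
printf '%s\n' "$re" "$rel" "$nomatch" "$framed"
```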

Action

Actions are one or more commands, functions, or expressions separated by newlines or semicolons and enclosed in braces. Typical actions include variable/array assignments, print statements, built‑in functions, and control‑flow statements.

Basic awk script format

<code>awk 'BEGIN{ commands } pattern{ commands } END{ commands }' file</code>

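BEGIN, pattern, and END blocks are all optional. Quote the script with single quotes so the shell does not expand $0, $1, and similar tokens; double quotes work only if every $ and embedded double quote is escaped.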

Execution process

Execute the BEGIN block (initialization, header printing).

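Read the input one record (by default, one line) at a time and run every pattern block whose pattern matches; a pattern with no action defaults to {print}, and an action with no pattern runs for every record.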

After the last line, execute the END block (summary, final output).
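
The three stages can be traced with a small sketch in which each block tags its own output. The three input lines are arbitrary.

```shell
# Each stage of the program labels its output so the order is visible.
trace=$(printf 'a\nb\nc\n' | awk '
BEGIN { print "BEGIN: no input read yet, NR=" NR }
      { print "main: record " NR " is " $0 }
END   { print "END: " NR " records read" }')
echo "$trace"
```

NR is still 0 inside BEGIN because no record has been read, and it keeps its final value inside END.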

Built‑in variables

<code>$n      # nth field of the current record
$0      # entire current record
ARGC    # number of command‑line arguments
ARGIND  # index in ARGV of the current file (gawk only)
ARGV    # array of command‑line arguments
CONVFMT # number‑to‑string conversion format (default "%.6g")
ENVIRON # environment variables array
ERRNO   # description of the last system error (gawk only)
FIELDWIDTHS # space‑separated list of fixed field widths (gawk only)
FILENAME # name of the current input file
NR      # number of records read so far, across all files
FNR     # record number in the current file
FS      # input field separator (default: a single space, which splits on runs of whitespace)
IGNORECASE # case‑insensitive matching when nonzero (gawk only)
NF      # number of fields in the current record
OFMT    # output format for numbers in print (default "%.6g")
OFS     # output field separator (default: a space)
ORS     # output record separator (default: a newline)
RS      # input record separator (default: a newline)
RSTART  # start position of the last match(), 0 if none
RLENGTH # length of the last match(), -1 if none
SUBSEP  # subscript separator for multidimensional arrays</code>
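
To see how a few of these variables behave together, here is a sketch that reads two small throwaway files. The file names and contents are invented for the example.

```shell
tmpdir=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmpdir/one.txt"
printf 'f g h i\n' > "$tmpdir/two.txt"

# NR counts across all files, FNR restarts for each file, NF is the field count,
# and $NF is the last field. Setting OFS in BEGIN makes print join its arguments with "|".
report=$(cd "$tmpdir" && awk 'BEGIN { OFS = "|" } { print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)
echo "$report"
rm -rf "$tmpdir"
```

Comparing NR with FNR is the usual way to tell whether awk is still on the first of several input files.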

Passing external variables

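Use the -v option, or assign variables as operands after the script. A -v assignment takes effect before BEGIN runs; an operand assignment takes effect only when awk reaches it in the argument list, so it is not visible inside BEGIN. Quote the shell variables so that values containing spaces survive.

<code>VAR=10000
echo | awk -v VARIABLE="$VAR" '{ print VARIABLE }'</code>
<code>var1="aaa"
var2="bbb"
echo | awk '{ print v1,v2 }' v1="$var1" v2="$var2"</code>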
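
The two ways of passing a variable differ in when the assignment happens. The sketch below, using the made-up variable name v, prints the value once in BEGIN and once in the main block.

```shell
# A -v assignment is processed before BEGIN runs.
with_v=$(awk -v v=early 'BEGIN { print "[" v "]" }')

# An operand assignment is processed only when awk reaches it in the argument list,
# so v is still empty in BEGIN but set by the time the first record is read.
as_operand=$(echo | awk 'BEGIN { print "[" v "]" } { print "[" v "]" }' v=late)

echo "$with_v"
echo "$as_operand"
```

If a value is needed in BEGIN, for example to print a header, -v is the form to reach for.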

Operators

Arithmetic: +, -, *, /, %, ^, ++, --

Assignment: =, +=, -=, *=, /=, %=, ^= (gawk also accepts **=)

Logical: ||, &&, !

Relational: <, <=, >, >=, !=, ==

Regex: ~, !~
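
A short sketch exercising each operator group inside a BEGIN block, so no input file is needed. The variable names are arbitrary.

```shell
calc=$(awk 'BEGIN {
    n = 7
    print n % 3, n ^ 2, n / 2          # remainder, power, division
    n += 3; n++                        # compound assignment, then increment
    print n
    in_range = (n > 10 && n < 20)      # logical AND of two relational tests
    is_zero  = (n == 0 || !n)          # logical OR and NOT
    print in_range, is_zero            # comparisons yield 1 (true) or 0 (false)
    starts_a = ("awk" ~ /^a/)          # ~ tests for a regex match
    no_k_end = ("awk" !~ /k$/)         # !~ tests for no match
    print starts_a, no_k_end
}')
echo "$calc"
```

Division in awk is floating point, so 7 / 2 yields 3.5; wrap the result in int() when a truncated integer is wanted.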

Advanced I/O

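next skips the remaining rules for the current record and moves on to the next record. The following prints only the even‑numbered lines:

<code>awk 'NR%2==1{next}{print NR,$0}' text.txt</code>

getline reads the next record from the current input, a file, or the output of a command. It returns 1 on success, 0 at end of file, and -1 on error. Which variables it sets depends on the form: plain getline sets $0, NF, NR, and FNR; getline var sets var, NR, and FNR; getline < file sets $0 and NF; getline var < file sets only var; cmd | getline sets $0, NF, and NR; cmd | getline var sets var and NR.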
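
Two common getline forms are sketched here: reading one line from a command, and looping over a file until the return value drops to 0. The file notes.txt and the echo command are placeholders.

```shell
tmpdir=$(mktemp -d)
printf 'first\nsecond\n' > "$tmpdir/notes.txt"

got=$(awk -v file="$tmpdir/notes.txt" 'BEGIN {
    # command | getline var: read one line of command output into var
    "echo hello" | getline greeting
    close("echo hello")
    print greeting

    # getline var < file: loop while the return value is 1 (a record was read)
    while ((getline line < file) > 0) {
        count++
        last = line
    }
    close(file)
    print count, last
}')
echo "$got"
rm -rf "$tmpdir"
```

Testing for > 0 rather than for a nonzero value matters: an unreadable file makes getline return -1, and a bare while (getline line < file) would then loop forever.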

File operations

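awk has no open() function: a file or pipe is opened automatically the first time it appears in a redirection or a getline.

close("filename") – close a file or pipe, flushing its output and allowing it to be reopened.

Redirect the output of print or printf with > (truncate the file when it is first opened) or >> (append).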
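
As a sketch of output redirection, the program below splits its input into one file per value of the first field, then a second run appends a line with >>. The ok/err labels and the .log file names are invented.

```shell
tmpdir=$(mktemp -d)

printf 'ok 1\nerr 2\nok 3\n' | awk -v dir="$tmpdir" '
{
    out = dir "/" $1 ".log"   # one output file per value of the first field
    print $2 > out            # truncates on first open, later writes are added
}
END { close(dir "/ok.log"); close(dir "/err.log") }'

# A separate run appends to the existing file instead of truncating it.
awk -v dir="$tmpdir" 'BEGIN { out = dir "/ok.log"; print "appended" >> out; close(out) }'

ok_log=$(cat "$tmpdir/ok.log")
err_log=$(cat "$tmpdir/err.log")
printf '%s\n' "$ok_log" "$err_log"
rm -rf "$tmpdir"
```

Building the file name in a variable first sidesteps the ambiguity of writing a concatenation directly after >.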

Loop structures

<code>for (var in array) { statements }
for (init; condition; increment) { statements }
while (condition) { statements }
do { statements } while (condition)</code>
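
The four loop forms can be compared side by side in a BEGIN block. This is a sketch with an arbitrary three-element array.

```shell
loops=$(awk 'BEGIN {
    n = split("red green blue", color, " ")

    # C-style for loop: visits the indices in a fixed order
    for (i = 1; i <= n; i++) joined = joined color[i] "-"

    # for-in loop: visits every index, but in an unspecified order
    for (k in color) seen++

    # while tests before each pass; do-while runs its body at least once
    i = 0; while (i < 3) i++
    j = 10; do { j++ } while (j < 5)

    print joined, seen, i, j
}')
echo "$loops"
```

Because for-in order is unspecified, use the counting form whenever output order matters.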

Other statements

break – exit the innermost loop.

continue – skip to the next iteration of the innermost loop.

next – stop processing the current record and read the next one.

exit – stop reading input and run the END block, if there is one.
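
These four statements can be combined in one small program. The sketch below feeds it the numbers 1 to 6, chosen only to make the skipped and stopped records easy to spot.

```shell
flow=$(printf '1\n2\n3\n4\n5\n6\n' | awk '
$1 == 2 { next }              # skip record 2 entirely
$1 == 5 { exit }              # stop reading input; END still runs
{
    for (i = 1; i <= 3; i++) {
        if (i == 1) continue  # skip the first iteration
        if (i == 3) break     # leave the loop before the third
        print "record", $1, "iteration", i
    }
}
END { print "stopped at record", NR }')
echo "$flow"
```

Record 6 is never read, because exit ends the input loop as soon as record 5 matches.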

Examples

Print each line of /etc/passwd:

<code>awk '{print}' /etc/passwd</code>

Print the first field using colon as separator:

<code>awk -F ':' '{print $1}' /etc/passwd</code>

Count blank lines:

<code>awk 'BEGIN{X=0} /^$/{ X+=1 } END{print "I found",X,"blank lines."}' file</code>

Sum the sizes of the non‑directory entries in the current directory:

<code>ls -l | awk 'BEGIN{sum=0} !/^d/{sum+=$5} END{print "total size is",sum}'</code>