
Master awk: Essential Commands and Advanced Techniques for Text Processing

This article provides a comprehensive introduction to awk, covering its command syntax, built‑in variables, pattern‑action processing, external variable passing, arithmetic and logical operations, advanced I/O, control flow constructs, and practical examples for effective text and data manipulation on Unix/Linux systems.

Raymond Ops

Introduction

awk is a programming language for processing text and data on Linux/Unix. It can read from stdin, files, or the output of other commands, supports user‑defined functions and dynamic regular expressions, and can be run either as a one‑liner on the command line or from a script file.

awk command format

awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)

Common options: -F fs to set input field separator, -v var=value to assign external variable, -f scriptfile to read script from a file.
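As a small illustration of these options, the sketch below splits colon-separated input with -F and passes a value in with -v. The sample records and the variable name prefix are invented for the example.

```shell
# Sample colon-separated records (made up for this example).
data='root:x:0
daemon:x:1'

# -F ':' sets the input field separator; -v prefix=... defines an awk variable.
out=$(printf '%s\n' "$data" | awk -F ':' -v prefix='user=' '{ print prefix $1 }')
printf '%s\n' "$out"
```

The same program could be saved to a file and run with -f instead of being passed inline.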

The script consists of one or more pattern { action } pairs.

Pattern types

Regular expression

Relational expression

Pattern‑matching expression using ~ or !~

BEGIN and END special patterns, which run before the first and after the last input record
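The pattern types listed above can be sketched with one short command each. The three-record data set is invented for illustration.

```shell
# Name/age records (made up for this example).
data='alice 30
bob 17
carol 42'

# Regular-expression pattern: records containing "ar".
re=$(printf '%s\n' "$data" | awk '/ar/ { print $1 }')

# Relational-expression pattern: second field greater than 18.
rel=$(printf '%s\n' "$data" | awk '$2 > 18 { print $1 }')

# Pattern-matching expression: first field does not start with "a".
nomatch=$(printf '%s\n' "$data" | awk '$1 !~ /^a/ { print $1 }')

# BEGIN and END: run before the first and after the last record.
count=$(printf '%s\n' "$data" | awk 'BEGIN { n = 0 } { n++ } END { print n }')

printf '%s\n' "$re" "$rel" "$nomatch" "$count"
```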

Action

Actions are one or more commands, functions, or expressions separated by newlines or semicolons and enclosed in braces. Typical actions include variable/array assignments, print statements, built‑in functions, and control‑flow statements.

Basic awk script format

awk 'BEGIN{ commands } pattern{ commands } END{ commands }' file

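The BEGIN, pattern, and END blocks are all optional. Quote the script with single quotes so that the shell does not expand $0, $1, and similar before awk sees them; inside double quotes every $ has to be escaped.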

Execution process

Execute the BEGIN block (initialization, header printing).

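Read the input one record (by default, one line) at a time and, for each record, run the action of every pattern that matches. A rule with no pattern runs for every record; a pattern with no action defaults to {print}.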

After the last line, execute the END block (summary, final output).
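The three phases can be seen in a minimal sketch like the following, which feeds two lines to a script with all three kinds of block. The input and messages are arbitrary.

```shell
out=$(printf 'a\nb\n' | awk '
BEGIN { print "start" }                  # runs once, before any input is read
      { print NR ": " $0 }               # runs for every input record
END   { print "read " NR " records" }    # runs once, after the last record
')
printf '%s\n' "$out"
```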

Built‑in variables

$n          # nth field of the current record
$0          # entire current record
ARGC        # number of command-line arguments
ARGIND      # index in ARGV of the current file (gawk only)
ARGV        # array of command-line arguments
CONVFMT     # format for number-to-string conversion (default "%.6g")
ENVIRON     # associative array of environment variables
ERRNO       # description of the last system error (gawk only)
FIELDWIDTHS # space-separated list of fixed field widths (gawk only)
FILENAME    # name of the current input file
NR          # number of records read so far, across all files
FNR         # record number within the current file
FS          # input field separator (default: a single space, meaning runs of blanks and tabs)
IGNORECASE  # case-insensitive matching when nonzero (gawk only)
NF          # number of fields in the current record
OFMT        # output format for numbers in print (default "%.6g")
OFS         # output field separator (default: a single space)
ORS         # output record separator (default: newline)
RS          # input record separator (default: newline)
RSTART      # start position of the string matched by match()
RLENGTH     # length of the string matched by match()
SUBSEP      # subscript separator for multidimensional arrays
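A few of these variables in action, as a rough sketch: FS and OFS are set in BEGIN, and NR, NF, and $NF are read for each record. The input is invented.

```shell
# FS splits on ":", OFS joins the printed values with "-".
out=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $NF }')
printf '%s\n' "$out"
```

$NF is the last field because NF holds the field count of the current record.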

Passing external variables

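Use the -v option, whose values are already set when the BEGIN block runs, or assign variables after the script, in which case they are set only when awk reaches that argument and are therefore not visible in BEGIN: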

VAR=10000
echo | awk -v VARIABLE="$VAR" '{ print VARIABLE }'
var1="aaa"
var2="bbb"
echo | awk '{ print v1,v2 }' v1="$var1" v2="$var2"
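The two methods differ in when the value becomes available. The sketch below prints the variable inside brackets so that an unset value shows up as []. The name v1 and the value aaa follow the example above.

```shell
# -v assignments are made before BEGIN runs.
with_v=$(awk -v v1=aaa 'BEGIN { print "[" v1 "]" }')

# An assignment given after the script is processed only when awk reaches
# that argument, so v1 is still empty in BEGIN and set in the main rule.
operand=$(echo | awk 'BEGIN { print "[" v1 "]" } { print "[" v1 "]" }' v1=aaa)

printf '%s\n' "$with_v" "$operand"
```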

Operators

Arithmetic: +, -, *, /, %, ^, ++, --
Assignment: =, +=, -=, *=, /=, %=, ^= (some implementations, gawk included, also accept **=)
Logical: ||, &&, !
Relational: <, <=, >, >=, !=, ==
Regex match: ~, !~
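A short sketch combining several of these operators in a BEGIN block, so no input is needed. The values are arbitrary.

```shell
out=$(awk 'BEGIN {
    x = 7
    x += 3                                    # compound assignment: x is now 10
    print x % 4, x / 4                        # remainder and division
    if (x >= 10 && x != 11) print "in range"  # relational and logical operators
    if ("awk-1.0" ~ /^awk/) print "matches"   # regex match operator
}')
printf '%s\n' "$out"
```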

Advanced I/O

next skips the remaining rules for the current record and moves on to the next input record. For example, to print only the even‑numbered lines:

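awk 'NR%2==1{next}{print NR,$0}' text.txt

getline reads the next record from the main input (getline), from a file (getline < "file"), or from a command ("cmd" | getline). With no variable it replaces $0 and NF; with a variable (getline var) it sets only that variable. Reading from the main input advances NR and FNR, reading from a command advances NR, and reading from a file leaves both unchanged. It returns 1 on success, 0 on end of file, and -1 on error.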
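As a minimal sketch of the command form and its return values, the following reads the output of a command into a variable. The command echo hello and the variable names are chosen only for illustration.

```shell
out=$(awk 'BEGIN {
    cmd = "echo hello"
    # cmd | getline var reads one line of the command output into var.
    if ((cmd | getline line) > 0)
        print "got: " line
    # The output is exhausted, so a second read returns 0.
    rc = (cmd | getline line)
    print "second read returned " rc
    close(cmd)
}')
printf '%s\n' "$out"
```

Parenthesizing the getline expression before comparing its result avoids precedence surprises with < and |.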

File operations

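awk has no open() function: a file or pipe is opened automatically the first time it is used in a redirection or with getline.

close("filename") closes the file, so that a later read starts again from the beginning and a later > write truncates it again.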

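Redirect the output of print or printf with > (truncates the file on first write, then appends for the rest of the run) or >> (always appends), or pipe it to a command with |.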
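The sketch below writes records to a file with >, closes it, and reads it back with getline. It assumes mktemp is available to create a scratch file; the awk variable name path is hypothetical.

```shell
tmp=$(mktemp)   # scratch file, used only for this example

result=$(printf 'x\ny\n' | awk -v path="$tmp" '
{ print "line " NR ": " $0 > path }      # opened on first use; later writes append
END {
    close(path)                          # flush and close before reading back
    while ((getline line < path) > 0)    # reopens the file for reading
        n++
    print n " lines written"
}')

written=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n%s\n' "$result" "$written"
```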

Loop structures

for (var in array) { statements }
for (init; condition; increment) { statements }
while (condition) { statements }
do { statements } while (condition)
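Two of these loop forms appear together in a typical word count, sketched here on invented input. Because for (var in array) visits keys in an unspecified order, the output is piped through sort.

```shell
out=$(printf 'a b a\nc a b\n' | awk '
{
    for (i = 1; i <= NF; i++)    # C-style loop over the fields of each record
        count[$i]++
}
END {
    for (w in count)             # for-in loop over the array keys
        print w, count[w]
}' | sort)
printf '%s\n' "$out"
```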

Other statements

break – exit the enclosing loop.

continue – skip to the next iteration of the loop.

next – stop processing the current record and read the next one.

exit – stop reading input and run the END block, if there is one, before terminating.
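A rough sketch of these statements at work: the first command uses next and exit on a stream of numbers, the second uses continue and break inside a loop. The inputs and variable names are arbitrary.

```shell
# next skips record 2; exit stops at record 4 but END still runs.
sum=$(printf '1\n2\n3\n4\n5\n' | awk '
$1 == 2 { next }
$1 == 4 { exit }
        { sum += $1 }
END     { print "sum=" sum }')

# continue skips odd numbers; break leaves the loop once i exceeds 6.
evens=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2) continue
        if (i > 6) break
        sep = (s == "") ? "" : ","
        s = s sep i
    }
    print s
}')

printf '%s\n' "$sum" "$evens"
```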

Examples

Print each line of /etc/passwd:

awk '{print}' /etc/passwd

Print the first field using colon as separator:

awk -F ':' '{print $1}' /etc/passwd

Count blank lines:

awk 'BEGIN{X=0} /^$/{ X+=1 } END{print "I find",X,"blank lines."}' file

Calculate total file size:

ls -l | awk 'BEGIN{sum=0} !/^d/{sum+=$5} END{print "total size is",sum}'