Master awk: Essential Commands and Advanced Techniques for Text Processing
This article provides a comprehensive introduction to awk, covering its command syntax, built‑in variables, pattern‑action processing, external variable passing, arithmetic and logical operations, advanced I/O, control flow constructs, and practical examples for effective text and data manipulation on Unix/Linux systems.
Introduction
awk is a programming language for processing text and data on Linux/Unix. It reads from stdin, files, or the output of other commands, supports user-defined functions and dynamic regular expressions, and can be run either as a one-liner on the command line or from a script file.
awk command format
awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)
Common options: -F fs sets the input field separator, -v var=value assigns an external variable, and -f scriptfile reads the script from a file.
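As a rough illustration of the options above, the following sketch runs the same filter once from the command line and once from a script file, then contrasts -v with a var=value operand. The sample data, the variable names (min, v), and the temporary file are assumptions made for this example only.

```shell
# Hypothetical sample data: name:score, one record per line.
data=$(printf 'alice:90\nbob:72\ncarol:85\n')

# -F sets the field separator; -v assigns a variable before the script starts.
passed=$(printf '%s\n' "$data" | awk -F: -v min=80 '$2 >= min { print $1 }')
echo "$passed"

# -f reads the same program from a file instead of the command line.
script=$(mktemp)
printf '%s\n' '$2 >= min { print $1 }' > "$script"
from_file=$(printf '%s\n' "$data" | awk -F: -v min=80 -f "$script")
rm -f "$script"
echo "$from_file"

# A -v assignment is already visible in BEGIN. A var=value operand placed
# after the script is applied only when awk reaches it in the argument
# list, so BEGIN still sees an empty value.
in_begin=$(awk -v v=set 'BEGIN { print v }')
not_in_begin=$(awk 'BEGIN { print "[" v "]" }' v=set)
echo "$in_begin $not_in_begin"
```

Both filter runs should select the same names. The last two commands show why -v is usually the safer choice when a value is needed inside BEGIN.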
The script consists of pattern and action.
Pattern types
Regular expression
Relational expression
Pattern-matching expression using ~ or !~
BEGIN, pattern, and END blocks
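The pattern types listed above can be tried side by side. This is a minimal sketch over made-up records (name, department, salary); the data and the shell variable names are assumptions for illustration.

```shell
# Hypothetical sample records: name, department, salary.
data=$(printf 'ann sales 3000\nbob dev 5200\ncy dev 4100\n')

# Regular expression pattern: lines containing "dev".
regex=$(printf '%s\n' "$data" | awk '/dev/ { print $1 }')

# Relational expression: third field greater than 5000.
relational=$(printf '%s\n' "$data" | awk '$3 > 5000 { print $1 }')

# Pattern-matching expression: first field does NOT start with "b".
match_expr=$(printf '%s\n' "$data" | awk '$1 !~ /^b/ { print $1 }')

# BEGIN and END run once each, before and after the input is read.
framed=$(printf '%s\n' "$data" |
    awk 'BEGIN { print "start" } END { print NR " records" }')

printf '%s\n' "$regex" "$relational" "$match_expr" "$framed"
```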
Action
Actions are one or more commands, functions, or expressions separated by newlines or semicolons and enclosed in braces. Typical actions include variable/array assignments, print statements, built‑in functions, and control‑flow statements.
Basic awk script format
awk 'BEGIN{ commands } pattern{ commands } END{ commands }' file
The BEGIN, pattern, and END blocks are all optional. Quote the script with single quotes where possible: inside double quotes the shell expands $0, $1, and similar before awk sees them.
Execution process
Execute the BEGIN block (initialization, header printing).
Read the input line by line and, for each line that matches the pattern, execute its action (the default action is {print} when none is given).
After the last line, execute the END block (summary, final output).
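The three steps above can be sketched with a small report. The item names and quantities are invented for the example.

```shell
report=$(printf 'apple 3\npear 5\nfig 2\n' | awk '
BEGIN { print "item qty" }            # step 1: runs once, before any input
      { total += $2; print $1, $2 }   # step 2: runs for every input line
END   { print "total", total }        # step 3: runs once, after the last line
')
echo "$report"

# A pattern with no action falls back to the default action, {print}.
second=$(printf 'a\nb\nc\n' | awk 'NR == 2')
echo "$second"
```

The header comes out first, the per-line output follows in input order, and the total appears only after the last record has been read.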
Built‑in variables
$n # nth field of the current record
$0 # entire current line
ARGC # number of command‑line arguments
ARGIND # index in ARGV of the current file (gawk extension)
ARGV # array of command‑line arguments
CONVFMT # numeric conversion format
ENVIRON # environment variables array
ERRNO # description of the last system error (gawk extension)
FIELDWIDTHS # list of field widths for fixed-width input (gawk extension)
FILENAME # name of the current input file
NR # number of records read so far (line number across all input files)
FNR # record number in the current file
FS # input field separator (default whitespace)
IGNORECASE # case-insensitive matching when true (gawk extension)
NF # number of fields in the current record
OFMT # output format for numbers
OFS # output field separator
ORS # output record separator
RS # input record separator
RSTART # start position of the last match
RLENGTH # length of the last match
SUBSEP # subscript separator for multidimensional arrays
Passing external variables
Use the -v option or assign variables after the script:
VAR=10000
echo | awk -v VARIABLE="$VAR" '{ print VARIABLE }'
var1="aaa"
var2="bbb"
echo | awk '{ print v1,v2 }' v1="$var1" v2="$var2"
Operators
Arithmetic: +, -, *, /, %, ++, --
Assignment: =, +=, -=, *=, /=, %=, **=
Logical: ||, &&
Relational: <, <=, >, >=, !=, ==
Regex: ~, !~
Advanced I/O
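next skips the current record and proceeds to the next line. For example, this prints only the even-numbered lines, each prefixed with its line number:
awk 'NR%2==1{next}{print NR,$0}' text.txt
getline reads a line from the current input, a file, or a command pipe. Which variables it updates depends on the form: plain getline sets $0, NF, NR, and FNR, while getline var < file sets only var. It returns 1 on success, 0 on EOF, and -1 on error.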
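A minimal sketch of two common getline forms and the three return values follows. The temporary file, the echo command, and the deliberately missing path /nonexistent/path/for/demo are all assumptions made for the example.

```shell
tmp=$(mktemp)
printf 'first\nsecond\n' > "$tmp"

result=$(awk -v f="$tmp" 'BEGIN {
    # getline var < file: sets only var. The loop stops when getline
    # returns 0 (EOF) or -1 (error), hence the "> 0" test.
    while ((getline line < f) > 0) n++
    close(f)
    print n " lines"

    # command | getline var: read one line of the command output.
    "echo hello" | getline greeting
    close("echo hello")
    print greeting

    # Reading from a file that cannot be opened returns -1, not 0.
    status = (getline line < "/nonexistent/path/for/demo")
    print "status " status
}')
rm -f "$tmp"
echo "$result"
```

Testing the return value with > 0 rather than treating it as a plain truth value matters: a bare while (getline line < f) would loop forever on an unreadable file, because -1 counts as true.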
File operations
awk has no open() function: a file or pipe is opened implicitly the first time it appears in a redirection or a getline. close("filename") closes it.
Redirect output with > or >> in a print or printf statement.
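As a sketch of output redirection, the following splits input records into one file per tag. The tags (ok, err), the values, and the temporary directory are hypothetical.

```shell
dir=$(mktemp -d)
printf 'ok 1\nerr 2\nok 3\nerr 4\n' | awk -v dir="$dir" '
{
    out = dir "/" $1 ".txt"   # build the target name in a variable first
    print $2 > out            # ">" truncates when the file is first opened,
                              # later prints to the same name keep appending
}
END {
    close(dir "/ok.txt")      # flush and close the output files
    close(dir "/err.txt")
}'
ok_content=$(cat "$dir/ok.txt")
err_content=$(cat "$dir/err.txt")
rm -rf "$dir"
printf '%s\n' "$ok_content" "$err_content"
```

Building the file name in a variable avoids the ambiguous form print $2 > dir "/" $1 ".txt", which awk implementations do not all parse the same way.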
Loop structures
for (var in array) { statements }
for (init; condition; increment) { statements }
while (condition) { statements }
do { statements } while (condition)
Other statements
break – exit a loop.
continue – skip to the next iteration.
next – read the next input line.
exit – exit the main input loop and run the END block.
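The loop forms and control statements above can be combined in one short sketch. The input records and all variable names here are invented for illustration.

```shell
summary=$(printf 'a 1\nskip 9\nb 2\nstop 0\nc 3\n' | awk '
$1 == "skip" { next }   # next: drop this record and read the next one
$1 == "stop" { exit }   # exit: stop reading input and go straight to END
{ count[$1] = $2 }
END {
    # for (init; condition; increment) with continue and break
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip even numbers
        if (i > 5) break           # leave the loop once i passes 5
        odd = odd i                # string concatenation
    }
    print "odd " odd

    # while loop: sum 1..4
    j = 1
    while (j <= 4) { sum += j; j++ }
    print "sum " sum

    # do-while: the body runs at least once
    do { d++ } while (d < 3)
    print "do " d

    # for (var in array): iteration order is unspecified, so count the keys
    for (k in count) n++
    print "keys " n
}')
echo "$summary"
```

Because exit fires on the stop record, the c record is never read, so the array ends up with two keys.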
Examples
Print each line of /etc/passwd:
awk '{print}' /etc/passwd
Print the first field using colon as separator:
awk -F ':' '{print $1}' /etc/passwd
Count blank lines:
awk 'BEGIN{X=0} /^$/{ X+=1 } END{print "I find",X,"blank lines."}' file
Calculate total file size:
ls -l | awk 'BEGIN{sum=0} !/^d/{sum+=$5} END{print "total size is",sum}'
Raymond Ops
