Master awk: Essential Commands and Advanced Techniques for Text Processing
This article provides a comprehensive introduction to awk, covering its command syntax, built‑in variables, pattern‑action processing, external variable passing, arithmetic and logical operations, advanced I/O, control flow constructs, and practical examples for effective text and data manipulation on Unix/Linux systems.
Introduction
awk is a programming language for processing text and data on Linux/Unix. It can read from stdin, files, or the output of other commands, supports user‑defined functions and dynamic regular expressions, and can be run either as a one‑liner on the command line or from a script file.
awk command format
<code>awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)</code>
Common options:
-F fs to set the input field separator,
-v var=value to assign an external variable,
-f scriptfile to read the script from a file.
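The options above can be sketched as follows. This is a minimal illustration, assuming a POSIX shell and a `mktemp` command; the sample data, the variable name `prefix`, and the temporary script file are made up for the example.

```shell
# Sample colon-separated input (hypothetical data).
data='alice:1001
bob:1002'

# -F sets the field separator; -v assigns the awk variable "prefix"
# before the program starts.
opt_out=$(printf '%s\n' "$data" | awk -F ':' -v prefix='user=' '{ print prefix $1 }')
echo "$opt_out"

# -f reads the same program from a file instead of the command line.
script=$(mktemp)
printf '%s\n' '{ print prefix $1 }' > "$script"
file_out=$(printf '%s\n' "$data" | awk -F ':' -v prefix='user=' -f "$script")
rm -f "$script"
echo "$file_out"
```

Both invocations should print the same two lines, since only the way the program text is supplied differs.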
The script consists of pattern and action.
Pattern types
Regular expression
Relational expression
Pattern‑matching expression using ~ or !~
BEGIN, pattern, END blocks
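Each pattern type listed above can be tried on a small input. The sketch below assumes a POSIX shell; the three-column sample (name, department, salary) and the shell variable names are invented for illustration.

```shell
# Hypothetical sample: name, department, salary.
data='ann sales 300
bob dev 500
cat dev 450'

# Regular expression pattern: lines containing "dev".
regex_out=$(printf '%s\n' "$data" | awk '/dev/ { print $1 }')

# Relational expression pattern: third field greater than 400.
rel_out=$(printf '%s\n' "$data" | awk '$3 > 400 { print $1 }')

# Pattern-matching expression: second field does not start with "d".
match_out=$(printf '%s\n' "$data" | awk '$2 !~ /^d/ { print $1 }')

# BEGIN and END patterns around a main rule that has no pattern.
count_out=$(printf '%s\n' "$data" | awk 'BEGIN { n = 0 } { n++ } END { print n }')

printf '%s\n' "$regex_out" "$rel_out" "$match_out" "$count_out"
```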
Action
Actions are one or more commands, functions, or expressions separated by newlines or semicolons and enclosed in braces. Typical actions include variable/array assignments, print statements, built‑in functions, and control‑flow statements.
Basic awk script format
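<code>awk 'BEGIN{ commands } pattern{ commands } END{ commands }' file</code>
BEGIN, pattern, and END blocks are all optional. The script is usually enclosed in single quotes; double quotes also work, but then the shell expands $1, $0, and similar tokens before awk sees them, so they must be escaped.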
Execution process
Execute the BEGIN block (initialization, header printing).
Read the input line by line and run the pattern block for each line; if the action is omitted, it defaults to {print}.
After the last line, execute the END block (summary, final output).
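The three phases can be observed with a short run. This is only a sketch, assuming a POSIX shell; the three input lines are arbitrary.

```shell
# Three hypothetical input lines with 2, 3, and 1 fields.
report=$(printf 'a b\nc d e\nf\n' | awk '
BEGIN { print "start" }                 # runs once, before any input is read
      { print NR ": " NF " fields" }    # runs for every input line
END   { print "lines=" NR }             # runs once, after the last line
')
echo "$report"
```

The BEGIN line appears first, then one line per record, and the END line last, with NR still holding the number of the final record.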
Built‑in variables
<code>$n # nth field of the current record
$0 # entire current line
ARGC # number of command‑line arguments
ARGIND # index in ARGV of the current file (gawk only)
ARGV # array of command‑line arguments
CONVFMT # numeric conversion format
ENVIRON # environment variables array
ERRNO # description of the last system error (gawk only)
FIELDWIDTHS # space-separated list of field widths (gawk only)
FILENAME # name of the current input file
NR # total record number (line number)
FNR # record number in the current file
FS # input field separator (default whitespace)
IGNORECASE # case‑insensitive matching when true (gawk only)
NF # number of fields in the current record
OFMT # output format for numbers
OFS # output field separator
ORS # output record separator
RS # input record separator
RSTART # start position of the last match
RLENGTH # length of the last match
SUBSEP # subscript separator for multidimensional arrays</code>
Passing external variables
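Use the -v option or assign variables after the script:
<code>VAR=10000
echo | awk -v VARIABLE="$VAR" '{ print VARIABLE }'</code>
<code>var1="aaa"
var2="bbb"
echo | awk '{ print v1,v2 }' v1="$var1" v2="$var2"</code>
Variables assigned after the script are set only when awk reaches them in the argument list, so they are not yet available in the BEGIN block; -v assignments are.
Operators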
Arithmetic: +, -, *, /, %, ++, --
Assignment: =, +=, -=, *=, /=, %=, **=
Logical: ||, &&
Relational: <, <=, >, >=, !=, ==
Regex: ~, !~
Advanced I/O
next skips the remaining rules for the current record and moves on to the next input line.
<code>awk 'NR%2==1{next}{print NR,$0}' text.txt</code>
getline reads a line from the current input, a file, or a command pipe. Depending on the form used, it sets $0 or a named variable and may update NF, NR, and FNR. It returns 1 on success, 0 on EOF, and -1 on error.
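Two common getline forms, reading from a command and reading from a file, might look like this. The sketch assumes a POSIX shell and `mktemp`; the command string, the temporary file, and the variable names `line`, `n`, and `f` are illustrative only.

```shell
# Read one line from a command into a variable.
cmd_out=$(awk 'BEGIN {
    cmd = "echo hello"
    if ((cmd | getline line) > 0)   # 1 means a line was read
        print "got: " line
    close(cmd)
}')

# Read every line of a file with a while loop. A missing file returns -1,
# so testing "> 0" avoids an endless loop on error.
tmp=$(mktemp)
printf 'x\ny\n' > "$tmp"
file_out=$(awk -v f="$tmp" 'BEGIN {
    while ((getline line < f) > 0) n++
    close(f)
    print n " lines"
}')
rm -f "$tmp"

echo "$cmd_out"
echo "$file_out"
```

Comparing the return value against 0 rather than using it directly as a condition is a defensive habit, because -1 is also a true value in awk.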
File operations
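awk has no explicit open function; a file or pipe is opened automatically the first time it is used in a redirection or with getline.
close("filename") – close a file or pipe so that its output is flushed and it can be reopened.
Redirect output with > or >> inside print or printf.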
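Output redirection and close can be combined as in the sketch below. It assumes a POSIX shell and `mktemp -d`; the directory, the file names derived from the first field, and the sample records are hypothetical.

```shell
dir=$(mktemp -d)
printf 'dev bob\nops ann\ndev cat\n' | awk -v dir="$dir" '
{
    out = dir "/" $1 ".txt"   # one output file per value of field 1
    print $2 > out            # ">" truncates on first use, later prints add to the open file
}
END {
    close(dir "/dev.txt")             # flush and release the file
    print "zed" >> (dir "/dev.txt")   # ">>" reopens it in append mode
}'
dev_out=$(cat "$dir/dev.txt")
ops_out=$(cat "$dir/ops.txt")
rm -rf "$dir"
echo "$dev_out"
echo "$ops_out"
```

Within one awk run, `>` truncates a file only the first time it is opened, which is why both `bob` and `cat` end up in the same file.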
Loop structures
<code>for (var in array) { statements }
for (init; condition; increment) { statements }
while (condition) { statements }
do { statements } while (condition)</code>
Other statements
break – exit a loop.
continue – skip to the next iteration.
next – read the next input line.
exit – exit the main input loop and run the END block.
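The loop forms and the break, continue, and exit statements can be exercised together. This is a small sketch under the assumption of a POSIX shell; all variable names and values are chosen only for illustration.

```shell
loop_out=$(awk 'BEGIN {
    # C-style for loop with continue and break.
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip even numbers
        if (i > 7) break           # stop once i exceeds 7
        sum += i
    }
    print "sum=" sum               # 1 + 3 + 5 + 7

    # while loop counting down to zero.
    n = 3
    while (n > 0) n--
    print "n=" n

    # do-while runs its body at least once, even if the condition is false.
    do { k++ } while (k < 0)
    print "k=" k

    # for-in visits array keys in an unspecified order, so only count them.
    a["x"]; a["y"]
    for (key in a) count++
    print "keys=" count
}')

# exit stops reading input but still runs the END block.
exit_out=$(printf '1\n2\n3\n' | awk 'NR == 2 { exit } { print } END { print "done" }')

echo "$loop_out"
echo "$exit_out"
```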
Examples
Print each line of /etc/passwd:
<code>awk '{print}' /etc/passwd</code>
Print the first field using colon as separator:
<code>awk -F ':' '{print $1}' /etc/passwd</code>
Count blank lines:
<code>awk 'BEGIN{X=0} /^$/{ X+=1 } END{print "I find",X,"blank lines."}' file</code>
Calculate total file size:
<code>ls -l | awk 'BEGIN{sum=0} !/^d/{sum+=$5} END{print "total size is",sum}'</code>
Raymond Ops