Mastering gawk: Powerful Text Processing on Unix/Linux
This article introduces gawk, the GNU version of awk, explaining its programming capabilities, command syntax, field handling, script execution methods, and how to use BEGIN and END blocks for pre‑ and post‑processing of data streams on Unix/Linux systems.
gawk is the GNU implementation of the original Unix awk program. Rather than offering only editing commands, it provides a complete programming language for processing text streams. You can define variables, use arithmetic and string operators, write structured control flow such as if statements and loops, and extract and reformat data from files, for example to generate formatted reports from log files.
1 gawk command syntax
<code>gawk options program file
-F fs          specify the input field separator
-f file        read the program from file
-v var=value   assign value to variable var before the program starts
-mf N          set the maximum number of fields (accepted for compatibility, ignored by gawk)
-mr N          set the maximum record size (accepted for compatibility, ignored by gawk)
-W keyword     set a gawk-specific option, such as compatibility mode or warning level
</code>
Command-line options customize how gawk runs. The program itself typically reads each input line, processes the data, and produces whatever output you need.
2 Reading a program script from the command line
On the command line, the program text must be enclosed in single quotes so the shell passes it to gawk unchanged, and each action goes inside braces {}. Example:
<code># gawk '{print "Hello World!"}'</code>
Because no file name is given, gawk reads from STDIN and waits for input, printing Hello World! once for each line you enter. Press Ctrl-D to send EOF and end the program.
3 Using field variables
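gawk automatically splits each input line into fields and assigns them to the variables $1, $2, … $n, while $0 holds the entire line. By default, fields are separated by whitespace (spaces or tabs). The -F option sets a different separator, for example ':' for the colon-delimited /etc/passwd file.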
<code># gawk -F : '{print $1}' /etc/passwd
root
bin
daemon
…</code>
4 Multiple commands in a script
To run several commands in one program, separate them with semicolons. Example:
<code>echo "My name is centos" | gawk '{$4="hahaha"; print $0}'
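My name is hahaha</code>
The first command assigns a new value to the fourth field, and the second prints the whole line, which gawk rebuilds from the modified fields.
5 Storing scripts in files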
A program can also be saved in a file and passed to gawk with the -f option. The following script2.gawk prints each user's home directory:
<code>{print $1 "'s home directory is " $6}</code>
Run it with -f:
<code># gawk -F: -f script2.gawk /etc/passwd</code>
This prints one line per account, combining the user name (field 1) with the home directory (field 6).
6 Running code before processing data
The BEGIN block runs once before gawk reads any input, which makes it useful for printing headers.
<code>gawk 'BEGIN{print "The data3 File contents:"}{print $0}' data3.txt</code>
7 Running code after processing data
The END block runs once after gawk has processed all input, which makes it ideal for footers and summaries.
<code>gawk '{print $0} END{print "End of file"}' data3.txt</code>
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.