Master gawk: Powerful Text Processing on Unix with Real-World Examples
This guide explains how to use the GNU awk (gawk) tool for defining variables, performing arithmetic and string operations, applying structured programming concepts, extracting and formatting data from text files, and generating reports, with clear command‑line examples and script techniques.
gawk is the GNU implementation of the original Unix awk program. It elevates stream editing by providing a full programming language, allowing you to define variables, use arithmetic and string operators, apply structured programming constructs, and generate formatted reports from large text files such as logs.
1 gawk command format
gawk options program file
-F fs specify the field separator
-f file read the program from a file
-v var=value assign a value to a variable before the program starts
-mf N set the maximum number of fields (inherited from the original awk; gawk has no such limit)
-mr N set the maximum record size (inherited from the original awk; gawk has no such limit)
-W keyword set compatibility mode or warning level
Command-line options let you customize gawk's behavior. The real power lies in writing scripts that read input lines, process them, and produce any desired output.
2 Reading program scripts from the command line
A gawk program is enclosed in curly braces, and the whole program must be wrapped in single quotes so that the shell passes it to gawk unchanged. Example:
# gawk '{print "Hello World!"}'
Without a file name, gawk reads from STDIN and waits for input. After you type a line and press Enter, gawk processes that line according to the script. Press Ctrl-D to send end-of-file and stop gawk.
3 Using data field variables
gawk automatically assigns each field of an input line to a variable. By default:
$0 whole line
$1 first field
$n nth field
Fields are split by the field separator, which defaults to whitespace. You can change it with -F. For example, using a colon to parse /etc/passwd:
# gawk -F : '{print $1}' /etc/passwd
root
bin
daemon
...
4 Using multiple commands in a script
Separate commands with semicolons. The example below assigns a new value to the fourth field and then prints the modified line:
# echo "My name is centos" | gawk '{ $4="hahaha"; print $0 }'
My name is hahaha
5 Reading programs from files
Store a gawk program in a file and invoke it with -f. Example script script2.gawk:
{print $1 "'s home directory is " $6}
Run it with:
# gawk -F: -f script2.gawk /etc/passwd
root's home directory is /root
bin's home directory is /bin
...
You can place multiple commands on separate lines in the script file without needing semicolons.
6 Running a script before processing data (BEGIN)
The BEGIN block executes before any input is read. Example:
# gawk 'BEGIN{print "The data3 File contents:"}{print $0}' data3.txt
The data3 File contents:
Line 1
Line 2
Line 3
7 Running a script after processing data (END)
The END block runs after all input has been processed. Example:
# gawk '{print $0} END{print "End of file"}' data3.txt
Line 1
Line 2
Line 3
End of file
Using END is ideal for adding footers or final summaries to reports.
MaGe Linux Operations