Mastering AWK: Powerful Text Processing and Reporting Techniques
AWK is a versatile text-processing language that reads input line by line, splits each line into fields on whitespace by default, and applies pattern-action rules to analyze data and generate reports. Typical tasks include extracting columns, counting records, and scripting with built-in variables, loops, and conditional statements.
Introduction
awk is a powerful text-analysis tool. Where grep is geared to searching and sed to editing, awk excels at analyzing data and generating reports. In simple terms, awk reads a file line by line, splits each line into fields using whitespace (spaces and tabs) as the default separator, and then processes those fields.
There are three versions of awk: awk, nawk, and gawk. Unless otherwise specified, gawk (the GNU implementation) is implied.
The name AWK comes from the initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is a programming language in its own right, described by its creators as a "pattern scanning and processing language". It lets you write short programs that read input files, perform calculations, rearrange data, and generate reports, among many other tasks.
Usage
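awk 'pattern { action }' filenames
The program consists of a pattern that specifies which records to select and an action that is executed when the pattern matches. A pattern is most often a regular expression enclosed in slashes, though it can also be a comparison or other expression. The action is a series of statements grouped by curly braces. If the pattern is omitted, the action runs for every record; if the action is omitted, each matching record is printed.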
Awk treats each line of input as a record. For each record it reads, it runs the actions whose patterns match.
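As a minimal sketch of the pattern-action form, the following feeds awk some made-up inline data (the names and scores are purely illustrative). The pattern $2 > 80 selects records, and the action prints the first field of each selected record.

```shell
# Hypothetical sample data: a name and a score, separated by a space.
high=$(printf 'alice 90\nbob 72\ncarol 85\n' |
  awk '$2 > 80 { print $1 }')   # pattern: $2 > 80, action: print $1
echo "$high"
```

Only the records for alice and carol satisfy the pattern, so only those two names are printed.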
Calling awk
There are three ways to invoke awk:
Command-line mode: awk [-F field_separator] 'commands' input_file
Shell script mode: place the awk commands in an executable script file whose first line is #!/bin/awk -f (or use #!/bin/sh and invoke awk inside the script).
Separate script file: awk -f script_file input_file
This chapter focuses on the command-line method.
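The separate-script-file method can be sketched as follows. The script contents, the temporary file created with mktemp, and the passwd-style input are all assumptions made for illustration.

```shell
# Write a small awk program to a temporary script file.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first colon-separated field of every record.
BEGIN { FS = ":" }
{ print $1 }
EOF

# Run the program with -f; the input is made-up passwd-style data.
names=$(printf 'root:x:0:0\ndaemon:x:1:1\n' | awk -f "$script")
echo "$names"
rm -f "$script"
```

Keeping the program in a file avoids shell quoting problems and is usually easier to maintain once a program grows beyond a line or two.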
Introductory Examples
Assume the output of last -n 5 is:
# last -n 5
root pts/1 192.168.1.100 Tue Feb 10 21:00 still logged in
...
To display only the usernames of the last five logins:
last -n 5 | awk '{print $1}'
Awk's workflow: read one record (by default, one line), then split it into fields on the field separator (by default, runs of spaces and tabs). $0 then holds the whole record, $1 the first field, and $n the nth field.
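To make the field splitting concrete, here is a small sketch that runs awk on a single made-up record in the style of the last output. NF (the number of fields) is used here only to show how many pieces the line was split into.

```shell
# One illustrative record; the values are not taken from a real system.
line='root pts/1 192.168.1.100 Tue Feb 10 21:00'

# Print the field count, the first field, the third field, and the last field.
fields=$(printf '%s\n' "$line" | awk '{ print NF; print $1; print $3; print $NF }')
echo "$fields"
```

The record splits into seven fields, and $NF refers to the last of them (21:00).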
To list accounts from /etc/passwd:
cat /etc/passwd | awk -F ':' '{print $1}'
To list accounts and their shells separated by a tab:
cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
To list accounts and shells separated by a comma and add a header and a custom line at the end:
cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
Awk also supports pattern matching. For example, to find lines containing "root" in /etc/passwd:
awk -F: '/root/' /etc/passwd
To print only the shell field for lines matching "root":
awk -F: '/root/{print $7}' /etc/passwd
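Note that /root/ matches anywhere in the record, not only in the account name. The sketch below, which uses made-up passwd-style lines, contrasts that with a match restricted to the first field through the ~ operator.

```shell
# Illustrative data: operator's home directory happens to be /root.
data='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# /root/ matches anywhere in the record, so operator is included as well.
anywhere=$(printf '%s\n' "$data" | awk -F: '/root/ { print $1 }')

# Restricting the match to the first field selects only the root account.
exact=$(printf '%s\n' "$data" | awk -F: '$1 ~ /^root$/ { print $1 }')

echo "$anywhere"
echo "$exact"
```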
Built‑in Variables
ARGC – number of command‑line arguments
ARGV – array of command‑line arguments
ENVIRON – environment variables
FILENAME – name of the current file
FNR – record number in the current file
FS – input field separator (same as -F)
NF – number of fields in the current record
NR – total number of records read so far
OFS – output field separator
ORS – output record separator
RS – input record separator
Additionally, $0 is the entire record, $1 the first field, $2 the second, and so on.
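A short sketch of several of these variables working together, using made-up colon-separated input: FS and OFS are set in a BEGIN block, and NR and NF are printed for each record.

```shell
# Two illustrative records with different numbers of fields.
report=$(printf 'a:b:c\nd:e\n' |
  awk 'BEGIN { FS = ":"; OFS = "-" }
       { print NR, NF, $1 }        # record number, field count, first field
       END { print "total", NR }') # NR in END holds the total record count
echo "$report"
```

Because the print arguments are separated by commas, they are joined with OFS, which is set to a hyphen here.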
Print and printf
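Awk provides both print and printf. print writes its comma-separated arguments joined by the output field separator (OFS) and ends the line with the output record separator (ORS). printf follows C-style format strings, offering more control over the output; unlike print, it does not add a newline automatically.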
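The difference can be sketched with a few lines; the sample account data is made up, and the column width of 8 is an arbitrary choice for the example.

```shell
# print with commas joins arguments with OFS; adjacent strings are concatenated.
joined=$(awk 'BEGIN { OFS = ","; print "a", "b"; print "a" "b" }')
echo "$joined"

# printf pads the first column to 8 characters, left-aligned.
formatted=$(printf 'root:/bin/bash\nsync:/bin/sync\n' |
  awk -F: '{ printf "%-8s|%s\n", $1, $2 }')
echo "$formatted"
```

The explicit \n in the format string is needed because printf writes exactly what the format describes.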
Awk Programming
Variables can be built‑in or user‑defined. Example to count the number of accounts in /etc/passwd:
awk '{count++;} END{print "user count is", count}' /etc/passwd
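Uninitialized awk variables start out as 0 (or the empty string), so the command above works as written, but initializing them explicitly in a BEGIN block makes the intent clear: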
awk 'BEGIN {count=0} {count++;} END{print "user count is", count}' /etc/passwd
Awk can also sum file sizes by adding up the fifth column of ls -l output (the total includes the entries for directories themselves):
ls -l | awk 'BEGIN {size=0;} {size+= $5;} END{print "size is", size/1024/1024, "M"}'
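If only regular files should be counted, a pattern can filter the listing first. The following is a sketch under the assumption that the input looks like typical ls -l output; the listing itself is made up so that the result is reproducible.

```shell
# Made-up ls -l style listing; sizes are chosen so the total is exactly 3 MiB.
listing='total 12
-rw-r--r-- 1 root root 1048576 Feb 10 21:00 a.bin
drwxr-xr-x 2 root root 4096 Feb 10 21:00 dir
-rw-r--r-- 1 root root 2097152 Feb 10 21:00 b.bin'

# /^-/ keeps only regular files, skipping the "total" line and directories.
size_mb=$(printf '%s\n' "$listing" |
  awk '/^-/ { size += $5 } END { print size / 1024 / 1024 }')
echo "$size_mb"
```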
Conditional statements (if, else) and loops (while, for, break, continue) follow C‑language syntax.
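A brief sketch of both constructs follows. The passwd-style records and the labels "superuser" and "regular" are invented for the example.

```shell
# Classify made-up accounts by UID (third field) with if/else.
classes=$(printf 'root:x:0\nalice:x:1000\n' |
  awk -F: '{
    if ($3 == 0) {
      kind = "superuser"
    } else {
      kind = "regular"
    }
    print $1, kind
  }')
echo "$classes"

# A C-style for loop summing the integers 1 through 5.
sum=$(awk 'BEGIN { for (i = 1; i <= 5; i++) total += i; print total }')
echo "$sum"
```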
Arrays
Awk arrays are associative; indices can be strings or numbers. Example to collect usernames from /etc/passwd and print them after processing:
awk -F ':' 'BEGIN {count=0;} {name[count]=$1; count++;} END{for(i=0;i<count;i++) print i, name[i]}' /etc/passwd
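Because indices can be strings, arrays are also handy for counting by key. The sketch below counts how many made-up accounts use each shell; since the iteration order of for (s in n) is unspecified, the output is piped through sort to make it stable.

```shell
# Illustrative name:shell pairs, not taken from a real /etc/passwd.
counts=$(printf 'root:/bin/bash\nalice:/bin/bash\nsync:/bin/sync\n' |
  awk -F: '{ n[$2]++ }                          # count records per shell
           END { for (s in n) print s, n[s] }' | sort)
echo "$counts"
```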
Awk’s extensive capabilities make it suitable for quick data extraction, reporting, and scripting tasks on Linux systems.
For more details, refer to the GNU Awk manual: http://www.gnu.org/software/gawk/manual/gawk.html
MaGe Linux Operations
