Mastering AWK: Powerful Text Processing Techniques for Linux
This article provides a comprehensive introduction to AWK, covering its origins, basic syntax, command‑line usage, built‑in variables, printing functions, programming constructs such as conditionals, loops, and arrays, and includes numerous practical examples for extracting and formatting data on Linux systems.
Introduction
awk is a powerful text‑analysis tool; compared with grep for searching and sed for editing, awk excels at data analysis and report generation. It reads files line by line, splits each line on whitespace (default delimiter), and then processes the fields.
There are three versions of awk: awk, nawk, and gawk. Unless otherwise specified, gawk (the GNU implementation) is assumed.
The name AWK comes from the initials of its creators Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK has its own language, the AWK Programming Language, defined as a "pattern‑scanning and processing language" that can read input files, sort and compute data, and generate reports.
Usage
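awk 'pattern { action }' filenames
In this syntax, pattern selects which input lines to act on, and action is the series of commands executed for each matching line. The braces delimit the action. Either part may be omitted: with no pattern, the action runs on every line; with no action, each matching line is printed. A pattern is most often a regular expression enclosed in slashes, but it can also be a comparison expression or one of the special patterns BEGIN and END.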
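As a minimal sketch of a pattern paired with an action, the example below selects lines with a regular expression and prints part of each match. The log lines and the function name errors are hypothetical and exist only for illustration.

```shell
# Hypothetical log lines used only for illustration.
log='INFO service started
ERROR disk full
INFO heartbeat
ERROR request timeout'

# The pattern /^ERROR/ selects lines that start with ERROR;
# the action prints the second and third fields of each selected line.
errors() {
  printf '%s\n' "$log" | awk '/^ERROR/ { print $2, $3 }'
}

errors
```

Lines that do not match the pattern are skipped silently, which is why no INFO line appears in the output.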
Awk’s basic function is to browse and extract information from files or strings based on specified rules; the extracted data can then be further processed. A complete awk script typically formats information from text files.
Awk processes input one line at a time, applying the field separator (default is whitespace) to split each line into fields. $0 represents the entire line, $1 the first field, $n the nth field.
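The field variables can be seen in a short sketch like the following. The sample data and the function name show_fields are made up for this example.

```shell
# Hypothetical sample input: three whitespace-separated fields per line.
sample='alice 30 london
bob 25 paris'

# $0 is the whole line, $1 the first field, $3 the third field.
show_fields() {
  printf '%s\n' "$sample" | awk '{ print "line=" $0 "; first=" $1 "; third=" $3 }'
}

show_fields
```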
Invoking awk
There are three ways to invoke awk:
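1. Command-line mode
awk [-F separator] 'commands' input-file(s)
Here -F specifies the field separator (optional), and commands are the awk statements to execute.
2. Executable script mode
Put the awk statements in a file whose first line is #!/bin/awk -f (in place of the #!/bin/sh used for shell scripts), make the file executable, and run it directly. The -f is required, and the path to awk varies by system; /usr/bin/awk is also common.
3. Separate script file
awk -f script-file input-file(s)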
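To compare the command-line and script-file modes, the sketch below runs the same program both ways on the same input. The file name users.awk, the temporary directory, and the sample records are assumptions made for this example.

```shell
# Hypothetical script file created in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/users.awk" <<'EOF'
# users.awk: print the first colon-separated field of every line
BEGIN { FS = ":" }
{ print $1 }
EOF

# Hypothetical passwd-style records.
sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000::/home/alice:/bin/bash'

# Command-line mode: the program is given inline.
inline=$(printf '%s\n' "$sample" | awk -F ':' '{ print $1 }')

# Script-file mode: the same program is read from a file with -f.
from_file=$(printf '%s\n' "$sample" | awk -f "$tmpdir/users.awk")

printf 'inline:\n%s\nfrom file:\n%s\n' "$inline" "$from_file"
rm -rf "$tmpdir"
```

Both invocations should produce the same list of names; the script-file form tends to be easier to maintain once a program grows beyond one line.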
The rest of this article focuses on the command‑line mode.
Getting Started Example
Assuming the output of last -n 5 is:
root pts/1 192.168.1.100 Tue Feb 10 21:00 still logged in
root pts/1 192.168.1.100 Tue Feb 10 00:46:02 ...
To display only the first column (user names):
last -n 5 | awk '{print $1}'
To list accounts from /etc/passwd:
cat /etc/passwd | awk -F ':' '{print $1}'
To list accounts and their shells, separated by a tab:
cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
To list accounts and shells separated by a comma, with a header and a custom line at the end:
cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
AWK Built‑in Variables
ARGC – number of command‑line arguments
ARGV – array of command‑line arguments
ENVIRON – associative array of environment variables
FILENAME – name of the current input file
FNR – record number in the current file
FS – input field separator (same as -F)
NF – number of fields in the current record
NR – total number of records read so far
OFS – output field separator
ORS – output record separator
RS – input record separator
$0 represents the whole record; $1, $2, … represent individual fields.
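A few of these variables can be observed together in a small sketch. The input lines and the function name vars_demo are hypothetical.

```shell
# Hypothetical input with a different number of fields on each line.
sample='a:b:c
d:e'

# FS sets the input separator, OFS the separator print places between
# comma-separated arguments, NR the record number, NF the field count.
vars_demo() {
  printf '%s\n' "$sample" | awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1 }'
}

vars_demo
```

Each output line shows the record number, the number of fields in that record, and the first field, joined by the OFS value set in the BEGIN block.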
print and printf
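Awk provides two output statements:
print – writes its arguments followed by ORS (a newline by default). Arguments separated by commas are joined with OFS (a space by default); arguments written side by side without commas are concatenated with nothing between them.
printf – works like C’s printf, giving full control over formatting. It does not append a newline automatically, and it is often clearer for complex output.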
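The difference between the two can be sketched as follows, using made-up data and hypothetical function names print_demo and printf_demo.

```shell
# Hypothetical name/number pairs.
sample='alice 30
bob 7'

# print: comma-separated arguments, newline added automatically.
print_demo() {
  printf '%s\n' "$sample" | awk '{ print $1, $2 }'
}

# printf: explicit format string; the newline must be written as \n.
# %-8s left-aligns the name in 8 columns, %3d right-aligns the number in 3.
printf_demo() {
  printf '%s\n' "$sample" | awk '{ printf "%-8s|%3d\n", $1, $2 }'
}

print_demo
printf_demo
```

printf is the better fit when columns need to line up, since print offers no control over field width.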
AWK Programming
Variables can be built‑in or user‑defined. Example to count the number of accounts in /etc/passwd:
awk '{count++; print $0;} END {print "user count is", count}' /etc/passwd
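Awk treats an uninitialized variable as 0 (or as the empty string), so the count above works as written. Initializing variables explicitly in a BEGIN block still makes the intent clearer: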
awk 'BEGIN {count=0} {count++;} END {print "user count is", count}' /etc/passwd
Example to compute total file size from ls -l output:
ls -l | awk 'BEGIN {size=0} {size+= $5} END {print "[end] size is", size/1024/1024, "M"}'
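To skip directory entries, which commonly report a size of 4096 bytes on Linux filesystems (note that this also skips any regular file of exactly that size):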
ls -l | awk 'BEGIN {size=0} {if($5!=4096){size+=$5}} END {print "[end] size is", size/1024/1024, "M"}'
Conditional Statements
Awk’s conditionals follow C syntax:
if (expression) { statements; } else { statements; }
Nested if…else and else if constructs are also supported.
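A minimal sketch of an if / else if / else chain is shown below. The name and UID pairs, the function name classify, and the threshold of 1000 for regular accounts are assumptions for illustration; the actual boundary depends on the distribution.

```shell
# Hypothetical "name uid" pairs.
sample='root 0
daemon 1
alice 1000'

# Classify each account by UID; the 1000 cutoff is an assumed convention.
classify() {
  printf '%s\n' "$sample" | awk '{
    if ($2 == 0)        { kind = "superuser" }
    else if ($2 < 1000) { kind = "system" }
    else                { kind = "regular" }
    print $1, kind
  }'
}

classify
```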
Loop Statements
Awk supports while, do/while, for, break, and continue with the same semantics as C.
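The loop forms can be sketched with two small examples: a for loop that walks the fields of each line, and a while loop that runs entirely inside a BEGIN block. The input numbers and function names are hypothetical.

```shell
# Hypothetical numeric input.
sample='1 2 3
10 20'

# for loop: add up every field on the line (NF is the field count).
sum_fields() {
  printf '%s\n' "$sample" | awk '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i
    print "line " NR " sum=" total
  }'
}

# while loop: find the first power of two that is not below 100.
# A BEGIN-only program reads no input at all.
first_power() {
  awk 'BEGIN { n = 1; while (n < 100) n *= 2; print n }'
}

sum_fields
first_power
```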
Arrays
Awk arrays are associative; indices (keys) can be strings or numbers. They are created automatically when first used. Example to list all usernames from /etc/passwd:
awk -F ':' '{name[NR] = $1} END {for (i=1; i<=NR; i++) print i, name[i]}' /etc/passwd
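Because keys can be strings, arrays also work well as counters. The sketch below counts how many accounts use each login shell, using hypothetical passwd-style records and a hypothetical function name count_shells. The order in which for (key in array) visits keys is not specified, so the output is piped through sort to make it stable.

```shell
# Hypothetical passwd-style records.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash'

# count[$7]++ creates the element on first use and increments it afterwards.
count_shells() {
  printf '%s\n' "$sample" |
    awk -F ':' '{ count[$7]++ } END { for (s in count) print count[s], s }' |
    sort
}

count_shells
```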
Awk’s extensive capabilities make it suitable for quick data extraction, reporting, and simple scripting tasks on Unix‑like systems.
MaGe Linux Operations
