Mastering Awk: Powerful Text‑Processing Techniques for Linux
This guide introduces the AWK text‑processing language, explains its syntax, shows three ways to invoke it, and provides practical examples—from extracting fields in /etc/passwd to counting users, computing file sizes, and using built‑in variables, conditionals, loops, and arrays for advanced scripting.
Introduction
Awk is a versatile text‑analysis tool that reads input line‑by‑line, splits each line into fields (default separator is whitespace), and allows you to perform calculations, filtering, and reporting. The GNU implementation is called gawk, and the name derives from the initials of its creators Alfred Aho, Peter Weinberger, and Brian Kernighan.
How to Use Awk
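The basic syntax is awk 'pattern { action }' filenames. The pattern is usually a regular expression written between slashes, or a comparison such as $3 >= 1000, and it selects the lines to process; the action is a series of commands executed for each matching line. If no pattern is given, the action runs for every line. If no action is given, each matching line is printed.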
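The pattern and action roles described above can be sketched with a few one-liners. The three sample records are hypothetical stand-ins for /etc/passwd entries, and the variable names are chosen only for illustration.

```shell
# Hypothetical sample data standing in for /etc/passwd-style records.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Regex pattern: the action runs only for lines that contain "bash".
bash_users=$(printf '%s\n' "$sample" | awk -F ':' '/bash/ {print $1}')

# Expression pattern: select records whose third field (the UID) is >= 1000.
regular_users=$(printf '%s\n' "$sample" | awk -F ':' '$3 >= 1000 {print $1}')

# No pattern: the action runs for every line.
all_users=$(printf '%s\n' "$sample" | awk -F ':' '{print $1}')

printf 'bash users:\n%s\n' "$bash_users"
printf 'regular users:\n%s\n' "$regular_users"
printf 'all users:\n%s\n' "$all_users"
```

With this sample, the regex pattern selects root and alice, while the numeric comparison selects only alice.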
Calling Awk
There are three common ways to run Awk:
Command‑line : awk [-F field‑separator] 'commands' input‑file(s)
Shell script : place the Awk commands in an executable script file whose first line is #!/bin/awk -f (the -f is required so that Awk reads its program from the file; adjust the path if awk lives elsewhere, or use #!/bin/sh if the script contains other shell commands).
File argument : awk -f awk‑script‑file input‑file(s) where -f loads the script from a separate file.
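As a minimal sketch of the file-argument form, the snippet below writes a tiny Awk program to a temporary file and runs it with -f. The temporary file, the sample record, and the variable names are assumptions made for the example.

```shell
# Write a small awk program to a temporary file (the file name is arbitrary).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first and last colon-separated field of every record.
BEGIN { FS = ":" }
{ print $1, $NF }
EOF

# Hypothetical input; with a real file you would pass its name after the script.
result=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -f "$script")
echo "$result"    # prints: root /bin/bash

rm -f "$script"
```

Keeping the program in its own file avoids shell quoting problems once a script grows beyond a single line.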
Basic Examples
Extract the first column (user name) from the last five login entries:
# last -n 5 | awk '{print $1}'
List all usernames from /etc/passwd:
# cat /etc/passwd | awk -F ':' '{print $1}'
Show usernames with their shells, separated by a tab:
# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
Add a header line and a custom footer row:
# cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
Built‑in Variables
Awk provides many built‑in variables, such as:
ARGC – number of command‑line arguments
ARGV – array of command‑line arguments
ENVIRON – environment variables
FILENAME – current file name
FNR – record number in the current file
FS – input field separator (same as -F)
NF – number of fields in the current record
NR – total record number across all input files
OFS – output field separator
ORS – output record separator
Additionally, $0 holds the entire line, $1 the first field, and so on.
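A short sketch of several of these variables working together, using hypothetical two-line input: FS and OFS are set in a BEGIN block, and NR, NF, $1, and $NF are printed for each record.

```shell
# Hypothetical colon-separated input with a varying number of fields.
report=$(printf 'a:b:c\nd:e\n' |
    awk 'BEGIN {FS = ":"; OFS = " | "} {print NR, NF, $1, $NF}')

printf '%s\n' "$report"
# prints:
# 1 | 3 | a | c
# 2 | 2 | d | e
```

Because NF holds the field count, $NF always refers to the last field of the current record, however many fields it has.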
Printing Functions
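Awk offers two output statements:
print – simple output; arguments separated by commas are joined with the output field separator OFS (a space by default), and the output record separator ORS (a newline by default) is appended.
printf – C‑style formatted output, useful for aligning columns or building complex strings; it adds no newline unless the format string contains \n.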
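The difference between the two can be sketched on a single hypothetical passwd-style record. The column widths in the printf format are arbitrary choices for the example.

```shell
# Hypothetical passwd-style record.
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# print joins comma-separated arguments with OFS (a single space by default).
with_print=$(printf '%s\n' "$line" | awk -F ':' '{print $1, $7}')

# printf follows a format string: a left-aligned 10-character name column,
# a right-aligned 5-digit UID column, then the shell.
with_printf=$(printf '%s\n' "$line" |
    awk -F ':' '{printf "%-10s %5d %s\n", $1, $3, $7}')

printf '%s\n' "$with_print"
printf '%s\n' "$with_printf"
```

The print version yields "alice /bin/bash", while the printf version pads the name and UID to fixed widths so that several records line up in columns.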
Awk Programming Constructs
Beyond one‑liner commands, Awk supports full programming features:
Variables and assignment : custom variables can be created and initialized, e.g., count=0.
Conditionals (C‑style):
if (expression) { statement; } else { statement; }
Loops – while, do/while, for, with break and continue available.
Arrays – associative arrays indexed by strings or numbers; useful for counting, summing, or storing intermediate results.
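The constructs listed above can be combined in one short program. The following is a sketch under assumed sample data: the three records, the UID threshold of 1000, and all variable names are illustrative rather than taken from any particular system.

```shell
# Hypothetical sample data standing in for /etc/passwd-style records.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Conditional plus associative array: count users per shell and split
# accounts into two groups by UID.
summary=$(printf '%s\n' "$sample" | awk -F ':' '
{
    shells[$7]++                  # array keyed by the shell path
    if ($3 >= 1000) {             # C-style conditional on the UID field
        regular_count++
    } else {
        system_count++
    }
}
END {
    # for-in visits keys in an unspecified order, hence the sort below.
    for (s in shells) {
        print s, shells[s]
    }
    print "regular", regular_count
    print "system", system_count
}' | LC_ALL=C sort)

# Counting for loop: add up every field of a line.
total=$(printf '1 2 3 4\n' |
    awk '{ sum = 0; for (i = 1; i <= NF; i++) sum += $i; print sum }')

printf '%s\n' "$summary"
echo "total: $total"    # prints: total: 10
```

Awk variables start out empty (treated as zero in arithmetic), so the counters and array entries need no explicit initialization.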
Example: count users in /etc/passwd:
awk '{count++; print $0} END {print "user count is", count}' /etc/passwd
Example: compute the total size of the files listed by ls -l and display the result in megabytes. Entries of exactly 4096 bytes are skipped as a rough way to exclude directories, which commonly (but not always) report that size:
ls -l | awk 'BEGIN {size=0} {if($5!=4096){size+= $5}} END {print "[end]size is", size/1024/1024, "M"}'
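As a variation on the command above (not the article's own method), directories can be recognized by the leading d in the permission column rather than by their size. The sketch below runs both filters on a hypothetical ls -l listing, so that a regular file of exactly 4096 bytes shows the difference.

```shell
# Hypothetical `ls -l` output: one directory and two regular files,
# one of which happens to be exactly 4096 bytes.
listing='total 12
drwxr-xr-x 2 alice alice    4096 Jan  1 10:00 docs
-rw-r--r-- 1 alice alice 1048576 Jan  1 10:00 data.bin
-rw-r--r-- 1 alice alice    4096 Jan  1 10:00 page.bin'

# Size-based filter, as in the example above: skips anything of 4096 bytes.
by_size=$(printf '%s\n' "$listing" |
    awk '{ if ($5 != 4096) size += $5 } END { print size + 0 }')

# Type-based filter: only lines whose first field starts with "-",
# which marks a regular file in long listings.
by_type=$(printf '%s\n' "$listing" |
    awk '$1 ~ /^-/ { size += $5 } END { print size + 0 }')

echo "by size: $by_size bytes"    # prints: by size: 1048576 bytes
echo "by type: $by_type bytes"    # prints: by type: 1052672 bytes
```

The type-based filter counts page.bin, which the size-based filter drops along with the directory.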
These snippets illustrate how Awk can be used for quick data extraction, reporting, and even more complex scripting tasks.
ITPUB
