Master awk: From Basics to Advanced Text Processing with Real Examples
This article provides a comprehensive guide to awk, covering its origins, syntax, options, keywords, operators, built‑in variables, regular‑expression meta‑characters, functions, control flow, system interaction, and practical examples that help readers efficiently process and analyze text data on the command line.
Introduction
awk is a powerful text‑analysis tool whose name comes from the initials of its creators Alfred Aho, Peter Weinberger and Brian Kernighan. It provides its own programming language for scanning and processing data, allowing you to read files, sort, compute, and generate reports.
Syntax Structure
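The basic command‑line form is awk [options] 'program' file ..., where the program is usually wrapped in single quotes so the shell does not expand $ or other special characters. Important options include -F to set the input field separator, -v to assign a variable before the program starts, and -f to read the program from a file instead of the command line.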
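As a minimal sketch of -F and -v, the snippet below runs awk on a small colon‑separated sample. The input text and the variable name label are made up for illustration and are not part of the article's mi_info data.

```shell
# Hypothetical colon-separated input (an assumption, not the article's mi_info file).
data='alice:30:admin
bob:25:dev'

# -F: sets the input field separator to ":".
# -v label=... assigns an awk variable before the program starts.
opts_out=$(printf '%s\n' "$data" | awk -F: -v label="user" '{ print label, $1, $3 }')
echo "$opts_out"
```

Each output line carries the label followed by the first and third fields, joined by the default output separator (a single space).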
Keywords
Two special patterns are BEGIN, whose action runs once before any input is read, and END, whose action runs once after all input has been processed.
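The following sketch shows where BEGIN and END sit relative to the per‑record rule. The three input numbers are invented for the example.

```shell
# Sum a column of numbers; the input values are illustrative only.
begin_end_out=$(printf '10\n20\n30\n' | awk '
BEGIN { print "start"; total = 0 }              # runs once, before any input is read
      { total += $1 }                           # runs for every input record
END   { print "sum:", total, "records:", NR }   # runs once, after the last record
')
echo "$begin_end_out"
```

In the END block NR still holds the number of records read, which makes it a convenient place for totals and averages.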
Test Data
A sample data file mi_info is used throughout the article; lines beginning with # are comments and not part of the input.
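The contents of mi_info are not reproduced in this text. To experiment anyway, one option is a small stand‑in file such as the one sketched below. Its column layout (model, chip, screen size, RAM, storage, price) and every value in it are assumptions made for illustration, not the article's actual data.

```shell
# Hypothetical stand-in data: layout and values are assumptions, not the real mi_info.
demo=$(mktemp)
cat > "$demo" <<'EOF'
note9 sd662 6.53 6G 128G 1299
k30 sd765 6.67 8G 256G 2499
mi10 sd865 6.67 8G 256G 3999
EOF

# Show how awk splits each whitespace-separated line into fields.
demo_out=$(awk '{ print NR ": " NF " fields, first=" $1 ", last=" $NF }' "$demo")
echo "$demo_out"
rm -f "$demo"
```

With this layout, $5 is the storage column and $NF is the price.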
Basic Examples
Simple one‑liner commands demonstrate pattern matching, field selection and printing, for example:
awk '/2499/' mi_info
awk '$5=="256G"' mi_info
awk '$1 ~ "note" {print}' mi_info
Using -f and -v
Complex scripts can be stored in a file and invoked with -f. Variables can be passed from the command line with -v:
awk -v test="price is" '/note/ {print $1, test, $NF}' mi_info
Operators (precedence high to low)
( ) (grouping)
$ (field reference)
++ -- (increment/decrement)
^ ** (exponentiation, right‑associative)
! + - (logical NOT, unary plus/minus)
* / % (multiply, divide, modulo)
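+ - (addition, subtraction)
expr expr (string concatenation, written by placing expressions side by side)
< <= == != > >= (comparisons)
~ !~ (regular‑expression match, non‑match)
in (array membership)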
&& (logical AND)
|| (logical OR)
?: (ternary conditional)
= += -= *= /= %= ^= **= (assignment, right‑associative)
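A few of these precedence rules are easy to get wrong, so here is a short sketch that prints the results of some borderline expressions. The expressions are chosen purely for illustration. Only ^ is used for exponentiation, since ** is not available in every awk implementation.

```shell
ops_out=$(awk 'BEGIN {
    print 2 ^ 3 ^ 2                    # right-associative: 2 ^ (3 ^ 2)
    print -2 ^ 2                       # ^ binds tighter than unary minus: -(2 ^ 2)
    print 2 + 3 * 4                    # * is evaluated before +
    print (1 || 0 && 0)                # && binds tighter than ||
    x = 5
    print (x > 3 ? "big" : "small")    # parentheses keep > from being read as redirection
    y = x++                            # post-increment yields the old value
    print y, x
}')
echo "$ops_out"
```

Wrapping comparisons in parentheses inside print is a useful habit, because an unparenthesized > after print is treated as output redirection.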
Built‑in Variables
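$n – the nth field; $0 – the whole record.
ARGC, ARGV – command‑line argument count and array.
FILENAME – name of the current input file.
FS, OFS – input and output field separators; RS, ORS – input and output record separators.
NF, NR, FNR – number of fields in the current record, number of records read so far, and number of records read from the current file.
IGNORECASE – toggles case‑insensitive matching (gawk only).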
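To see how these variables change while awk reads input, the sketch below processes two small files. The file names a.txt and b.txt and their contents are hypothetical and are created in a temporary directory.

```shell
dir=$(mktemp -d)
printf 'x:1\ny:2\n' > "$dir/a.txt"   # hypothetical sample files
printf 'z:3\n'      > "$dir/b.txt"

vars_out=$(cd "$dir" && awk '
BEGIN { FS = ":"; OFS = " | " }             # split input on ":", join output with " | "
{ print FILENAME, NR, FNR, NF, $1, $NF }    # NR keeps counting; FNR restarts per file
' a.txt b.txt)
echo "$vars_out"
rm -rf "$dir"
```

The third output line shows NR at 3 while FNR is back to 1, which is the usual way to detect that a new file has started.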
Regular‑Expression Metacharacters
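^ (start of string), $ (end of string), . (any single character), * + ? (zero or more, one or more, zero or one repetitions), [] (character class), [^] (negated class), | (alternation), () (grouping). awk uses POSIX extended regular expressions; ^ and $ anchor to the string being matched, which is the whole record for a bare /regex/ pattern and a single field for a test such as $1 ~ /regex/.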
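The sketch below applies three patterns that combine anchors, grouping with alternation, a character class, and the dot. The four input lines (a model name and a price) are invented for the example.

```shell
re_out=$(printf 'note9 1299\nk30 2499\nmi10 3999\nnote10 2499\n' | awk '
/^note/                 { print "anchor:", $1 }   # record starts with "note"
$1 ~ /^(k|mi)[0-9]+$/   { print "group:", $1 }    # "k" or "mi" followed only by digits
$2 ~ /^2.99$/           { print "dot:", $1 }      # 2, any one character, then 99
')
echo "$re_out"
```

A record is tested against every rule in order, so one input line can produce several output lines.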
Built‑in Functions
sub(), gsub() – substitution. index(), length(), substr(), match(), split() – string handling.
Arithmetic functions: atan2(), cos(), exp(), log(), sin(), sqrt(), int(), rand(), srand().
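As a quick illustration of the string and arithmetic functions listed above, the following sketch calls several of them on a fixed string. The sample string and variable names are arbitrary.

```shell
fn_out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                  # number of characters
    print index(s, "world")          # 1-based position of a substring
    print substr(s, 1, 5)            # 5 characters starting at position 1
    t = s; n = gsub(/o/, "0", t)     # replace every match, return the count
    print n, t
    u = s; sub(/o/, "0", u)          # replace only the first match
    print u
    if (match(s, /w[a-z]+/)) print RSTART, RLENGTH   # start and length of the match
    k = split("a,b,c", parts, ",")   # fill array parts, return the element count
    print k, parts[1], parts[3]
    print int(7.9), sqrt(16)         # truncation and square root
}')
echo "$fn_out"
```

Note that sub() and gsub() modify their third argument in place, which is why the example copies s into t and u first.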
Control Flow
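awk supports if … else, while, do … while, for (including for (key in array)), break, continue, next (skip the remaining rules and read the next record) and exit (stop reading input, with an optional exit status; the END block still runs).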
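The following sketch combines several of these statements in one program. The input values and the word stop used as a sentinel are assumptions chosen for the example.

```shell
flow_out=$(printf '5\n-1\n12\nstop\n99\n' | awk '
$1 == "stop" { exit }     # stop reading input; the END block still runs
$1 < 0       { next }     # skip this record and move on to the next one
{
    if ($1 >= 10) {
        size = "large"
    } else {
        size = "small"
    }
    n = 0
    while (n * n < $1) n++            # smallest n whose square reaches $1
    evens = 0
    for (i = 1; i <= $1; i++) {
        if (i % 2) continue           # odd number: go to the next iteration
        evens++
    }
    print $1, size, n, evens
    kept++
}
END { print "kept:", kept }
')
echo "$flow_out"
```

The record 99 is never processed, because exit fires on the line before it.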
System Interaction
awk scripts can interact with the shell through output redirection (> and >>), pipes (|), and the system() function, which runs a shell command and returns its exit status. printf formats the output that is written to any of these destinations.
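A minimal sketch of these mechanisms working together follows. The report file name, the input lines, and the use of sort -r and test -s are illustrative choices, not commands taken from the article.

```shell
dir=$(mktemp -d)
sys_out=$(printf 'b 2\na 1\nc 3\n' | awk -v out="$dir/report.txt" '
{
    printf "%s=%.2f\n", $1, $2 >> out   # append formatted lines to a file
    print $1 | "sort -r"                # feed the names to a shell command
}
END {
    close(out)                          # flush the file before inspecting it
    close("sort -r")                    # wait for sort to print its result
    status = system("test -s \"" out "\"")   # exit status of a shell command
    print "report non-empty:", (status == 0 ? "yes" : "no")
}')
report=$(cat "$dir/report.txt")
echo "$sys_out"
echo "$report"
rm -rf "$dir"
```

Calling close() on a file or pipe before relying on its contents avoids surprises caused by buffered output.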
Summary and Best Practices
For complex logic, place the script in a file and invoke it with -f to avoid typing errors.
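Use built‑in variables such as NF, NR and FS instead of hand‑written counters and parsing, and call system() or open a pipe only where awk itself cannot do the job.
When processing large files, avoid unnecessary work per record: call exit as soon as the result is known, and do not start an external command for every line.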
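The first recommendation can be sketched as follows. The script name total.awk, the variable label, and the CSV input are hypothetical and exist only for this example.

```shell
dir=$(mktemp -d)

# Store the program in a file (hypothetical name: total.awk).
cat > "$dir/total.awk" <<'EOF'
BEGIN  { FS = "," }                        # comma-separated input
NR > 1 { sum += $2 }                       # skip the header line, add up column 2
END    { printf "%s %d\n", label, sum }    # label is supplied with -v
EOF

# Run it with -f and pass a variable with -v.
f_out=$(printf 'item,price\npen,3\nbook,12\n' | awk -v label="total:" -f "$dir/total.awk")
echo "$f_out"
rm -rf "$dir"
```

Keeping the program in a file also removes the need to escape quotes for the shell, which is a common source of errors in long one‑liners.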
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
