
Master awk: From Basics to Advanced Text Processing on Linux

This guide introduces awk as a powerful Linux text-analysis tool, explains its underlying record-field model, demonstrates practical one-liners and scripts for reporting, filtering, formatting, and file splitting, and covers advanced features such as built-in variables, conditionals, arrays, and string functions.

Liangxu Linux

Introduction

Of the Linux "three musketeers" (grep, sed, awk), this article focuses on awk, a powerful text-analysis utility that excels at data extraction, reporting, and log processing.

What awk can do

Generate formatted reports from arbitrary text.

Analyze system logs and extract specific fields.

Count occurrences such as website visits or IP addresses.

Combine with other tools to summarize system status.

Provide scripting capabilities like loops, conditionals, and arrays.
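As a small illustration of the counting and array capabilities listed above, the following sketch tallies requests per client. The log layout (IP address in field 1) is a made-up assumption, not data from the original article.

```shell
# Hypothetical access log: client IP in field 1.
log='10.0.0.1 GET /
10.0.0.2 GET /about
10.0.0.1 GET /contact'

# hits is an associative array keyed by IP; END walks it with a for-in loop.
# The iteration order of for-in is unspecified, so sort the result.
counts=$(printf '%s\n' "$log" |
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
    sort -rn)
printf '%s\n' "$counts"
```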

How awk works

Awk processes input in three stages:

BEGIN – executes initialization code before any input is read.

body – runs once for each record (by default a line, delimited by \n) and operates on its fields (columns), which are split on the field separator.

END – runs after all input has been processed, often to print final results.

Each line is a record and each column is a field. The special variable $0 refers to the whole line, and $1..$N refer to the individual fields.
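The three stages and the record/field model can be seen together in a minimal sketch. The sample data (name, department, salary) is invented for illustration.

```shell
# Hypothetical sample data: name, department, salary.
sample='alice dev 5200
bob ops 4100
carol dev 6100'

stages=$(printf '%s\n' "$sample" | awk '
BEGIN { print "name salary" }         # runs once, before any input is read
      { print $1, $3 }                # runs per record: fields 1 and 3
END   { print NR " records read" }    # runs once, after the last record
')
printf '%s\n' "$stages"
```

The header comes from BEGIN, one line per input record comes from the body, and the final count comes from END.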

Practical basics

Save sample data to file.txt. A simple one-liner prints columns 1, 4, and 8:

awk '{print $1, $4, $8}' file.txt

Enclose the awk program in single quotes so that the shell does not expand $1, $4, and so on before awk sees them. $0 represents the entire line, while $1..$N refer to specific columns.

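Awk's printf mirrors C's printf: it takes a format string with placeholders such as %s for a string or %d for an integer, plus modifiers such as %-4s for a left-aligned string of width 4. Unlike print, printf does not append a newline automatically.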
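A short sketch of printf-style alignment, using invented two-column input (a name and a number):

```shell
# %-8s left-aligns a string in 8 columns; %4d right-aligns an integer in 4.
# The trailing \n is required because printf adds no newline on its own.
table=$(printf 'root 7\nalice 42\n' |
    awk '{ printf "%-8s|%4d|\n", $1, $2 }')
printf '%s\n' "$table"
```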

Advanced usage

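Filtering records – put a condition in front of the action, and awk runs the action only for records that match. Conditions can compare fields with ==, !=, >, <, >=, <= and can be combined with && and ||; if no action is given, the matching line ($0) is printed. Example: output lines where column 3 equals root and column 6 equals 10.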
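The filter described here (column 3 equals root and column 6 equals 10) might look like the following. The six-column records are placeholders made up for the example.

```shell
# Hypothetical records with six whitespace-separated columns.
data='1 a root wheel f 10
2 b root wheel f 3
3 c alice users f 10'

# A pattern with no action prints every matching record ($0).
matched=$(printf '%s\n' "$data" | awk '$3 == "root" && $6 == 10')
printf '%s\n' "$matched"
```

Note the quotes around "root": without them awk would treat root as an (empty) variable rather than a string.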

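Built-in variables – NR (number of the current record, counted across all input), NF (number of fields in the current record), RS (record separator, default \n), FS (field separator, default a single space, which splits on runs of spaces and tabs), OFS (output field separator, default a space), ORS (output record separator, default \n).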
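A quick way to get a feel for these variables is to print them. The inputs below are throwaway strings chosen for the sketch.

```shell
# NR is the record number, NF the field count, and $NF the last field.
vars=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')
printf '%s\n' "$vars"

# Changing RS makes awk split records on ";" instead of newlines.
by_semicolon=$(printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$by_semicolon"
```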

Specifying separators – set FS in a BEGIN block or use the -F option. Multiple separators can be expressed as a regular expression, e.g. -F '[;:]'. The separator that print places between its comma-separated arguments is controlled by OFS.
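Both ways of setting the input separator, plus OFS, can be sketched as follows. The input strings are invented for the example.

```shell
# -F '[;:]' splits on either ";" or ":"; OFS joins the arguments of print.
fields=$(printf 'root:x;0:0\n' |
    awk -F '[;:]' 'BEGIN { OFS = "-" } { print $1, $3, NF }')
printf '%s\n' "$fields"

# Equivalent idea without -F: assign FS before the first record is read.
second=$(printf 'a:b\n' | awk 'BEGIN { FS = ":" } { print $2 }')
printf '%s\n' "$second"
```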

File splitting – redirect print output with > to a file name built from a field, so that each distinct value gets its own file; for example, splitting a log by month (field 5) into one file per month.
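A minimal sketch of splitting by field 5, assuming a made-up log layout in which the month is the fifth column. The file names (app.log, Jan.txt, Feb.txt) and the dir variable are hypothetical.

```shell
workdir=$(mktemp -d)
cat > "$workdir/app.log" <<'EOF'
host1 web GET 200 Jan /index
host2 web GET 404 Feb /missing
host1 web POST 200 Jan /login
EOF

# Parenthesize the file name expression: an unparenthesized concatenation
# after > is parsed differently by different awk implementations.
awk -v dir="$workdir" '{ print > (dir "/" $5 ".txt") }' "$workdir/app.log"

jan=$(cat "$workdir/Jan.txt")
feb=$(cat "$workdir/Feb.txt")
printf '%s\n' "$jan"
```

Within one awk run, > truncates each file when it is first opened and then keeps appending to it. With many distinct values, calling close() on files that are no longer needed may be necessary to stay under the open-file limit.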

Conditional statements – embed if blocks inside the action braces to perform complex logic.
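An if / else if / else chain inside the action might look like this. The names, scores, and thresholds are illustrative assumptions only.

```shell
# Classify each record by the numeric value in field 2.
grades=$(printf 'alice 91\nbob 58\ncarol 75\n' | awk '{
    if ($2 >= 90) {
        grade = "A"
    } else if ($2 >= 60) {
        grade = "pass"
    } else {
        grade = "fail"
    }
    print $1, grade
}')
printf '%s\n' "$grades"
```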

Aggregations – accumulate values in variables (e.g., sum) during the body phase and print the total in END. Example: sum the sizes of all *.c and *.h files.
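The accumulate-then-print pattern can be sketched on a made-up listing in the style of ls -l, where field 5 is the size in bytes. On a real system the input would typically come from ls -l itself; the fixed text here just keeps the example reproducible.

```shell
# Hypothetical `ls -l`-style listing; field 5 is the file size.
listing='-rw-r--r-- 1 u g 120 Jan 1 10:00 main.c
-rw-r--r-- 1 u g 40 Jan 1 10:00 util.h
-rw-r--r-- 1 u g 999 Jan 1 10:00 notes.txt'

# Add up sizes only for names ending in .c or .h; "sum + 0" prints 0
# instead of an empty line when nothing matched.
total=$(printf '%s\n' "$listing" |
    awk '$NF ~ /\.[ch]$/ { sum += $5 } END { print sum + 0 }')
printf '%s\n' "$total"
```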

String handling – awk provides functions such as length(s), toupper(s), tolower(s), index(s,t), split(s,a,sep), substr(s,p,n), etc.
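The functions listed above can be tried in a BEGIN-only program, which needs no input file. The test string is arbitrary.

```shell
strings=$(awk 'BEGIN {
    s = "Hello,awk,World"
    print length(s)             # number of characters
    print toupper(s)
    print tolower(s)
    print index(s, "awk")       # 1-based position of the first match
    n = split(s, parts, ",")    # fills parts[1..n], returns n
    print n, parts[2]
    print substr(s, 7, 3)       # 3 characters starting at position 7
}')
printf '%s\n' "$strings"
```

String positions in awk start at 1, not 0, which is why index() and substr() agree on position 7 here.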

Advanced script example

A more complex task (student grade report) is placed in a script file cal.awk and executed with awk -f cal.awk file.txt. The script demonstrates:

Initialisation and header printing in BEGIN.

Per‑record calculations in body (summing scores per student and subject).

Final summary and average calculation in END.
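The original cal.awk is not reproduced here, so the following is only a sketch of what a script with that three-part shape could look like. The input layout (student name, math score, English score), the file scores.txt, and the column headings are all assumptions made for illustration.

```shell
workdir=$(mktemp -d)

# Hypothetical input: $1 = student, $2 = math score, $3 = English score.
cat > "$workdir/scores.txt" <<'EOF'
alice 80 90
bob 60 70
EOF

cat > "$workdir/cal.awk" <<'EOF'
BEGIN {
    # Initialisation: print the report header once.
    printf "%-8s %5s %8s %6s\n", "NAME", "MATH", "ENGLISH", "TOTAL"
}
{
    total = $2 + $3     # per-student total
    math += $2          # running per-subject sums
    english += $3
    printf "%-8s %5d %8d %6d\n", $1, $2, $3, total
}
END {
    # Final summary: per-subject averages over all records.
    if (NR > 0) {
        printf "%-8s %5.1f %8.1f\n", "AVERAGE", math / NR, english / NR
    }
}
EOF

report=$(awk -f "$workdir/cal.awk" "$workdir/scores.txt")
printf '%s\n' "$report"
```

Keeping the program in a file and running it with -f avoids shell quoting problems once a script grows beyond a line or two.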

Summary

Key concepts covered include the record/field model, essential built-in variables (NR, NF, RS, FS, OFS, ORS), the three-phase execution model (BEGIN, body, END), formatting with printf, conditionals, loops, arrays, file redirection, and a suite of string functions, all of which empower users to perform sophisticated text processing directly from the command line.
