
Mastering Awk: Powerful Text Processing for Linux with Real‑World Examples

This tutorial introduces awk as a versatile Linux text‑analysis tool, explains its execution model (BEGIN, body, END), demonstrates practical commands for reporting, filtering, formatting, and advanced scripting, and provides numerous code snippets and visual examples to help readers quickly apply awk in real‑world scenarios.

Efficient Ops

Mar 30, 2021


What Awk Can Do

Awk is a text‑analysis utility well suited to generating reports, parsing system logs, counting data such as website visits, and aggregating system information. It is also a small programming language: loops, conditionals, and arrays are available for more complex data processing.

Awk Execution Model

Awk processes input in three stages:

BEGIN – runs once, before any input is read.

body – runs for each line (record) of input.

END – runs once, after all input has been processed.

Each line is split into fields (columns) using a delimiter (whitespace by default). The default record separator is a newline (\n).
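The three stages can be sketched as follows. The input lines are made up for illustration and are piped in with printf rather than read from a file.

```shell
# Three sample records (hypothetical data).
out=$(printf 'alpha 1\nbeta 2\ngamma 3\n' | awk '
  BEGIN { print "start" }          # runs once, before any input is read
        { print NR ": " $1 }       # body: runs once per record (line)
  END   { print "records: " NR }   # runs once, after the last record
')
printf '%s\n' "$out"
```

The body has no pattern here, so it applies to every record; NR still holds the number of the last record when END runs.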

Basic Command Syntax

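A basic awk command has the form awk [options] 'pattern { action }' file. The pattern selects which records to process, and the action says what to do with each of them.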

Awk scripts are enclosed in single quotes so the shell does not interpret them. $1 through $N refer to individual columns, while $0 represents the entire line.

Practical – Beginner

Save sample data to file.txt and run a simple awk command to print columns 1, 4, and 8:

Fields are accessed with $1, $4, etc. Awk’s printf supports C‑style formatting: for example, %s prints a string, and %-4s prints a string left‑aligned in a field of width 4.
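A minimal sketch of both forms. The article's file.txt is not reproduced here, so the ls -l style lines below are an assumption, held in a shell variable instead of a file.

```shell
# Hypothetical sample data (stands in for file.txt).
data='-rw-r--r-- 1 root root 2048 Aug 12 10:15 notes.txt
-rwxr-xr-x 1 alice staff 512 Dec 3 09:30 run.sh'

# print: a comma between fields inserts the output separator (one space by default).
plain=$(printf '%s\n' "$data" | awk '{ print $1, $4, $8 }')

# printf: %-10s and %-6s left-align within widths 10 and 6; the newline is explicit.
aligned=$(printf '%s\n' "$data" | awk '{ printf "%-10s|%-6s|%s\n", $1, $4, $8 }')

printf '%s\n' "$plain" "$aligned"
```

Unlike print, printf adds no newline on its own, which is why the format string ends with \n.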

Practical – Intermediate

Filtering Records – output only lines where column 3 equals root and column 6 equals 10:

Awk supports the comparison operators ==, !=, >, <, >=, and <=, which can be combined with && and ||. A pattern with no action prints $0, the whole line.
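The filter described above might look like this. The six-column records are hypothetical; only the positions of column 3 and column 6 matter.

```shell
# Hypothetical records: id, name, owner, type, size, count.
data='1 web root svc 120 10
2 db postgres svc 300 10
3 cron root job 45 7
4 log root svc 80 10'

# No action block: matching lines are printed unchanged.
matches=$(printf '%s\n' "$data" | awk '$3 == "root" && $6 == 10')
printf '%s\n' "$matches"
```

String constants need double quotes inside the awk program; a bare root would be read as an (empty) variable.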

Built‑in Variables – NR (current record number) and NF (field count) are useful for tracking line numbers and column counts.
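As a small illustration with made-up input, NR can skip a header line and NF can report how many columns each record has:

```shell
# Hypothetical data: a header line followed by two records of different widths.
data='name size owner
a.txt 10 root
b.txt 20'

# NR > 1 skips the header; $NF is the last field, whatever NF happens to be.
out=$(printf '%s\n' "$data" | awk 'NR > 1 { print NR ": " NF " fields, last = " $NF }')
printf '%s\n' "$out"
```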

Specifying Delimiters – change the input field separator with FS or the -F option, and set the output field separator with OFS:
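A sketch of both approaches, using made-up passwd-style lines as input:

```shell
# Hypothetical colon-separated records.
data='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# -F sets the input field separator from the command line.
with_flag=$(printf '%s\n' "$data" | awk -F: '{ print $1, $7 }')

# FS does the same from inside the program; OFS is what print puts between fields.
with_vars=$(printf '%s\n' "$data" | awk 'BEGIN { FS = ":"; OFS = "\t" } { print $1, $3, $7 }')

printf '%s\n' "$with_flag" "$with_vars"
```

FS is set in BEGIN so that it is already in effect when the first line is split.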

Practical – Advanced

Conditional Matching – select lines by field value (for example, all files owned by root) or by regular expression: /root/ matches any line containing root, and /Aug|Dec/ matches lines containing either Aug or Dec.
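The difference between a field comparison and a whole-line regular expression shows up in this sketch. The ls -l style listing is hypothetical, with the owner assumed to be in column 3 and the file name in column 9.

```shell
# Hypothetical listing; the third file has "root" in its name but not as owner.
data='-rw-r--r-- 1 root root 2048 Aug 12 10:15 notes.txt
-rwxr-xr-x 1 alice staff 512 Dec 3 09:30 run.sh
-rw-r--r-- 1 bob staff 77 Jan 9 08:00 root-notes.md'

# Field test: the owner is exactly root.
owned=$(printf '%s\n' "$data" | awk '$3 == "root" { print $9 }')

# Regex against the whole line: "root" anywhere.
anywhere=$(printf '%s\n' "$data" | awk '/root/ { print $9 }')

# Alternation: lines containing Aug or Dec.
months=$(printf '%s\n' "$data" | awk '/Aug|Dec/ { print $9 }')

printf '%s\n' "$owned" "$anywhere" "$months"
```

To restrict a regular expression to one column, the match operator can be used instead, as in $3 ~ /root/.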

Splitting Files – redirect output to separate files based on a field (e.g., month in column 5) using the > operator.
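One way this could look, assuming made-up records with the month in column 5. The temporary directory, the dir variable, and the .txt suffix are choices made for this sketch, not part of the original article.

```shell
# Work in a scratch directory so the example leaves nothing behind.
dir=$(mktemp -d)

# Hypothetical records: owner, group, size, day, month, name.
data='root staff 2048 12 Aug notes.txt
alice staff 512 3 Dec run.sh
bob staff 77 25 Aug todo.md'

# The target file name is an expression; parenthesize it to avoid ambiguity.
printf '%s\n' "$data" | awk -v dir="$dir" '{ print > (dir "/" $5 ".txt") }'

aug=$(cat "$dir/Aug.txt")
dec=$(cat "$dir/Dec.txt")
printf 'Aug.txt:\n%s\nDec.txt:\n%s\n' "$aug" "$dec"
rm -rf "$dir"
```

Inside awk, > truncates a file only the first time it is opened during a run; later writes to the same name are appended, so both Aug records end up in Aug.txt.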

If Statements – more complex conditions can be written with if/else. An if statement is part of the action, so it must appear inside the { } block, not in the pattern position.
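A short sketch of if/else inside an action. The records and the size thresholds are invented for the example.

```shell
# Hypothetical records: owner, group, size, day, month, name.
data='root staff 2048 12 Aug notes.txt
alice staff 512 3 Dec run.sh
bob staff 77 25 Aug todo.md'

out=$(printf '%s\n' "$data" | awk '{
  # Classify each file by the size in column 3 (thresholds are arbitrary).
  if ($3 >= 1000) {
    label = "large"
  } else if ($3 >= 100) {
    label = "medium"
  } else {
    label = "small"
  }
  print $6, label
}')
printf '%s\n' "$out"
```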

Statistics – sum file sizes of *.c and *.h files, or compute per‑user memory usage from the RSS column using arrays and for loops.
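Both calculations can be sketched on canned input. In practice the data would come from commands such as ls -l or ps aux; the lines below are made up, with the file size assumed in column 5 and RSS assumed in column 6.

```shell
# Hypothetical ls -l style output for some C sources and headers.
files='-rw-r--r-- 1 alice staff 1200 Aug 12 10:15 main.c
-rw-r--r-- 1 alice staff 300 Aug 12 10:16 util.h
-rw-r--r-- 1 alice staff 500 Aug 12 10:17 util.c'

# Accumulate in the body, report once in END.
total=$(printf '%s\n' "$files" | awk '{ sum += $5 } END { print sum }')

# Hypothetical ps aux style output, header included.
procs='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 1000 400 ? Ss 10:00 0:01 init
alice 20 0.5 1.0 5000 1500 ? S 10:05 0:10 vim
root 35 0.1 0.2 2000 600 ? S 10:06 0:02 sshd'

# An array keyed by user name collects one running total per user.
per_user=$(printf '%s\n' "$procs" |
  awk 'NR > 1 { rss[$1] += $6 } END { for (u in rss) print u, rss[u] }' | sort)

printf '%s\n' "$total" "$per_user"
```

The order in which for (u in rss) visits keys is unspecified, which is why the result is piped through sort.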

Comprehensive Example – Student Grades

A full‑featured awk script (cal.awk) processes a grade file, using BEGIN to print headers, body to accumulate scores, and END to output totals and averages.
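The original cal.awk is not reproduced in this summary, so the following is only a sketch of what such a script could look like. The input layout (name followed by three scores), the subject names, and the file name score.txt are all assumptions.

```shell
dir=$(mktemp -d)

# Hypothetical cal.awk: header in BEGIN, accumulation in the body, summary in END.
cat > "$dir/cal.awk" <<'EOF'
BEGIN {
    printf "%-8s %5s %5s %5s %6s\n", "NAME", "MATH", "ENG", "SCI", "TOTAL"
}
{
    total = $2 + $3 + $4
    math += $2; eng += $3; sci += $4
    printf "%-8s %5d %5d %5d %6d\n", $1, $2, $3, $4, total
}
END {
    printf "%-8s %5d %5d %5d\n", "TOTAL", math, eng, sci
    if (NR > 0)
        printf "%-8s %5.1f %5.1f %5.1f\n", "AVERAGE", math / NR, eng / NR, sci / NR
}
EOF

# Hypothetical grade file: name, then three scores.
cat > "$dir/score.txt" <<'EOF'
Ann 80 90 70
Bob 60 75 85
EOF

report=$(awk -f "$dir/cal.awk" "$dir/score.txt")
printf '%s\n' "$report"
rm -rf "$dir"
```

The -f option reads the program from a file, which is more comfortable than a long quoted one-liner once a script grows past a few lines.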

Key Built‑in Variables

NR: current record (line) number.

NF: number of fields in the current line.

RS: record separator (default newline).

FS: field separator (default space/tab).

OFS: output field separator (default space).

ORS: output record separator (default newline).
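RS and ORS are the less commonly changed members of this group. A small sketch with made-up input shows their effect:

```shell
# RS: treat ";" instead of newline as the end of a record.
by_rs=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

# ORS: end each printed record with "," instead of a newline.
by_ors=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = "," } { print }')

printf '%s\n' "$by_rs" "$by_ors"
```

With ORS set to a comma, the output has a trailing comma and no final newline, since print appends ORS after every record.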

Formatting Output

Use printf with the familiar C format specifiers (%d, %u, %f, %s, %c, %e, %x, %g) and escape sequences such as \n and \t. Unlike print, printf does not add a newline automatically.
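A few of these specifiers in action, as a sketch that needs no input because everything runs in BEGIN:

```shell
out=$(awk 'BEGIN {
  printf "%d|%5d|%-5d|\n", 42, 42, 42      # plain, right-aligned, left-aligned
  printf "%.2f|%e\n", 3.14159, 1234.5      # fixed precision, scientific notation
  printf "%x|%c|%s\n", 255, "A", "text"    # hexadecimal, single character, string
}')
printf '%s\n' "$out"
```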

Programming Constructs

Conditional statements (if/else).

Loops (while, for).

Arrays (associative, similar to maps).

Functions (built‑in and user‑defined).
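The four constructs listed above can be combined in one small program. The function name fact and the input letters are invented for this sketch.

```shell
out=$(printf 'a\nb\na\n' | awk '
  # User-defined function; the extra parameters i and r serve as local variables.
  function fact(n,    i, r) {
    r = 1
    for (i = 2; i <= n; i++) r *= i
    return r
  }
  {
    count[$1]++                  # associative array keyed by the first field
  }
  END {
    i = 1
    while (i <= 3) {             # while loop
      print i "! = " fact(i)
      i++
    }
    if (count["a"] > count["b"]) {
      print "a wins"
    } else {
      print "b wins"
    }
  }
')
printf '%s\n' "$out"
```

Awk has no local variable declarations, so listing unused parameters after extra spaces is the conventional way to keep i and r from leaking into the global scope.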

Common String Functions

index(s, t): position of substring t in s, counting from 1 (0 if t does not occur).

length(s): length of s.

split(s, a, sep): split s into array a using sep; returns the number of pieces.

substr(s, p, n): substring of s starting at position p (counting from 1) with length n.

tolower(s) / toupper(s): case conversion.
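Each of these functions can be tried from a BEGIN block, with no input file needed. The sample strings are arbitrary.

```shell
out=$(awk 'BEGIN {
  s = "Hello, World"
  print index(s, "World")          # 1-based position of the substring
  print length(s)
  print substr(s, 1, 5)            # five characters starting at position 1
  print toupper(s), tolower(s)
  n = split("a:b:c", parts, ":")   # n is the number of pieces; parts[1] is the first
  print n, parts[1], parts[3]
}')
printf '%s\n' "$out"
```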

This guide provides a concise yet comprehensive overview of awk’s core concepts, syntax, and practical usage, enabling readers to harness awk for efficient text processing and data analysis on Linux.



