Mastering AWK: Powerful Text Processing Techniques for Linux

This article provides a comprehensive introduction to AWK, covering its origins, basic syntax, command‑line usage, built‑in variables, printing functions, programming constructs such as conditionals, loops, and arrays, and includes numerous practical examples for extracting and formatting data on Linux systems.

MaGe Linux Operations

Introduction

awk is a powerful text‑analysis tool; compared with grep for searching and sed for editing, awk excels at data analysis and report generation. It reads files line by line, splits each line on whitespace (default delimiter), and then processes the fields.

The three classic versions of awk are awk, nawk, and gawk. Unless otherwise specified, gawk (the GNU implementation) is assumed.

The name AWK comes from the initials of its creators Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK has its own language, the AWK Programming Language, defined as a "pattern‑scanning and processing language" that can read input files, sort and compute data, and generate reports.

Usage

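awk 'pattern { action }' filenames

In this syntax, pattern selects which input lines to act on, and action is the series of commands executed for each matching line. The braces are required around an action and group its commands. Either part may be omitted: a pattern with no action prints the matching lines, and an action with no pattern runs on every line. A pattern is most often a regular expression enclosed in slashes, but it can also be a comparison such as $3 > 100 or one of the special patterns BEGIN and END.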
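The pattern and action forms can be sketched as follows. The sample records are hypothetical data in /etc/passwd format, and the variable name sample is only an illustration.

```shell
# Hypothetical sample records in /etc/passwd format (not from a real system).
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Regular-expression pattern with an action:
# print the account name for lines that end in "bash".
printf '%s\n' "$sample" | awk -F ':' '/bash$/ {print $1}'

# Expression pattern with no action:
# print whole lines whose third field (the UID) is at least 1000.
printf '%s\n' "$sample" | awk -F ':' '$3 >= 1000'

# Action with no pattern: runs on every line.
printf '%s\n' "$sample" | awk -F ':' '{print $1}'
```

The first command prints root and alice, the second prints only the alice record, and the third prints all three account names.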

Awk’s basic function is to browse and extract information from files or strings based on specified rules; the extracted data can then be further processed. A complete awk script typically formats information from text files.

Awk processes input one line at a time, applying the field separator (default is whitespace) to split each line into fields. $0 represents the entire line, $1 the first field, $n the nth field.
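A minimal sketch of field splitting, using a made-up input line. It also uses NF, the built-in variable holding the number of fields, so $NF refers to the last field.

```shell
# Hypothetical input line; fields are separated by runs of spaces.
line='alpha   beta gamma'

# Print the field count, the first field, the last field, and the whole line.
printf '%s\n' "$line" | awk '{print NF; print $1; print $NF; print $0}'
```

With the default separator, consecutive spaces count as a single delimiter, so the line has three fields, while $0 still holds the original text with its spacing intact.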

Invoking awk

There are three ways to invoke awk:

1. Command-line mode: awk [-F separator] 'commands' input-file(s)
   Here -F specifies the field separator (optional), and commands are the awk statements to execute.

2. Shell script mode: put the awk commands in an executable file whose first line is #!/bin/awk -f (in place of the usual #!/bin/sh), so the file can be run directly. The path to awk may differ between systems.

3. Separate script file: awk -f script-file input-file(s)
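As a rough sketch of the separate-script-file style, the following writes a tiny awk program to a temporary file and runs it with -f. The program, the sample input, and the variable names script and result are illustrative assumptions.

```shell
# Write a small awk program to a temporary file created by mktemp.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first colon-separated field of every record.
BEGIN { FS = ":" }
{ print $1 }
EOF

# Run the script file against hypothetical input in /etc/passwd format.
result=$(printf 'root:x:0:0\ndaemon:x:1:1\n' | awk -f "$script")
printf '%s\n' "$result"

# Clean up the temporary script.
rm -f "$script"
```

Setting FS in a BEGIN block inside the script has the same effect as passing -F ':' on the command line.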

The rest of this article focuses on the command‑line mode.

Getting Started Example

Assuming the output of last -n 5 is:

root pts/1 192.168.1.100 Tue Feb 10 21:00 still logged in
root pts/1 192.168.1.100 Tue Feb 10 00:46:02 ...

To display only the first column (user names):

last -n 5 | awk '{print $1}'

To list accounts from /etc/passwd:

cat /etc/passwd | awk -F ':' '{print $1}'

To list accounts and their shells, separated by a tab:

cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'

To list accounts and shells separated by a comma, with a header and a custom line at the end:

cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'

AWK Built‑in Variables

ARGC – number of command-line arguments
ARGV – array of command-line arguments
ENVIRON – associative array of environment variables
FILENAME – name of the current input file
FNR – record number in the current file
FS – input field separator (same as -F)
NF – number of fields in the current record
NR – total number of records read so far
OFS – output field separator
ORS – output record separator
RS – input record separator

$0 represents the whole record; $1, $2, … represent individual fields.
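A short sketch of a few of these variables working together, run on hypothetical colon-separated data. The " | " output separator is an arbitrary choice for illustration.

```shell
# Hypothetical colon-separated sample data.
sample='root:x:0:0
daemon:x:1:1'

# FS sets the input separator; OFS is placed between comma-separated
# print arguments. NR is the record number and NF the field count.
printf '%s\n' "$sample" | awk 'BEGIN {FS=":"; OFS=" | "} {print NR, NF, $1}'
```

Each output line shows the record number, the number of fields in that record, and the first field, joined by the chosen OFS.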

print and printf

Awk provides two output functions:

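print – writes its arguments followed by a newline. Arguments separated by commas are joined with the output field separator OFS (a space by default); arguments written side by side without commas are concatenated with nothing between them.

printf – works like C's printf: it takes a format string followed by values, gives precise control over widths and alignment, and does not add a newline automatically. It is often clearer for complex output.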
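The difference between the two can be sketched on hypothetical /etc/passwd-style records. The column widths in the format string are arbitrary choices.

```shell
# Hypothetical sample records in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# print: with a comma the fields are joined by OFS (a space by default);
# without a comma they are concatenated directly.
printf '%s\n' "$sample" | awk -F ':' '{print $1, $7; print $1 $7}'

# printf: %-10s left-justifies in 10 columns, %5d right-justifies an
# integer in 5 columns; the trailing newline must be written explicitly.
printf '%s\n' "$sample" | awk -F ':' '{printf "%-10s %5d %s\n", $1, $3, $7}'
```

The printf version keeps the columns aligned regardless of how long each account name is, which print alone cannot do.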

AWK Programming

Variables can be built‑in or user‑defined. Example to count the number of accounts in /etc/passwd:

awk '{count++; print $0;} END {print "user count is", count}' /etc/passwd

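Awk starts uninitialized variables at 0 (or the empty string), so the example above works as written, but initializing them explicitly in a BEGIN block makes the intent clearer: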

awk 'BEGIN {count=0} {count++;} END {print "user count is", count}' /etc/passwd

Example to compute total file size from ls -l output:

ls -l | awk 'BEGIN {size=0} {size+= $5} END {print "[end] size is", size/1024/1024, "M"}'

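Skipping entries whose size is exactly 4096 bytes, the size commonly reported for directories (a heuristic: a regular file of exactly 4096 bytes would be skipped too):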

ls -l | awk 'BEGIN {size=0} {if($5!=4096){size+=$5}} END {print "[end] size is", size/1024/1024, "M"}'
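An alternative sketch selects regular files by the first character of the permissions column instead of by size. It assumes the usual ls -l layout, where the first field begins with "-" for regular files and the fifth field is the size in bytes. The temporary directory and file names are made up for the demonstration.

```shell
# Build a throwaway directory with two small files and one subdirectory.
dir=$(mktemp -d)
printf 'hello' > "$dir/a.txt"         # 5 bytes
printf 'hello world' > "$dir/b.txt"   # 11 bytes
mkdir "$dir/sub"

# Sum the size column only for lines whose mode string starts with "-".
# "size + 0" forces numeric output even if no line matched.
total=$(ls -l "$dir" | awk '$1 ~ /^-/ {size += $5} END {print size + 0}')
echo "regular files total: $total bytes"

# Clean up.
rm -rf "$dir"
```

Matching on the mode string also skips the leading "total" line that ls -l prints, since its first field does not start with "-".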

Conditional Statements

Awk's conditionals follow C syntax:

if (expression) {
    statements;
} else {
    statements;
}

Nested if…else and else if constructs are also supported.
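A small sketch of an if / else if / else chain, classifying hypothetical accounts by UID. The threshold of 1000 for regular users is a common Linux convention and is assumed here, and the variable name kind is illustrative.

```shell
# Hypothetical sample records: name, password placeholder, UID, GID.
sample='root:x:0:0
daemon:x:1:1
alice:x:1000:1000'

printf '%s\n' "$sample" | awk -F ':' '{
    if ($3 == 0) {
        kind = "superuser"
    } else if ($3 < 1000) {
        kind = "system"
    } else {
        kind = "regular"
    }
    print $1, kind
}'
```

Because the program is inside single quotes, it can span several lines in the shell without any continuation characters.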

Loop Statements

Awk supports while, do/while, for, break, and continue with the same semantics as C.
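Two brief sketches of these loops on made-up input: a for loop that prints the fields of a line in reverse order, and a while loop that sums fields until it meets a negative number and then stops with break.

```shell
# for loop: walk the fields from last to first.
# The ternary picks a space between fields and a newline after the last one.
printf 'one two three\n' | awk '{
    for (i = NF; i >= 1; i--) {
        printf "%s%s", $i, (i > 1 ? " " : "\n")
    }
}'

# while loop with break: add fields until a negative value appears.
printf '4 8 -1 16\n' | awk '{
    i = 1; sum = 0
    while (i <= NF) {
        if ($i < 0) break
        sum += $i
        i++
    }
    print sum
}'
```

The first command prints "three two one"; the second prints 12, because the loop exits before reaching 16.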

Arrays

Awk arrays are associative; indices (keys) can be strings or numbers. They are created automatically when first used. Example to list all usernames from /etc/passwd:

awk -F ':' '{name[NR] = $1} END {for (i=1; i<=NR; i++) print i, name[i]}' /etc/passwd
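The example above uses numeric keys. A sketch with string keys, counting how many hypothetical accounts use each login shell, might look like this. The sample data and the array name count are illustrative.

```shell
# Hypothetical sample records in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# count[$7]++ uses the shell path itself as the array key.
# The order of "for (key in array)" is unspecified, so sort the result.
printf '%s\n' "$sample" |
    awk -F ':' '{count[$7]++} END {for (shell in count) print shell, count[shell]}' |
    sort
```

The element count[$7] is created with a value of 0 the first time a given shell is seen, so no explicit initialization is needed before incrementing it.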

Awk’s extensive capabilities make it suitable for quick data extraction, reporting, and simple scripting tasks on Unix‑like systems.
