
Mastering AWK: Powerful Text Processing and Reporting Techniques

AWK is a versatile text-processing language that reads files line by line, splits each line into fields on whitespace by default, and applies pattern-action rules to analyze data and generate reports. Typical tasks include extracting columns, counting records, and scripting with built-in variables, loops, and conditional statements.

MaGe Linux Operations

Introduction

awk is a powerful text‑analysis tool. Where grep is suited to searching and sed to editing, awk excels at analyzing data and generating reports. In simple terms, awk reads a file line by line, splits each line into fields (on runs of spaces and tabs by default), and then processes those fields.

Three implementations of awk are in common use: the original awk, nawk (new awk), and gawk. Unless otherwise specified, this article means gawk, the GNU implementation.

The name AWK comes from the initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is itself a programming language, described by its creators as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, perform calculations, and generate reports, among many other functions.

Usage

awk 'pattern {action}' filenames

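Each rule consists of a pattern that selects lines and an action that is executed for every line the pattern matches. A pattern is most commonly a regular expression enclosed in slashes, but it can also be a comparison such as $3 > 100 or one of the special patterns BEGIN and END. The action is a series of commands grouped by curly braces. If the pattern is omitted, the action runs for every line; if the action is omitted, matching lines are printed.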

Awk treats each line of a file as a record. For every record it reads, it runs the rules whose patterns match.
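A rule may consist of a pattern alone, an action alone, or both. The following is a minimal sketch of the three forms; the names and scores in the sample input are made up for illustration and do not come from any real file.

```shell
# Hypothetical sample input: a name column and a score column.
sample='alice 90
bob 72
carol 85'

# Pattern only: the default action prints each matching line.
printf '%s\n' "$sample" | awk '/bob/'

# Action only: with no pattern, the action runs for every line.
printf '%s\n' "$sample" | awk '{print $1}'

# Pattern and action: here the pattern is a numeric comparison, not a regex.
printf '%s\n' "$sample" | awk '$2 >= 85 {print $1}'
```

The first command prints the bob line, the second prints all three names, and the third prints only the names whose score is at least 85.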

Calling awk

There are three ways to invoke awk:

Command‑line mode: awk [-F field_separator] 'commands' input_file

Shell script mode: place the awk commands in an executable script file whose first line is #!/bin/awk -f (or use #!/bin/sh and invoke awk inside the script).

Separate script file: awk -f script_file input_file

This chapter focuses on the command‑line method.
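For comparison with the command-line form, here is a rough sketch of the separate-script-file form. The temporary file and the passwd-style input lines are assumptions made for this illustration only.

```shell
# Write a small awk program to a temporary file (the file name is arbitrary).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first colon-separated field of every record.
BEGIN { FS = ":" }
{ print $1 }
EOF

# Run the program with -f; the input is made-up passwd-style data.
names=$(printf 'root:x:0:0\ndaemon:x:1:1\n' | awk -f "$script")
printf '%s\n' "$names"

rm -f "$script"
```

Keeping the program in a file avoids shell quoting problems once it grows beyond a line or two.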

Introductory Examples

Assume the output of last -n 5 is:

# last -n 5    (only the first five lines)
root pts/1 192.168.1.100 Tue Feb 10 21:00 still logged in
...

To display only the usernames of the last five logins:

last -n 5 | awk '{print $1}'

Awk’s workflow: it reads one record at a time (records are separated by newlines by default) and splits the record into fields using the field separator (runs of whitespace by default). $0 then holds the whole record, $1 the first field, and $n the nth field.
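The field splitting described above can be checked with a single made-up line that mixes several spaces and a tab. NF, covered later under built-in variables, holds the number of fields, so $NF is the last field.

```shell
# A hypothetical input line with irregular spacing: three spaces, then a tab.
printf 'root   pts/1\t192.168.1.100\n' | awk '{print NF, $1, $NF}'
# → 3 root 192.168.1.100
```

Runs of spaces and tabs count as one separator, so the line has three fields rather than five.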

To list accounts from /etc/passwd:

cat /etc/passwd | awk -F ':' '{print $1}'

To list accounts and their shells separated by a tab:

cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'

To list accounts and shells separated by a comma and add a header and a custom line at the end:

cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'

Awk also supports pattern matching. For example, to find lines containing "root" in /etc/passwd:

awk -F: '/root/' /etc/passwd

To print only the shell field for lines matching "root":

awk -F: '/root/{print $7}' /etc/passwd
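Note that /root/ matches anywhere in the line, so it also selects accounts whose home directory or comment field happens to contain "root". A sketch of restricting the match to a single field follows; the three passwd-style lines are made-up sample data.

```shell
# Hypothetical passwd-style sample data.
sample='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# Regex against the whole line: matches both root and operator.
printf '%s\n' "$sample" | awk -F: '/root/ {print $1}'

# Comparison against field 1 only: matches just the root account.
printf '%s\n' "$sample" | awk -F: '$1 == "root" {print $1, $7}'
```

The first command prints root and operator; the second prints only "root /bin/bash".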

Built‑in Variables

ARGC – number of command‑line arguments

ARGV – array of command‑line arguments

ENVIRON – environment variables

FILENAME – name of the current file

FNR – record number in the current file

FS – input field separator (same as -F)

NF – number of fields in the current record

NR – total number of records read so far

OFS – output field separator

ORS – output record separator

RS – input record separator

Additionally, $0 is the entire record, $1 the first field, $2 the second, and so on.
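A short sketch of several of these variables working together, using made-up passwd-style input. FS and OFS are set once in BEGIN, while NR and NF are updated by awk for every record.

```shell
# Hypothetical passwd-style sample data.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# Print the record number, field count, account name, and shell,
# joined by a custom output field separator.
printf '%s\n' "$sample" |
  awk 'BEGIN {FS = ":"; OFS = " | "} {print NR, NF, $1, $7} END {print "records: " NR}'
# → 1 | 7 | root | /bin/bash
# → 2 | 7 | daemon | /usr/sbin/nologin
# → records: 2
```

In the END block NR still holds the number of the last record read, which makes it a convenient record count.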

Print and printf

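Awk provides both print and printf. print writes its comma‑separated arguments joined by the output field separator (OFS) and ends the line with the output record separator (ORS). printf follows C‑style formatting, which gives more control over the output, and adds no newline unless the format string contains one.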
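The difference is easiest to see side by side. This is a minimal sketch on made-up passwd-style input; the column widths in the printf format are arbitrary choices for the illustration.

```shell
# Hypothetical passwd-style sample data.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# print: the comma inserts OFS (a single space by default) and a newline is appended.
printf '%s\n' "$sample" | awk -F: '{print $1, $3}'

# printf: %-10s left-justifies the name in 10 columns, %5d right-justifies the UID in 5.
# The trailing \n is required because printf adds no newline on its own.
printf '%s\n' "$sample" | awk -F: '{printf "%-10s|%5d\n", $1, $3}'
```

With printf every output line has the same width, which is useful for aligned reports.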

Awk Programming

Variables can be built‑in or user‑defined. Example to count the number of accounts in /etc/passwd:

awk '{count++;} END{print "user count is", count}' /etc/passwd

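Uninitialized awk variables start out as 0 (or the empty string), so the command above works as written. Initializing them explicitly in a BEGIN block makes the intent clearer: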

awk 'BEGIN {count=0} {count++;} END{print "user count is", count}' /etc/passwd

Awk can also sum the sizes listed by ls -l (the fifth field is the size in bytes) and report the total in megabytes:

ls -l | awk 'BEGIN {size=0;} {size+= $5;} END{print "size is", size/1024/1024, "M"}'

Conditional statements (if, else) and loops (while, for, break, continue) follow C‑language syntax.
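A small sketch of an if/else chain and a for loop. The sample accounts are made up, and the UID threshold of 1000 for regular users is an assumption that varies between distributions.

```shell
# Hypothetical passwd-style sample data.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# if / else if / else: classify each account by UID (threshold is an assumption).
classify='{
    if ($3 == 0)        { kind = "superuser" }
    else if ($3 < 1000) { kind = "system" }
    else                { kind = "regular" }
    print $1, kind
}'
printf '%s\n' "$sample" | awk -F: "$classify"

# for loop: print the fields of a record in reverse order.
reverse='{
    for (i = NF; i >= 1; i--)
        printf "%s%s", $i, (i > 1 ? " " : "\n")
}'
printf 'a b c\n' | awk "$reverse"
```

The first command labels root as superuser, daemon as system, and alice as regular; the second prints "c b a".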

Arrays

Awk arrays are associative; indices can be strings or numbers. Example to collect usernames from /etc/passwd and print them after processing:

awk -F ':' 'BEGIN {count=0;} {name[count]=$1; count++;} END{for(i=0;i<count;i++) print i, name[i]}' /etc/passwd
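The example above uses numeric indices. String indices work the same way, which makes counting occurrences straightforward. The sketch below counts accounts per login shell on made-up sample data. The iteration order of for (key in array) is unspecified, so the output is piped through sort to make it stable.

```shell
# Hypothetical passwd-style sample data.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# The shell path itself is the array key; the value is a running count.
printf '%s\n' "$sample" |
  awk -F: '{shells[$7]++} END {for (s in shells) print shells[s], s}' |
  sort
# → 1 /usr/sbin/nologin
# → 2 /bin/bash
```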

Awk’s extensive capabilities make it suitable for quick data extraction, reporting, and scripting tasks on Linux systems.

For more details, refer to the GNU Awk manual: http://www.gnu.org/software/gawk/manual/gawk.html
