Master Linux Text Processing: Find, Grep, Sed, Awk, and More

This guide provides a comprehensive overview of essential Linux command‑line utilities for text manipulation—including find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—detailing common options, practical examples, and advanced techniques for searching, filtering, transforming, and processing files efficiently.

ITFLY8 Architecture Home

This article introduces the most commonly used Linux shell tools for text processing, offering practical examples and parameters for each command.

01 find – File Searching

Basic file search by name and pattern:

find . \( -name "*.txt" -o -name "*.pdf" \) -print

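The parentheses are escaped so the shell passes them to find instead of interpreting them.

Search with a regular expression (use -iregex for a case‑insensitive match). GNU find matches the pattern against the whole path and uses Emacs‑style syntax by default, so grouping and alternation are written with backslashes:

find . -regex ".*\(\.txt\|\.pdf\)$"

Negate a pattern:

find . ! -name "*.txt" -print

Limit search depth (depth = 1):

find . -maxdepth 1 -type f

Custom searches:

By type:

find . -type d -print   # list directories only

By time:

find . -atime -7 -type f -print   # files accessed within the last 7 days (7 = exactly 7 days ago, +7 = more than 7)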

By size:

find . -type f -size +2k   # files larger than 2 KiB

By permission:

find . -type f -perm 644 -print

By user:

find . -type f -user weber -print

Post‑search actions:

Delete files (e.g., all *.swp files):

find . -type f -name "*.swp" -delete

Execute a command on each match:

find . -type f -user root -exec chown weber {} \;

Copy files last modified more than 10 days ago into the OLD directory:

find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;

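Delimiter control: -print ends each path with a newline by default; use -print0 for NUL‑delimited output, which stays unambiguous when file names contain spaces or newlines.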
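
NUL‑delimited output is usually consumed by xargs -0. The following is a minimal sketch, assuming a find and xargs that support -print0 and -0 (GNU and BSD both do); the scratch directory and file names are invented for illustration:

```shell
# Create a scratch directory with one file name that contains a space.
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/with space.txt" "$workdir/notes.pdf"

# -print0 ends each path with a NUL byte; xargs -0 splits on NUL only,
# so "with space.txt" reaches ls as a single argument.
txt_count=$(find "$workdir" -type f -name "*.txt" -print0 | xargs -0 ls -1 | wc -l)
echo "matched .txt files: $txt_count"

rm -rf "$workdir"
```

Without -print0 and -0, xargs would split "with space.txt" into two arguments and ls would report two missing files.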

02 grep – Text Searching

Basic usage:

grep match_pattern file

Common options:

-o   print only the matching part of each line
-v   print the lines that do not match
-c   count matching lines
-n   show line numbers
-i   ignore case
-l   list only the names of matching files

Recursive search (programmers' favorite):

grep "class" . -R -n

Search multiple patterns:

grep -e "class" -e "virtual" file

Use -Z together with -l to end each file name with a NUL byte instead of a newline (useful for deleting files with spaces in their names):

grep "test" file* -lZ | xargs -0 rm
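
To see how the options above change the output, here is a small sketch run against two throwaway files; the file names and contents are made up for the example:

```shell
workdir=$(mktemp -d)
printf 'class Foo\nvirtual void run()\nClass Bar\n' > "$workdir/a.cpp"
printf 'int main()\n' > "$workdir/b.cpp"

# -c counts matching lines; -i makes the match case-insensitive.
match_count=$(grep -ci "class" "$workdir/a.cpp")

# -n prefixes each matching line with its line number.
numbered=$(grep -n "virtual" "$workdir/a.cpp")

# -l prints only the names of files that contain a match.
matching_files=$(grep -l "class" "$workdir"/*.cpp)

echo "$match_count"
echo "$numbered"
echo "$matching_files"

rm -rf "$workdir"
```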

03 xargs – Argument Construction

xargs converts input data into command‑line arguments, allowing powerful combinations with other commands such as grep and find.

Simple conversion (joins all input into a single line):

cat file.txt | xargs

Specify delimiter (-d is available in GNU xargs):

cat file.txt | xargs -d ";"

Replace placeholder with -I {}:

cat file.txt | xargs -I {} ./command.sh -p {} -1

Null‑delimited input:

cat file.txt | xargs -0

Limit arguments per command:

cat file.txt | xargs -n 3
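
The difference between -n and -I is easiest to see with echo as the command. This is an illustrative sketch with inline sample input rather than a real file:

```shell
# -n 2 passes at most two arguments to each echo invocation,
# so six words become three lines of output.
grouped=$(printf 'a b c\nd e f\n' | xargs -n 2 echo)
echo "$grouped"

# -I {} runs the command once per input line and substitutes the line for {}.
labelled=$(printf 'one\ntwo\n' | xargs -I {} echo "item: {}")
echo "$labelled"
```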

04 sort – Sorting

Key options:

-n     numeric sort
-d     dictionary order
-r     reverse order
-k N   sort by the N‑th column

Example (numeric, descending, by the first column):

sort -nrk 1 data.txt

Ignore leading blanks:

sort -bd data
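
The sketch below contrasts text order with numeric order on hypothetical sample data (a count followed by a name). LC_ALL=C is set for the text sort only so that the byte‑wise result is the same on every system:

```shell
data=$(printf '10 pear\n9 apple\n100 fig\n')

# Plain sort compares text, so "10" and "100" come before "9".
text_order=$(printf '%s\n' "$data" | LC_ALL=C sort)

# -n compares numerically, -r reverses, -k 1 starts the key at the first column.
numeric_desc=$(printf '%s\n' "$data" | sort -nrk 1)

echo "$text_order"
echo "$numeric_desc"
```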

05 uniq – Removing Duplicate Lines

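Typical usage after sorting (uniq only collapses adjacent duplicates, so the input is sorted first):

sort unsort.txt | uniq

Count occurrences:

sort unsort.txt | uniq -c

Show only duplicated lines:

sort unsort.txt | uniq -d

Limit the compared range with -s N (skip the first N characters) and -w N (compare at most N characters).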
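
A common combination is sort, uniq -c, and a second numeric sort to rank lines by frequency. A minimal sketch on made‑up input:

```shell
words=$(printf 'b\na\nb\nc\na\nb\n')

# One copy of each distinct line.
distinct=$(printf '%s\n' "$words" | sort | uniq)

# -d prints one copy of each line that occurs more than once.
repeated=$(printf '%s\n' "$words" | sort | uniq -d)

# -c prefixes each line with its count; sort -nr puts the largest count first.
most_common=$(printf '%s\n' "$words" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')

echo "$distinct"
echo "$repeated"
echo "$most_common"
```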

06 tr – Translating Characters

General form:

echo 12345 | tr '0-9' '9876543210'

Convert tabs to spaces:

cat text | tr '\t' ' '

Delete characters:

cat file | tr -d '0-9'

Complement set (-c combined with -d deletes everything except digits and newlines):

cat file | tr -cd '0-9\n'

Squeeze repeated characters (commonly spaces):

cat file | tr -s ' '

Character classes (e.g., lower‑to‑upper):

tr '[:lower:]' '[:upper:]'
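
The three most common tr modes can be tried on inline strings. The input strings here are arbitrary examples:

```shell
# Map lower case to upper case with character classes.
upper=$(echo "hello world" | tr '[:lower:]' '[:upper:]')

# -s squeezes each run of spaces into a single space.
squeezed=$(echo "a   b    c" | tr -s ' ')

# -c with -d deletes every character NOT in the set;
# \n is listed so the trailing newline survives.
digits=$(echo "tel: 555-0100" | tr -cd '0-9\n')

echo "$upper"
echo "$squeezed"
echo "$digits"
```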

07 cut – Column Extraction

Extract columns 2 and 4:

cut -f2,4 filename

Remove column 3, i.e. print every column except the third (--complement is a GNU option):

cut -f3 --complement filename

Specify delimiter:

cut -f2 -d ";" filename

Extract character ranges:

cut -c1-5 file   # first to fifth character
cut -c-2 file    # first two characters
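
As a quick check of field and character selection, this sketch cuts a single semicolon‑separated record; the record itself is invented:

```shell
record="alice;42;berlin;admin"

# -d sets the delimiter, -f picks fields 2 and 4.
picked=$(echo "$record" | cut -d ";" -f2,4)

# -c selects character positions 1 through 5.
prefix=$(echo "$record" | cut -c1-5)

echo "$picked"
echo "$prefix"
```

Note that cut keeps the original delimiter between the selected fields.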

08 paste – Merging Columns

Combine two files column‑wise (default tab delimiter):

paste file1 file2

Use a custom delimiter, e.g., comma:

paste file1 file2 -d ","
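
A small sketch of the comma‑delimited merge, using two temporary files whose names and contents are assumptions for the example:

```shell
workdir=$(mktemp -d)
printf '1\n2\n3\n' > "$workdir/ids"
printf 'ann\nbob\ncat\n' > "$workdir/names"

# Line N of the first file is joined to line N of the second with a comma.
merged=$(paste -d "," "$workdir/ids" "$workdir/names")
echo "$merged"

rm -rf "$workdir"
```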

09 wc – Counting

Line count:

wc -l file

Word count:

wc -w file

Byte count (use -m to count characters instead):

wc -c file
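
When wc reads from a pipe it prints only the number, which makes it convenient inside command substitution. A short sketch on inline text:

```shell
sample='one two\nthree\n'

# Two newline characters, three words, fourteen bytes in total.
line_count=$(printf "$sample" | wc -l)
word_count=$(printf "$sample" | wc -w)
byte_count=$(printf "$sample" | wc -c)

echo "lines=$line_count words=$word_count bytes=$byte_count"
```

Some implementations pad the number with leading spaces, so compare the results numerically rather than as strings.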

10 sed – Stream Editing

Replace first occurrence on each line:

sed 's/text/replace_text/' file

Global replacement:

sed 's/text/replace_text/g' file

Edit file in place:

sed -i 's/text/replace_text/g' file

Delete empty lines:

sed '/^$/d' file

Refer to the matched text with & (\w and \+ are GNU sed extensions):

echo "this is an example" | sed 's/\w\+/[&]/g'

Variable substitution (double quotes):

p=pattern
r=replace
echo "line with a pattern" | sed "s/$p/$r/g"
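
The sketch below puts the matched‑text marker, numbered back‑references, and shell variable substitution side by side. It uses only basic regular expressions, so it should not depend on GNU extensions; the input strings are examples:

```shell
# & in the replacement stands for the whole matched text.
# [a-z][a-z]* is a portable way to write "one or more lower-case letters".
bracketed=$(echo "this is an example" | sed 's/[a-z][a-z]*/[&]/g')

# \( \) capture groups; \1 and \2 refer back to them, here swapping two words.
swapped=$(echo "hello world" | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

# Double quotes let the shell expand $p and $r before sed sees the script.
p=pattern
r=replace
substituted=$(echo "line with a pattern" | sed "s/$p/$r/g")

echo "$bracketed"
echo "$swapped"
echo "$substituted"
```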

11 awk – Data‑Stream Processing

Typical script structure:

awk 'BEGIN{ statements } statements2 END{ statements }' file

Printing and field handling:

# Print header, each line, and footer
awk 'BEGIN{print "start"} {print} END{print "End"}'
# Print specific fields
awk '{print $2, $3}' file
# Print line number and fields
awk '{print NR":"$0"-"$1"-"$2}'
# Count lines
awk 'END{print NR}' file
# Sum first column
awk 'BEGIN{sum=0} {sum+=$1} END{print sum}'
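
Combining a per‑line action with an END block gives simple aggregates. A minimal sketch, assuming hypothetical name and score columns:

```shell
scores=$(printf 'ann 90\nbob 70\ncat 80\n')

# The main block runs once per line and accumulates column 2;
# END runs after the last line, when NR holds the total line count.
report=$(printf '%s\n' "$scores" | awk '{sum += $2} END{printf "%d rows, average %.1f\n", NR, sum / NR}')
echo "$report"
```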

Built‑in variables:

NR   current record number (line number)
NF   number of fields in the current line
$0   entire line
$1, $2 …   individual fields

Filtering examples:

# Lines with line number < 5
awk 'NR < 5'
# Range of lines
awk 'NR==1,NR==4 {print}' file
# Lines containing "linux"
awk '/linux/'
# Lines NOT containing "linux"
awk '!/linux/'

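Set field separator:

awk -F: '{print $NF}' /etc/passwd

Read command output (getline stores the first line of the command's output in cmdout; BEGIN lets it run without any input file):

awk 'BEGIN{"grep root /etc/passwd" | getline cmdout; print cmdout}'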

Pass external shell variables:

var=1000
echo | awk '{print vara}' vara="$var"   # echo supplies one input line so the main block runs
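
An alternative is the -v option, which assigns the variable before BEGIN runs and therefore needs no input at all. A short sketch reusing the names from above:

```shell
var=1000

# -v vara=... makes the shell value available inside awk, including in BEGIN.
doubled=$(awk -v vara="$var" 'BEGIN{print vara * 2}')
echo "$doubled"
```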

Loops and control structures:

# For loop
awk 'BEGIN{for(i=0;i<10;i++) print i}'
# Loop over array
awk 'BEGIN{array[1]="a"; array[2]="b"; for(i in array) print array[i]}'
# While reading lines
awk 'BEGIN{while ((getline line) > 0) print line}' < file.txt
# Implementing tac (reverse output)
seq 9 | awk '{lifo[NR]=$0} END{for(lno=NR;lno>0;lno--) print lifo[lno]}'
# head and tail equivalents
awk 'NR<=10' file   # head
awk '{buffer[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buffer[i%10]}' file   # tail
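
The tail and tac equivalents can be checked on a short numeric sequence. This sketch parameterizes the buffer size with a hypothetical variable n and assumes seq is available:

```shell
# Keep the last n lines in a ring buffer indexed by NR % n,
# then replay them starting from the oldest surviving line.
last3=$(seq 10 | awk -v n=3 '{buf[NR % n] = $0}
  END{for (i = NR - n + 1; i <= NR; i++) if (i > 0) print buf[i % n]}')

# Store every line by number, then print from the last line back to the first.
reversed=$(seq 3 | awk '{line[NR] = $0} END{for (i = NR; i > 0; i--) print line[i]}')

echo "$last3"
echo "$reversed"
```

Starting the replay at NR - n + 1 matters: reading the buffer from index 0 upward prints the lines out of order whenever the line count is not a multiple of n.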

Formatting output with printf:

seq 10 | awk '{printf "->%4s ", $1}'

These commands together form a powerful toolbox for processing and analyzing text data directly from the Linux command line.
