
Master Linux Text Processing: Find, Grep, Sed, Awk and More

This guide provides a comprehensive overview of essential Linux shell tools for text processing—including find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—complete with practical command examples, common options, and tips for combining these utilities to solve real‑world file‑handling tasks.

Liangxu Linux

Introduction

This article introduces the most frequently used Linux shell utilities for processing text files, such as find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples use the most common and practical parameters, and the scripts follow the principle of keeping commands on a single line whenever possible.

01 find – File Search

Match either of two name patterns (the escaped parentheses make -print apply to both):

find . \( -name "*.txt" -o -name "*.pdf" \) -print

Regular‑expression search, case‑insensitive (GNU find; plain -regex is case‑sensitive, and the expression is matched against the whole path):

find . -regextype posix-extended -iregex ".*\.(txt|pdf)$"

Negate a pattern:

find . ! -name "*.txt" -print

Limit search depth to 1:

find . -maxdepth 1 -type f

Search by type (directories only):

find . -type d -print

Search by time (files modified within the last 7 days; -atime tests access time instead, and a bare 7 means exactly 7 days ago):

find . -mtime -7 -type f -print

Search by size (greater than 2 KB):

find . -type f -size +2k

Search by permission (exactly 644):

find . -type f -perm 644 -print

Search by owner:

find . -type f -user weber -print

Delete all *.swp files under the current directory:

find . -type f -name "*.swp" -delete

Execute a command on each matched file (change ownership to weber):

find . -type f -user root -exec chown weber {} \;

Copy *.txt files last modified more than 10 days ago into the OLD directory:

find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
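The sign in front of the number decides what -mtime matches. The following is a minimal sketch in a scratch directory; the file names old.txt and new.txt are made up for illustration and are not from the original article.

```shell
# Scratch directory with one fresh file and one back-dated file.
scratch=$(mktemp -d)
touch "$scratch/new.txt"
touch -t 202001010000 "$scratch/old.txt"   # set mtime to 2020-01-01 00:00

# +10 matches files modified more than 10 days ago; -10 matches newer ones.
older=$(find "$scratch" -type f -name "*.txt" -mtime +10 -exec basename {} \;)
newer=$(find "$scratch" -type f -name "*.txt" -mtime -10 -exec basename {} \;)
echo "older than 10 days: $older"
echo "newer than 10 days: $newer"

rm -rf "$scratch"
```

A bare number such as -mtime 10 matches only files whose age, counted in whole days, is exactly 10.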

Combine multiple commands by writing a script and invoking it with -exec:

find . -type f -exec ./commands.sh {} \;

Output delimiter control: -print separates names with a newline by default; -print0 separates them with a null byte, which safely handles filenames containing spaces or newlines.
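To see why the null delimiter matters, here is a small sketch that pairs -print0 with xargs -0. It runs in a temporary directory, and the file names are hypothetical.

```shell
# Throwaway directory so the example never touches real files.
workdir=$(mktemp -d)
touch "$workdir/plain.swp" "$workdir/with space.swp" "$workdir/keep.txt"

# NUL-separated names reach rm intact, even the one containing a space.
find "$workdir" -type f -name "*.swp" -print0 | xargs -0 rm

remaining=$(ls "$workdir")
echo "left after cleanup: $remaining"

rm -rf "$workdir"
```

Without -print0 and -0, xargs would split "with space.swp" into two arguments and rm would fail on both halves.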

02 grep – Text Search

Basic pattern matching:

grep "pattern" file

Common options:

-o  output only the matching part of each line
-v  output the lines that do not match
-c  count matching lines per file
-n  show line numbers
-i  ignore case
-l  list matching filenames only

Recursive search in a directory tree (the programmer’s favorite):

grep -R -n "class" .

Match multiple patterns:

grep -e "class" -e "virtual" file

Use -Z to output null‑terminated filenames and pipe to xargs -0 for safe bulk operations:

grep -lZ "test" * | xargs -0 rm
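The options above can be compared side by side on a small sample. This is only a sketch; the sample text is invented for the purpose.

```shell
# Four-line sample file with mixed case.
sample=$(mktemp)
printf 'class Foo\nvirtual void run()\nClass Bar\nint x\n' > "$sample"

match_lines=$(grep -c "class" "$sample")        # lines containing "class"
nocase_lines=$(grep -ci "class" "$sample")      # same, ignoring case
numbered=$(grep -n -e "class" -e "virtual" "$sample")
only_part=$(grep -o "vir[a-z]*" "$sample")      # just the matched text

echo "$match_lines $nocase_lines"
echo "$numbered"
echo "$only_part"

rm -f "$sample"
```

Note that -c reports matching lines, not individual matches, so a line containing the pattern twice still counts once.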

03 sort – Sorting

Key options:

-n  numeric sort
-d  dictionary order
-r  reverse order
-k N  sort by the N‑th field (the key runs from field N to the end of the line; use -k N,N to restrict it to that field)

Examples:

sort -nrk 1 data.txt
sort -bd data.txt   # ignore leading blanks
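The difference between text and numeric ordering on a chosen field is easy to miss. A minimal sketch with made-up data, pinned to the C locale so the result does not depend on local collation rules:

```shell
# Sort on field 2 only, first as text, then as numbers.
by_text=$(printf 'b 10\na 9\nc 100\n' | LC_ALL=C sort -k2,2)
by_number=$(printf 'b 10\na 9\nc 100\n' | LC_ALL=C sort -k2,2n)

echo "as text:"
echo "$by_text"
echo "as numbers:"
echo "$by_number"
```

As text, "10" and "100" sort before "9" because comparison goes character by character; the n modifier compares the values numerically.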

04 uniq – Remove Duplicate Lines

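Typical usage with sort (uniq only collapses adjacent duplicates, so sort first):

sort unsort.txt | uniq

Count occurrences:

sort unsort.txt | uniq -c

Show only duplicate lines:

sort unsort.txt | uniq -d

Specify the comparison range with -s N (skip the first N characters) and -w N (compare at most N characters).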
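A common combination is a frequency ranking: sort, count with uniq -c, then sort the counts. The sketch below uses an invented word list.

```shell
# Word list with repeats.
words=$(mktemp)
printf 'pear\napple\npear\nfig\napple\npear\n' > "$words"

# uniq -c prefixes each distinct line with its count; sort -nr ranks them.
top=$(sort "$words" | uniq -c | sort -nr | head -n 1 | awk '{print $2, $1}')
repeated=$(sort "$words" | uniq -d)

echo "most frequent: $top"
echo "appearing more than once:"
echo "$repeated"

rm -f "$words"
```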

05 tr – Translate Characters

General form:

echo 12345 | tr '0-9' '9876543210'

Convert tabs to spaces:

cat text | tr '\t' ' '

Delete characters:

cat file | tr -d '0-9'

Complement set (-c needs either -d or a second set; this deletes everything except digits and newlines):

cat file | tr -d -c '0-9\n'

Compress repeated characters (e.g., squeeze spaces):

cat file | tr -s ' '

Character classes (e.g., [:lower:] to [:upper:] conversion):

tr '[:lower:]' '[:upper:]'
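The tr modes above can be checked quickly on literal input. A short sketch, with sample strings chosen only for illustration:

```shell
# Translate, delete the complement, and squeeze, each on a one-line input.
upper=$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')
digits=$(printf 'tel: 555-0100\n' | tr -cd '0-9\n')   # keep digits and newline
squeezed=$(printf 'a    b   c\n' | tr -s ' ')         # collapse runs of spaces

echo "$upper"
echo "$digits"
echo "$squeezed"
```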

06 cut – Column Extraction

Extract fields 2 and 4 (default delimiter is TAB):

cut -f2,4 filename

Remove column 3, i.e. print every field except the third (--complement is a GNU extension):

cut -f3 --complement filename

Specify a custom delimiter:

cut -f2 -d ";" filename

Field ranges:

N-   from field N to the end
-M   the first M fields
N-M  fields N through M

Units:

-b  bytes
-c  characters
-f  fields (split on the delimiter)
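The range notations can be tried on a single record. This sketch assumes a semicolon-delimited line invented for the example.

```shell
# One record with four semicolon-separated fields.
record='alice;30;paris;dev'

second=$(printf '%s\n' "$record" | cut -d ';' -f2)        # one field
from_third=$(printf '%s\n' "$record" | cut -d ';' -f3-)   # field 3 to the end
first_two=$(printf '%s\n' "$record" | cut -d ';' -f-2)    # first two fields
first_chars=$(printf '%s\n' "$record" | cut -c1-5)        # characters 1 to 5

echo "$second | $from_third | $first_two | $first_chars"
```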

07 paste – Merge Columns

Combine two files column‑wise (default delimiter is TAB):

paste file1 file2

Use a different delimiter, e.g., a comma:

paste -d "," file1 file2
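As a sketch of paste in practice, the following merges two small temporary files line by line and then joins one file's lines into a single row with -s. The file contents are hypothetical.

```shell
# Two parallel files: names and ages.
names=$(mktemp)
ages=$(mktemp)
printf 'alice\nbob\n' > "$names"
printf '30\n25\n' > "$ages"

merged=$(paste -d "," "$names" "$ages")    # line N of each file, side by side
one_row=$(paste -s -d "," "$names")        # -s joins all lines of one file

echo "$merged"
echo "$one_row"

rm -f "$names" "$ages"
```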

08 wc – Count Lines, Words, Characters

Examples:

wc -l file   # line count
wc -w file   # word count
wc -c file   # byte count (use -m to count characters)
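When the count feeds a script, reading from standard input keeps the file name out of the output. A minimal sketch with an invented two-line file:

```shell
# Two lines, three words, fourteen bytes (newlines included).
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

lines=$(wc -l < "$sample")
words=$(wc -w < "$sample")
bytes=$(wc -c < "$sample")

echo "lines=$lines words=$words bytes=$bytes"

rm -f "$sample"
```

Some wc implementations pad the number with leading spaces, so compare the results numerically rather than as strings.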

09 sed – Stream Editor

Replace the first occurrence on each line:

sed 's/text/replace_text/' file

Global replacement:

sed 's/text/replace_text/g' file

Edit file in place (GNU sed; BSD sed needs -i ''):

sed -i 's/text/replace_text/g' file

Delete empty lines:

sed '/^$/d' file

Use captured groups (\1 refers to the first \( ... \) group):

sed 's/hello\([0-9]\)/\1/'

Variable substitution with double quotes:

p="pattern"; r="replace"; echo "line with pattern" | sed "s/$p/$r/g"

Insert characters (e.g., add a slash after the third character):

sed -E 's/^.{3}/&\//' file
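The substitutions above can be verified on literal input. The sketch below also shows one way to handle variables that contain slashes, by switching the s command to a different delimiter; the variable names from and to are assumptions made for the example.

```shell
# Capture group: keep only the digit that follows "hello".
digit_only=$(printf 'hello7 world\n' | sed 's/hello\([0-9]\)/\1/')

# -E enables extended regex, so {3} works without backslashes.
slashed=$(printf 'abcdef\n' | sed -E 's/^.{3}/&\//')

# With | as the delimiter, slashes in the variables need no escaping.
from="/usr/bin"
to="/opt/bin"
repathed=$(printf 'path=/usr/bin/tool\n' | sed "s|$from|$to|g")

echo "$digit_only"
echo "$slashed"
echo "$repathed"
```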

10 awk – Data‑Stream Processing

Typical script structure:

awk 'BEGIN{...} { ... } END{...}' file

Common workflow:

1. Execute the BEGIN block once, before any input is read.

2. Read the input line by line and execute the main block for each line.

3. Execute the END block once, after the last line.

Printing examples:

echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print} END{print "End"}'

Print specific fields:

awk '{print $2, $3}' file

Count lines:

awk 'END{print NR}' file

Sum the first field:

echo -e "1\n2\n3\n4" | awk 'BEGIN{sum=0} {sum+=$1} END{print sum}'

Pass external variables:

var=1000; echo | awk -v vara="$var" '{print vara}'
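A shell variable passed with -v can drive a filter as well as a print. In this sketch, the names limit and max and the three input values are all invented for illustration.

```shell
# Count the input values that exceed a threshold taken from the shell.
limit=15
over=$(printf '10\n20\n30\n' | awk -v max="$limit" '$1 > max {n++} END {print n + 0}')

# Sum the same column for comparison.
total=$(printf '10\n20\n30\n' | awk '{sum += $1} END {print sum}')

echo "values above $limit: $over"
echo "total: $total"
```

The n + 0 in the END block forces a numeric 0 when no line matches, instead of an empty string.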

Filter by line number or pattern:

awk 'NR<5' file
awk 'NR==1,NR==4 {print}' file
awk '/linux/' file
awk '!/linux/' file

Set field delimiter:

awk -F ':' '{print $NF}' /etc/passwd

Read command output inside awk:

awk 'BEGIN{"grep root /etc/passwd" | getline cmdout; print cmdout}'
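A single getline call reads only the first line of the command's output. To consume every line, getline is usually placed in a while loop, as in this sketch; the echo commands stand in for any real command.

```shell
# Read all output lines of a command from inside awk.
summary=$(awk 'BEGIN {
    cmd = "echo one; echo two; echo three"
    while ((cmd | getline line) > 0) {   # returns 1 per line, 0 at end of output
        count++
        last = line
    }
    close(cmd)                           # release the pipe
    print count, last
}')

echo "$summary"
```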

Loop examples (these statements go inside an awk action block):

for(i=0;i<10;i++) print i
for(i in array) print array[i]
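Both loop forms can be run as complete commands. The sketch below uses a counting loop in a BEGIN block and a for-in loop over an associative array; the array name seen and the input letters are made up.

```shell
# C-style loop: print the squares of 1 to 3, one per line.
squares=$(awk 'BEGIN {for (i = 1; i <= 3; i++) print i * i}')

# for-in loop: tally how often each value occurs.
# The iteration order of for-in is unspecified, so the result is sorted.
tally=$(printf 'a\nb\na\n' | awk '{seen[$1]++} END {for (k in seen) print k, seen[k]}' | sort)

echo "$squares"
echo "$tally"
```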

Implement tac (reverse output) in awk:

seq 9 | awk '{lifo[NR]=$0; lno=NR} END{for(;lno>0;lno--) print lifo[lno]}'

Implement head and tail:

awk 'NR<=10{print}' filename   # head
awk '{buffer[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buffer[i%10]}' filename   # tail (last 10 lines)

Iterate over lines, words, and characters using shell loops or awk constructs.
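One possible way to do each kind of iteration is sketched below: a shell while-read loop for lines, awk fields for words, and awk substr for characters. The sample text is invented.

```shell
# Two-line sample file.
text=$(mktemp)
printf 'ab cd\nef\n' > "$text"

# Lines: IFS= and -r keep leading whitespace and backslashes intact.
line_n=0
while IFS= read -r line; do
    line_n=$((line_n + 1))
done < "$text"

# Words: awk splits each line into fields $1..$NF.
word_list=$(awk '{for (i = 1; i <= NF; i++) print $i}' "$text")

# Characters: substr walks a string one character at a time.
char_list=$(printf 'ef\n' | awk '{for (i = 1; i <= length($0); i++) print substr($0, i, 1)}')

echo "lines: $line_n"
echo "$word_list"
echo "$char_list"

rm -f "$text"
```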

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
