Master Linux Text Processing: Find, Grep, Sed, Awk and More

This guide provides a comprehensive overview of essential Linux command‑line tools for text processing—including find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—complete with practical examples, common options, and tips for chaining commands to efficiently search, transform, and analyze files.

Liangxu Linux

find – File Search

The find utility searches the filesystem hierarchy for files matching criteria such as name patterns, regular expressions, depth, type, timestamps, size, permissions, and ownership.

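find . \( -name "*.txt" -o -name "*.pdf" \) -print         # group the -o tests so -print applies to both
find . -regextype posix-extended -regex ".*\.(txt|pdf)$"   # GNU find; the default regex syntax needs \( \| \)
find . ! -name "*.txt" -print                              # everything except *.txt
find . -maxdepth 1 -type f                                 # regular files in the current directory only
find . -type d -print                                      # directories only
find . -type f -atime 7 -print                             # last accessed 7 days ago
find . -type f -size +2k                                   # larger than 2 KiB
find . -type f -perm 644 -print                            # permissions exactly 644
find . -type f -user weber -print                          # owned by user weber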
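
The tests above can be tried safely in a scratch directory. The following is a minimal sketch, assuming a temporary directory from mktemp and made-up file names. It shows that grouping two -name tests in escaped parentheses makes -print apply to both, and that -maxdepth 1 stops find from descending into subdirectories.

```shell
# Scratch directory with made-up file names (assumptions for illustration).
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.pdf" "$demo/c.log"
mkdir "$demo/sub"
touch "$demo/sub/d.txt"

# Without parentheses, -print binds only to the second -name test,
# so the .txt files match but are never printed.
ungrouped=$(find "$demo" -name "*.txt" -o -name "*.pdf" -print | wc -l)

# With parentheses, -print applies to either match.
grouped=$(find "$demo" \( -name "*.txt" -o -name "*.pdf" \) -print | wc -l)

# -maxdepth 1 limits the search to the starting directory itself.
top_level=$(find "$demo" -maxdepth 1 -type f | wc -l)

echo "ungrouped=$ungrouped grouped=$grouped top_level=$top_level"
rm -rf "$demo"
```

Here the ungrouped form reports only the PDF, while the grouped form reports all three matching files.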

Common post-search actions include -delete and -exec:

find . -type f -name "*.swp" -delete
find . -type f -user root -exec chown weber {} \;
find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD/ \;
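
The -exec action can end in either \; or +, and the difference matters for speed. The sketch below, using a hypothetical scratch directory and echo as a stand-in command, counts output lines to show that \; runs the command once per file while + passes many paths to a single invocation.

```shell
# Scratch directory with three made-up files.
demo=$(mktemp -d)
touch "$demo/1.txt" "$demo/2.txt" "$demo/3.txt"

# \; runs echo once per file, so three lines come back.
per_file=$(find "$demo" -type f -name "*.txt" -exec echo {} \; | wc -l)

# + appends all paths to one echo call, so one line comes back.
batched=$(find "$demo" -type f -name "*.txt" -exec echo {} + | wc -l)

echo "per_file=$per_file batched=$batched"
rm -rf "$demo"
```

For commands such as chown or cp across many files, the + form usually means far fewer processes are started.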

grep – Text Search

grep scans files for lines matching a pattern. Frequently used options:

-o  output only the matching part
-v  invert the match
-c  count matching lines
-n  show line numbers
-i  ignore case
-l  list matching file names

grep "text" filename
grep -c "text" filename
grep -R "class" -n .
grep -e "class" -e "virtual" file
grep -lZ "test" * | xargs -0 rm    # delete files containing "test"; -Z and -0 keep unusual file names intact
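
To see what each option returns, the following sketch runs grep against a small sample file. The file contents and variable names are assumptions made up for illustration.

```shell
# Four-line sample file (contents are illustrative only).
demo=$(mktemp -d)
printf 'class Foo\nvirtual void run()\nClass Bar\nint x\n' > "$demo/sample.txt"

count=$(grep -c "class" "$demo/sample.txt")            # case-sensitive: 1 line
count_any_case=$(grep -ci "class" "$demo/sample.txt")  # -i also matches "Class": 2 lines
numbered=$(grep -n "virtual" "$demo/sample.txt")       # prefixes the line number
only_match=$(grep -o "Fo*" "$demo/sample.txt")         # prints just the matched text
neither=$(grep -vc -e "class" -e "virtual" "$demo/sample.txt")  # lines matching neither pattern

echo "$count $count_any_case $numbered $only_match $neither"
rm -rf "$demo"
```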

xargs – Argument Builder

xargs converts standard input into arguments for another command, enabling pipelines such as:

cat file.txt | xargs

Key options:

-d  specify a custom delimiter (default is whitespace)
-n  limit the number of arguments per command invocation
-I {}  replace a placeholder with the input item, useful for complex commands

cat file.txt | xargs -I {} ./command.sh {}
find . -name "*.cpp" -print0 | xargs -0 wc -l
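
The effect of -n and -I is easiest to see with echo standing in for the real command. This is a small sketch with made-up input, not a recipe for any particular script.

```shell
# -n 2 hands echo at most two arguments per invocation.
pairs=$(printf 'a b c d e\n' | xargs -n 2 echo)

# -I {} runs the command once per input line, substituting the line for {}.
labelled=$(printf 'one\ntwo\n' | xargs -I {} echo "item: {}")

printf '%s\n' "$pairs" "$labelled"
```

The first command prints three lines (a b, c d, e) and the second prints one "item:" line per input line.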

sort – Ordering Output

sort orders lines. Important flags:

-n  numeric sort
-d  dictionary order
-r  reverse order
-k N  sort by the N-th field

sort -nrk 1 data.txt
sort -bd data.txt   # ignore leading blanks
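
The difference between the default and numeric orderings shows up as soon as numbers have different widths. The sketch below uses three made-up records and pins the locale to C, an assumption made here so that the default ordering is byte-wise and reproducible.

```shell
# Fix the collation order; other locales may order the default sort differently.
LC_ALL=C
export LC_ALL

data='10 pear
9 apple
100 fig'

lexical=$(printf '%s\n' "$data" | sort)              # compares characters, so "10" sorts before "9"
numeric=$(printf '%s\n' "$data" | sort -n)           # compares numeric values
numeric_desc=$(printf '%s\n' "$data" | sort -nrk 1)  # numeric, largest first
by_name=$(printf '%s\n' "$data" | sort -k 2)         # orders by the second field

printf '%s\n---\n' "$lexical" "$numeric" "$numeric_desc" "$by_name"
```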

uniq – Removing Duplicates

Typically used after sort to filter duplicate lines or count occurrences.

sort unsort.txt | uniq
sort unsort.txt | uniq -c   # count each unique line
sort unsort.txt | uniq -d   # show only duplicated lines
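
uniq only collapses adjacent duplicates, which is why it is normally fed sorted input. A minimal sketch with made-up data shows the difference, plus a common way to find the most frequent line:

```shell
input='b
a
b
a
b'

# No two equal lines are adjacent, so uniq alone removes nothing.
unsorted=$(printf '%s\n' "$input" | uniq | wc -l)

# After sorting, duplicates sit next to each other and collapse.
sorted=$(printf '%s\n' "$input" | sort | uniq | wc -l)

# Count each line, rank by count, keep the top entry.
# awk normalizes the padding that uniq -c adds before the count.
top=$(printf '%s\n' "$input" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $1, $2}')

echo "unsorted=$unsorted sorted=$sorted top=$top"
```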

tr – Translating Characters

tr replaces or deletes characters, squeezes repeats, and works with character classes.

echo 12345 | tr '0-9' '9876543210'
cat file | tr -d '0-9'          # delete digits
cat file | tr -s ' '           # squeeze multiple spaces
cat file | tr '[:lower:]' '[:upper:]'   # lower-to-upper case
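
tr becomes more useful when chained with sort and uniq. As a sketch, the pipeline below splits a made-up sentence into one word per line, lowercases it, and reports the most frequent word; the sample text and variable names are illustrative assumptions.

```shell
text='The cat and the hat. The end.'

# -c complements the set (everything that is not a letter),
# -s squeezes each run of replacements into a single newline.
words=$(printf '%s\n' "$text" | tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]')

# Count the words and keep the one with the highest count.
most_common=$(printf '%s\n' "$words" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')

echo "$most_common"
```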

cut – Column Extraction

Extract fields or characters from each line.

cut -f2,4 filename               # fields 2 and 4
cut -f3 --complement filename    # all but field 3
cut -c1-5 file                   # characters 1‑5

Field delimiter can be changed with -d:

cut -d ":" -f6 /etc/passwd
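
The same options can be checked against a single inline record rather than the real /etc/passwd. The record below is a made-up entry in passwd format, used only for illustration.

```shell
# Hypothetical passwd-style record.
record='weber:x:1000:1000:Weber:/home/weber:/bin/bash'

home=$(printf '%s\n' "$record" | cut -d ":" -f6)            # sixth field: home directory
name_shell=$(printf '%s\n' "$record" | cut -d ":" -f1,7)    # fields 1 and 7, still joined by ":"
first_five=$(printf '%s\n' "$record" | cut -c1-5)           # first five characters

printf '%s\n' "$home" "$name_shell" "$first_five"
```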

paste – Merging Columns

Join files side‑by‑side. Default delimiter is a tab; a custom delimiter can be set with -d.

paste file1 file2
paste -d "," file1 file2
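
A short sketch with two made-up three-line files shows the side-by-side merge. It also uses the standard -s option, which is not covered above: it joins the lines of a single file into one line.

```shell
# Two small files with illustrative contents.
demo=$(mktemp -d)
printf '1\n2\n3\n' > "$demo/ids"
printf 'a\nb\nc\n' > "$demo/names"

# Line N of the first file is joined to line N of the second.
joined=$(paste -d "," "$demo/ids" "$demo/names")

# -s (serial) joins all lines of one file into a single line.
serial=$(paste -s -d "," "$demo/ids")

printf '%s\n' "$joined" "$serial"
rm -rf "$demo"
```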

wc – Counting Lines, Words, Characters

wc -l file   # lines
wc -w file   # words
wc -c file   # bytes

sed – Stream Editing

sed applies editing commands to each line of its input and writes the result to standard output or, with -i, back to the file.

sed 's/text/replace_text/' file               # first occurrence per line
sed 's/text/replace_text/g' file              # global replacement
sed -i 's/text/replace_text/g' file            # edit file in place (GNU sed; BSD/macOS sed needs -i '')
sed '/^$/d' file                              # delete empty lines
sed 's/^.\{3\}/&text/' file                   # insert "text" after the first 3 chars (& is the matched part)
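
Two details of the substitute command are easy to get wrong: the g flag, and how interval braces are written. The sketch below uses made-up input strings. In basic regular expressions the braces must be escaped, while with -E (extended syntax) they are written plainly; in both cases & stands for the matched text.

```shell
# Without g only the first match on each line is replaced.
first_only=$(echo 'a a a' | sed 's/a/b/')
all=$(echo 'a a a' | sed 's/a/b/g')

# Basic syntax: escape the interval braces.
bre=$(echo 'abcdef' | sed 's/^.\{3\}/&-/')

# Extended syntax (-E): braces need no escaping.
ere=$(echo 'abcdef' | sed -E 's/^.{3}/&-/')

printf '%s\n' "$first_only" "$all" "$bre" "$ere"
```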

awk – Data‑Stream Processing

An awk program consists of an optional BEGIN block, a main block that runs once per input line, and an optional END block.

awk 'BEGIN{print "start"} {print} END{print "end"}' file

Key built-in variables:

NR  current record number (line number)
NF  number of fields in the current record
$0  entire line; $1, $2, … are the individual fields

Common examples:

awk '{print NR":"$0"-"$1"-"$2}' file
awk 'END{print NR}' file                     # total lines
awk '{sum+=$1} END{print sum}' file          # sum of first column
awk -v var=1000 '{print var}' file
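
These one-liners can be tried on inline data. The three records and the prefix variable below are assumptions chosen for illustration.

```shell
data='10 a
20 b
30 c'

total=$(printf '%s\n' "$data" | awk '{sum+=$1} END{print sum}')   # sum of column 1
count=$(printf '%s\n' "$data" | awk 'END{print NR}')              # number of lines

# Guard against empty input before dividing.
average=$(printf '%s\n' "$data" | awk '{sum+=$1} END{if (NR>0) print sum/NR}')

# -v passes a shell value into the awk program as a variable.
tagged=$(printf '%s\n' "$data" | awk -v prefix="row" '{print prefix NR ":" $2}')

printf '%s\n' "$total" "$count" "$average" "$tagged"
```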

Row filtering:

awk 'NR<5' file                              # first 4 lines
awk 'NR==1,NR==4 {print}' file               # lines 1‑4
awk '/linux/' file                           # lines containing "linux"
awk '!/linux/' file                          # lines not containing "linux"

Set the field separator with -F:

awk -F ':' '{print $NF}' /etc/passwd

Read command output with getline:

awk 'BEGIN{"grep root /etc/passwd" | getline cmd; print cmd}'    # reads only the first line of output
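
A single getline call reads only one line of the command's output. To consume all of it, the call is usually placed in a while loop. The following is a sketch with a stand-in command (two echo calls) rather than anything from the article:

```shell
joined=$(awk 'BEGIN {
    cmd = "echo one; echo two"           # illustrative command (an assumption)
    while ((cmd | getline line) > 0) {   # returns 1 per line read, 0 at end of output
        out = out sep line               # collect the lines, comma-separated
        sep = ","
    }
    close(cmd)                           # close the pipe once the output is consumed
    print out
}')

echo "$joined"
```

Comparing the return value with > 0 matters because getline returns -1 on error, which a bare while (cmd | getline line) would treat as true.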

Loops inside awk:

awk 'BEGIN{for (i=0;i<10;i++) print i}'
awk 'BEGIN{array["a"]=1; array["b"]=2; for (i in array) print array[i]}'    # iteration order is unspecified
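
The for-in loop is most often used with associative arrays built from the input. As a sketch on made-up data, this counts how often each value appears in the first column. The result is piped through sort because awk does not guarantee any iteration order.

```shell
# Tally first-column values, then print each key with its count.
counts=$(printf 'a\nb\na\n' | awk '{seen[$1]++} END{for (k in seen) print k, seen[k]}' | sort)

printf '%s\n' "$counts"
```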

Implementations of common utilities:

# head (first 10 lines)
awk 'NR<=10{print}' file

# tail (last 10 lines)
awk '{buf[NR%10]=$0} END{for(i=(NR>10?NR-9:1);i<=NR;i++) print buf[i%10]}' file    # also correct for files shorter than 10 lines
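
A ring-buffer tail of this kind can be checked against the real tail. The sketch below is one way to write it, under the assumption that files shorter than ten lines should be printed in full: the loop starts at line NR-9, or at line 1 when there are fewer than ten lines. The 25 generated numbers are test data only.

```shell
# One possible ring-buffer tail: slot NR%10 always holds the most recent line
# whose number has that remainder.
ring_tail='{buf[NR%10]=$0} END{for(i=(NR>10?NR-9:1);i<=NR;i++) print buf[i%10]}'

# Generate the numbers 1..25 as input.
numbers=$(awk 'BEGIN{for (i=1;i<=25;i++) print i}')

expected=$(printf '%s\n' "$numbers" | tail -n 10)
actual=$(printf '%s\n' "$numbers" | awk "$ring_tail")

# Input shorter than ten lines comes back unchanged.
short=$(printf 'x\ny\n' | awk "$ring_tail")

[ "$actual" = "$expected" ] && echo "matches tail"
```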

Iterating Over Lines, Words, and Characters

Shell loop to read a file line by line:

while IFS= read -r line; do echo "$line"; done < file.txt

Awk can iterate over the words in a line:

awk '{for (i=1;i<=NF;i++) print $i}' file

Bash substring expansion for character-wise iteration:

word="example"
for ((i=0;i<${#word};i++)); do echo "${word:i:1}"; done
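
When reading lines in the shell, a bare read trims leading and trailing whitespace and treats backslashes as escape characters. A small sketch, using a made-up one-line file, compares it with the safer IFS= read -r form:

```shell
# One line containing two leading spaces and a literal backslash.
tmp=$(mktemp)
printf '  a\\tb\n' > "$tmp"

# Plain read: leading spaces are stripped and the backslash is consumed.
read plain < "$tmp"

# IFS= keeps the whitespace; -r keeps the backslash.
IFS= read -r exact < "$tmp"

printf '[%s]\n' "$plain" "$exact"
rm -f "$tmp"
```

The first value comes back as atb, while the second preserves the line exactly as stored.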
Source: 大CC, www.cnblogs.com/me15/p/3427319.html