Master Linux Text Processing: Find, Grep, Sed, Awk and More

This guide provides a comprehensive overview of essential Linux command‑line tools for text processing—including find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—complete with practical examples, common options, and tips for chaining commands to efficiently search, transform, and analyze files.

Liangxu Linux

Apr 16, 2019

find – File Search

The find utility searches the filesystem hierarchy for files matching criteria such as name patterns, regular expressions, depth, type, timestamps, size, permissions, and ownership.

find . \( -name "*.txt" -o -name "*.pdf" \) -print   # group the -o tests so -print applies to both

find . -regextype posix-extended -regex ".*\.(txt|pdf)$"   # GNU find; the default regex dialect treats ( | ) as literals

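find . ! -name "*.txt" -print          # everything except *.txt

find . -maxdepth 1 -type f             # regular files in the current directory only

find . -type d -print                  # directories only

find . -type f -atime 7 -print         # last accessed 7 days ago

find . -type f -size +2k               # larger than 2 KiB

find . -type f -perm 644 -print        # permissions exactly 644

find . -type f -user weber -print      # owned by user weber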

Common post-search actions use -delete or -exec:

find . -type f -name "*.swp" -delete

find . -type f -user root -exec chown weber {} \;

find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD/ \;
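When -o combines tests, an action such as -print binds more tightly than -o, so grouping with \( \) changes what gets printed. The following is a minimal sketch of that behavior, assuming a find and mktemp as shipped on typical Linux systems; the scratch directory and file names are hypothetical.

```shell
# Scratch directory with three hypothetical files.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.pdf" "$demo/c.log"

# Without grouping, -print binds only to the second -name test,
# so a.txt matches but is never printed.
ungrouped=$(find "$demo" -name "*.txt" -o -name "*.pdf" -print | wc -l)

# With \( ... \), -print applies to a match on either side of -o.
grouped=$(find "$demo" \( -name "*.txt" -o -name "*.pdf" \) -print | wc -l)

echo "ungrouped=$ungrouped grouped=$grouped"
rm -r "$demo"
```

The same grouping applies before -exec or -delete, where an ungrouped -o could silently skip files.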

grep – Text Search

grep scans files for lines matching a pattern. Frequently used options: -o prints only the matching part; -v inverts the match; -c counts matching lines; -n shows line numbers; -i ignores case; -l lists only the names of matching files.

grep "text" filename

grep -c "text" filename

grep -R "class" -n .

grep -e "class" -e "virtual" file

grep -lZ "test" * | xargs -0 rm   # delete every file that contains "test"
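The options above are easier to compare side by side on a small input. This is a minimal sketch using a hypothetical four-line sample file; the variable names are illustrative only.

```shell
# Hypothetical sample file in a scratch directory.
demo=$(mktemp -d)
printf '%s\n' 'class Foo' 'virtual void run()' 'Class Bar' 'int x' > "$demo/sample.txt"

count=$(grep -c "class" "$demo/sample.txt")        # matching lines, case-sensitive
count_i=$(grep -ci "class" "$demo/sample.txt")     # same, ignoring case
numbered=$(grep -n "virtual" "$demo/sample.txt")   # prefix each match with its line number
only=$(grep -o "Foo" "$demo/sample.txt")           # print just the matched text
inverted=$(grep -vc "class" "$demo/sample.txt")    # count of non-matching lines

printf '%s\n' "$count" "$count_i" "$numbered" "$only" "$inverted"
rm -r "$demo"
```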

xargs – Argument Builder

xargs converts standard input into arguments for another command, enabling pipelines such as:

cat file.txt | xargs

Key options: -d specifies a custom delimiter (by default, input is split on whitespace); -n limits the number of arguments per command invocation; -I {} replaces a placeholder with each input item, which is useful for complex commands.

cat file.txt | xargs -I {} ./command.sh {}

find . -name "*.cpp" -print0 | xargs -0 wc -l
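To see how -n and -I change the way arguments are grouped, the sketch below feeds a few hypothetical items to echo, which stands in for a real command.

```shell
# -n 2: pass at most two arguments to each echo invocation.
pairs=$(printf '%s\n' a b c d e | xargs -n 2 echo)

# -I {}: run the command once per input line, substituting the line for {}.
tagged=$(printf '%s\n' one two | xargs -I {} echo "item={}")

printf '%s\n' "$pairs" "$tagged"
```

With -n 2 the five items are split across three invocations, while -I runs one invocation per input line.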

sort – Ordering Output

sort orders lines. Important flags: -n numeric sort; -d dictionary order; -r reverse order; -k N sort by the N-th field.

sort -nrk 1 data.txt

sort -bd data.txt   # dictionary order, ignoring leading blanks
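Sorting by a field usually also needs a field separator, which sort takes through -t (an option not listed above). The following is a minimal sketch on hypothetical name:score records, showing why -n matters for numeric keys.

```shell
# Hypothetical colon-separated records: name:score
data=$(printf '%s\n' 'amy:9' 'bob:10' 'cat:2')

# Numeric, descending, on field 2 only (-k 2,2), with ':' as the separator (-t).
by_score=$(printf '%s\n' "$data" | sort -t ':' -k 2,2nr)

# Without -n the same key is compared as text, so "10" sorts before "2".
as_text=$(printf '%s\n' "$data" | sort -t ':' -k 2,2)

printf '%s\n' "$by_score" "---" "$as_text"
```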

uniq – Removing Duplicates

Typically used after sort to filter duplicate lines or count occurrences.

sort unsort.txt | uniq

sort unsort.txt | uniq -c   # count each unique line

sort unsort.txt | uniq -d   # show only duplicated lines
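uniq only collapses adjacent duplicates, which is why it is paired with sort. A minimal sketch of the common frequency-count chain, using a hypothetical word list:

```shell
# Hypothetical word list, one word per line.
words=$(printf '%s\n' pear apple pear fig apple pear)

# Sort so duplicates become adjacent, count them, rank by count, keep the top entry.
top=$(printf '%s\n' "$words" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2 " " $1}')

# Without the initial sort, no duplicates are adjacent here and nothing collapses.
unsorted_lines=$(printf '%s\n' "$words" | uniq | wc -l)

echo "$top"
echo "$unsorted_lines"
```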

tr – Translating Characters

tr replaces or deletes characters, squeezes repeated characters, and works with character classes.

echo 12345 | tr '0-9' '9876543210'

cat file | tr -d '0-9'          # delete digits

cat file | tr -s ' '           # squeeze multiple spaces

tr '[:lower:]' '[:upper:]'    # lower‑to‑upper case
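Because tr reads only standard input, each of the forms above can be tried on a short string. A minimal sketch with illustrative inputs:

```shell
upper=$(echo "hello world" | tr '[:lower:]' '[:upper:]')   # character classes
squeezed=$(echo "a   b    c" | tr -s ' ')                   # squeeze runs of spaces
no_digits=$(echo "abc123def" | tr -d '0-9')                 # delete a range
mapped=$(echo 12345 | tr '0-9' '9876543210')                # positional mapping: digit d becomes 9-d

printf '%s\n' "$upper" "$squeezed" "$no_digits" "$mapped"
```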

cut – Column Extraction

Extract fields or characters from each line.

cut -f2,4 filename               # fields 2 and 4

cut -f3 --complement filename    # all but field 3

cut -c1-5 file                   # characters 1‑5

Field delimiter can be changed with -d:

cut -d ":" -f6 /etc/passwd

paste – Merging Columns

Join files side‑by‑side. Default delimiter is a tab; a custom delimiter can be set with -d.

paste file1 file2

paste -d "," file1 file2
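cut and paste are roughly inverse operations, so one way to reorder columns is to cut them apart and paste them back. This is a minimal sketch on a hypothetical colon-separated file; the file names are illustrative.

```shell
demo=$(mktemp -d)
printf '%s\n' 'root:x:0' 'weber:x:1000' > "$demo/users.txt"

# Split the hypothetical file into two single-column files.
cut -d ':' -f1 "$demo/users.txt" > "$demo/names.txt"
cut -d ':' -f3 "$demo/users.txt" > "$demo/ids.txt"

# Join them side by side, comma-separated, with the columns swapped.
joined=$(paste -d ',' "$demo/ids.txt" "$demo/names.txt")

echo "$joined"
rm -r "$demo"
```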

wc – Counting Lines, Words, Characters

wc -l file   # lines

wc -w file   # words

wc -c file   # bytes
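wc -l counts newline characters rather than lines of text, which matters for input whose last line has no trailing newline. A minimal sketch with inline input:

```shell
lines=$(printf 'one two\nthree\n' | wc -l)    # two newline characters
words=$(printf 'one two\nthree\n' | wc -w)    # three whitespace-separated words
bytes=$(printf 'one two\nthree\n' | wc -c)    # 8 + 6 bytes, newlines included

# The final line has no newline, so only one is counted.
no_trailing=$(printf 'one\ntwo' | wc -l)

echo "$lines $words $bytes $no_trailing"
```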

sed – Stream Editing

Perform in‑place or streamed text transformations.

sed 's/text/replace_text/' file               # first occurrence per line

sed 's/text/replace_text/g' file              # global replacement

sed -i 's/text/replace_text/g' file            # edit file in place

sed '/^$/d' file                              # delete empty lines

sed 's/^.\{3\}/&text/' file                   # insert "text" after the first 3 chars
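In sed's default (basic) regular expressions, a repetition count is written \{3\}, and & in the replacement stands for the matched text. The sketch below runs the substitution and deletion forms on short illustrative strings.

```shell
text='hello world hello'

first=$(echo "$text" | sed 's/hello/bye/')        # only the first match on the line
all=$(echo "$text" | sed 's/hello/bye/g')         # every match on the line
no_blank=$(printf 'a\n\nb\n' | sed '/^$/d')       # drop empty lines
inserted=$(echo "abcdef" | sed 's/^.\{3\}/&-/')   # & re-emits the match, then "-" is appended

printf '%s\n' "$first" "$all" "$no_blank" "$inserted"
```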

awk – Data‑Stream Processing

An awk program consists of an optional BEGIN block, a main block that runs once per input line, and an optional END block.

awk 'BEGIN{print "start"} {print} END{print "end"}' file

Key built-in variables: NR is the current record (line) number; NF is the number of fields in the current record; $0 is the entire line; $1, $2, … are the individual fields.

Common examples:

awk '{print NR":"$0"-"$1"-"$2}' file

awk 'END{print NR}' file                     # total lines

awk '{sum+=$1} END{print sum}' file          # sum of first column

awk -v var=1000 '{print var}' file
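The accumulate-then-report pattern with END, together with -v, covers many small reporting tasks. A minimal sketch on hypothetical "quantity item" records; limit is an illustrative variable name.

```shell
# Hypothetical records: quantity, then item name.
data=$(printf '%s\n' '10 apples' '20 pears' '12 figs')

total=$(printf '%s\n' "$data" | awk '{sum += $1} END {print sum}')
count=$(printf '%s\n' "$data" | awk 'END {print NR}')
avg=$(printf '%s\n' "$data" | awk '{sum += $1} END {if (NR > 0) print sum / NR}')

# -v passes a shell-side value into the program; a pattern selects rows.
over=$(printf '%s\n' "$data" | awk -v limit=11 '$1 > limit {print $2}')

printf '%s\n' "$total" "$count" "$avg" "$over"
```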

Row filtering:

awk 'NR<5' file                              # first 4 lines

awk 'NR==1,NR==4 {print}' file               # lines 1‑4

awk '/linux/' file                           # lines containing "linux"

awk '!/linux/' file                          # lines not containing "linux"

Set field separator:

awk -F ':' '{print $NF}' /etc/passwd

Read command output with getline:

awk 'BEGIN{"grep root /etc/passwd" | getline cmd; print cmd}'   # reads only the first line of output
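A single getline call reads one line of the command's output. To consume all of it, getline can be called in a loop until it stops returning 1. This is a minimal sketch in which a pair of echo commands stands in for a real command; the variable names are illustrative.

```shell
out=$(awk 'BEGIN {
    cmd = "echo a:1; echo b:2"            # stand-in command, purely illustrative
    while ((cmd | getline line) > 0) {    # 1 per line read, 0 at end of output
        split(line, parts, ":")
        print parts[2], parts[1]
    }
    close(cmd)                            # release the pipe
}')
echo "$out"
```

Keeping the command string in a variable means the same string can be handed to close() afterwards.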

Loops inside awk:

awk 'BEGIN{for (i=0;i<10;i++) print i}'

awk '{array[NR]=$0} END{for (i in array) print array[i]}' file   # iteration order is unspecified

Implementations of common utilities:

# head (first 10 lines)
awk 'NR<=10{print}' file

# tail (last 10 lines)
awk '{buf[NR%10]=$0} END{for(i=(NR>10?NR-9:1);i<=NR;i++) print buf[i%10]}' file
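The ring-buffer idea behind the tail one-liner generalizes to any line count. The following is a minimal sketch, assuming a hypothetical helper named tail_n and the seq utility for test input; it also handles inputs shorter than the requested count.

```shell
# tail_n N: print the last N lines of standard input (hypothetical helper).
tail_n() {
    awk -v n="$1" '
        { buf[NR % n] = $0 }                    # ring buffer holding the newest n lines
        END {
            start = (NR > n) ? NR - n + 1 : 1   # inputs shorter than n start at line 1
            for (i = start; i <= NR; i++) print buf[i % n]
        }'
}

last3=$(seq 1 10 | tail_n 3)
short=$(seq 1 2 | tail_n 3)
printf '%s\n' "$last3" "---" "$short"
```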

Iterating Over Lines, Words, and Characters

Shell loop to read a file line by line:

while IFS= read -r line; do echo "$line"; done < file.txt

Awk can iterate over the words in a line:

awk '{for (i=1;i<=NF;i++) print $i}' file

Bash substring expansion for character-wise iteration:

word="example"
for ((i=0;i<${#word};i++)); do echo "${word:i:1}"; done
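When reading lines in a shell loop, a bare read strips leading whitespace and treats backslashes as escapes, while IFS= read -r keeps each line intact. A minimal sketch comparing the two on a hypothetical input file:

```shell
demo=$(mktemp -d)
# First line has leading spaces and a literal backslash followed by t.
printf '%s\n' '  indented \t line' 'second' > "$demo/in.txt"

# Bare read: leading blanks are trimmed and the backslash is consumed.
plain=$(while read line; do printf '[%s]\n' "$line"; done < "$demo/in.txt")

# IFS= keeps leading blanks; -r keeps backslashes literal.
safe=$(while IFS= read -r line; do printf '[%s]\n' "$line"; done < "$demo/in.txt")

printf '%s\n' "$plain" "---" "$safe"
rm -r "$demo"
```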

Source: 大CC, www.cnblogs.com/me15/p/3427319.html
