Master Linux Text Processing: Find, Grep, Sed, Awk and More
This guide provides a comprehensive overview of essential Linux command‑line tools for text processing—including find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—complete with practical examples, common options, and tips for chaining commands to efficiently search, transform, and analyze files.
find – File Search
The find utility searches the filesystem hierarchy for files matching criteria such as name patterns, regular expressions, depth, type, timestamps, size, permissions, and ownership.
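As a minimal sketch of combining such criteria, the snippet below builds a throwaway directory with mktemp and searches it by name and depth. The file names are hypothetical, and -maxdepth assumes GNU or BSD find.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox: every file name here is made up for illustration.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
touch "$demo_dir/a.txt" "$demo_dir/b.pdf" "$demo_dir/c.log" "$demo_dir/sub/d.txt"

# Group the -o alternatives so that -type f applies to both name patterns.
matches=$(find "$demo_dir" \( -name "*.txt" -o -name "*.pdf" \) -type f | sort)
echo "$matches"

# Count the non-empty result lines (a.txt, b.pdf, sub/d.txt).
match_count=$(printf '%s\n' "$matches" | grep -c .)

# Limit the search to the top level, which skips sub/d.txt.
top_level=$(find "$demo_dir" -maxdepth 1 -type f -name "*.txt")
echo "$top_level"

rm -rf "$demo_dir"
```

Without the escaped parentheses, an action placed after the second -name would bind only to that pattern, because the implicit "and" has higher precedence than -o.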
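find . \( -name "*.txt" -o -name "*.pdf" \) -print   # match either pattern
find . -regex ".*\.\(txt\|pdf\)$"                    # same, using GNU find's default (Emacs-style) regex syntax
find . ! -name "*.txt" -print                        # everything except *.txt
find . -maxdepth 1 -type f                           # regular files in the current directory only
find . -type d -print                                # directories only
find . -type f -atime 7 -print                       # accessed 7 days ago (+7 for earlier, -7 for more recent)
find . -type f -size +2k                             # larger than 2 KiB
find . -type f -perm 644 -print                      # permissions exactly 644
find . -type f -user weber -print                    # owned by user weber

Common post-search actions use -delete or -exec: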
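find . -type f -name "*.swp" -delete                          # delete matching files
find . -type f -user root -exec chown weber {} \;             # change the owner of root's files
find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD/ \;   # copy files modified more than 10 days ago into OLD/

grep – Text Search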
grep scans files for lines matching a pattern. Frequently used options:
-o output only the matching part
-v invert the match
-c count matching lines
-n show line numbers
-i ignore case
-l list only the names of matching files
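The effect of these options can be checked on a small sample. This is a sketch only: the temporary file and its four lines are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical sample file with four lines.
sample=$(mktemp)
printf '%s\n' 'class Foo' 'virtual void run()' 'Class Bar' 'int main()' > "$sample"

count=$(grep -c "class" "$sample")         # case-sensitive count: 1
count_i=$(grep -ci "class" "$sample")      # case-insensitive count: 2
numbered=$(grep -n "virtual" "$sample")    # 2:virtual void run()
only=$(grep -o "main" "$sample")           # just the matched text: main
inverted=$(grep -vc "class" "$sample")     # lines NOT containing "class": 3

echo "$count $count_i $inverted"
echo "$numbered"
echo "$only"

rm -f "$sample"
```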
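grep "text" filename                  # lines containing "text"
grep -c "text" filename               # count matching lines
grep -R -n "class" .                  # recursive search with line numbers
grep -e "class" -e "virtual" file     # match either pattern
grep -lZ "test" * | xargs -0 rm       # delete files containing "test" (NUL-separated names, GNU grep)

xargs – Argument Builder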
xargs converts standard input into arguments for another command, enabling pipelines such as:
cat file.txt | xargs

Key options:
-d specify a custom delimiter (GNU xargs; by default input is split on blanks and newlines)
-n limit the number of arguments per command invocation
-I {} replace a placeholder with each input item, useful for complex commands
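A short sketch of how the three options shape the generated command lines. The input values are arbitrary, and -d assumes GNU xargs.

```shell
#!/usr/bin/env bash
# -n 2: at most two arguments per echo invocation, so five items give three lines.
pairs=$(printf '%s\n' a b c d e | xargs -n 2 echo)
echo "$pairs"

# -I {}: one invocation per input line, with {} replaced by that line.
wrapped=$(printf '%s\n' one two | xargs -I {} echo "item: {}")
echo "$wrapped"

# -d ',': split on commas instead of whitespace (GNU xargs only).
split=$(printf 'x,y,z' | xargs -d ',' -n 1 echo)
echo "$split"
```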
cat file.txt | xargs -I {} ./command.sh {}        # run command.sh once per input line
find . -name "*.cpp" -print0 | xargs -0 wc -l     # count lines; safe for file names containing spaces

sort – Ordering Output
sort orders lines. Important flags:
-n numeric sort
-d dictionary order
-r reverse order
-k N sort on a key that starts at the N-th field
-b ignore leading blanks
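To make the flags concrete, here is a sketch on a hypothetical three-line data file. LC_ALL=C is set so the ordering does not depend on the system locale.

```shell
#!/usr/bin/env bash
# Hypothetical data: a count followed by a name.
data=$(mktemp)
printf '%s\n' '10 pear' '2 apple' '33 fig' > "$data"

# Numeric, descending, keyed on field 1: 33, 10, 2.
numeric_desc=$(LC_ALL=C sort -n -r -k 1 "$data")
echo "$numeric_desc"

# Keyed on field 2 (the name): apple, fig, pear.
by_name=$(LC_ALL=C sort -k 2 "$data")
echo "$by_name"

rm -f "$data"
```

A plain sort without -n would place "10 pear" before "2 apple", because it compares the digits as text.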
sort -nrk 1 data.txt     # numeric, descending, keyed on field 1
sort -bd data.txt        # dictionary order, ignoring leading blanks

uniq – Removing Duplicates
uniq collapses only adjacent duplicate lines, so it is typically used after sort to filter duplicates or count occurrences.
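The sketch below shows why the sort step matters, using a made-up word list in which no duplicates happen to be adjacent.

```shell
#!/usr/bin/env bash
# Hypothetical input: duplicates exist, but none are adjacent.
words=$(printf '%s\n' b a b c a b)

# Without sort, uniq finds nothing to collapse and the input passes through unchanged.
unsorted_result=$(printf '%s\n' "$words" | uniq)

# After sort, duplicates are adjacent and get merged.
unique=$(printf '%s\n' "$words" | sort | uniq)        # a, b, c
dups=$(printf '%s\n' "$words" | sort | uniq -d)       # a, b

# Most frequent word: count, sort counts descending, keep the first row.
top=$(printf '%s\n' "$words" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')

echo "$unique"
echo "$dups"
echo "$top"
```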
sort unsort.txt | uniq        # drop duplicate lines
sort unsort.txt | uniq -c     # count each unique line
sort unsort.txt | uniq -d     # show only duplicated lines

tr – Translating Characters
tr replaces or deletes characters, squeezes repeated characters, and works with character classes. It reads only from standard input.
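Each of these modes can be tried on a literal string. The inputs below are arbitrary; the digit mapping is the one used in this article's example and is its own inverse when the two sets are swapped.

```shell
#!/usr/bin/env bash
upper=$(echo "hello world" | tr '[:lower:]' '[:upper:]')    # HELLO WORLD
squeezed=$(echo "a    b   c" | tr -s ' ')                   # a b c
no_digits=$(echo "abc123def" | tr -d '0-9')                 # abcdef

# Digit substitution: 0->9, 1->8, ..., 9->0, then the reverse mapping.
encoded=$(echo 12345 | tr '0-9' '9876543210')               # 87654
decoded=$(echo "$encoded" | tr '9876543210' '0-9')          # 12345

echo "$upper | $squeezed | $no_digits | $encoded | $decoded"
```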
echo 12345 | tr '0-9' '9876543210'         # digit substitution: prints 87654
cat file | tr -d '0-9'                     # delete digits
cat file | tr -s ' '                       # squeeze repeated spaces
cat file | tr '[:lower:]' '[:upper:]'      # lower case to upper case

cut – Column Extraction
Extract fields or characters from each line.
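A sketch of field and character extraction on a single passwd-style record. The record is invented and no real account is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical record in /etc/passwd format.
record='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

home=$(echo "$record" | cut -d ':' -f 6)           # /home/alice
name_shell=$(echo "$record" | cut -d ':' -f 1,7)   # alice:/bin/bash
first3=$(echo "$record" | cut -c 1-3)              # ali

echo "$home"
echo "$name_shell"
echo "$first3"
```

When several fields are selected, cut joins them with the input delimiter, which is why the second result still contains a colon.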
cut -f2,4 filename               # fields 2 and 4
cut -f3 --complement filename    # all but field 3 (GNU cut)
cut -c1-5 file                   # characters 1-5

The field delimiter (a tab by default) can be changed with -d:
cut -d ":" -f6 /etc/passwd       # sixth field: each account's home directory

paste – Merging Columns
Join files side‑by‑side. Default delimiter is a tab; a custom delimiter can be set with -d.
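As an illustration, the sketch below merges two hypothetical two-line files, first with the default tab and then with a comma. The -s form, which joins the lines of one file into a single row, is a standard paste option not covered above.

```shell
#!/usr/bin/env bash
# Hypothetical input files.
names=$(mktemp)
scores=$(mktemp)
printf '%s\n' alice bob > "$names"
printf '%s\n' 90 85 > "$scores"

tabbed=$(paste "$names" "$scores")          # alice<TAB>90 / bob<TAB>85
csv=$(paste -d ',' "$names" "$scores")      # alice,90 / bob,85
serial=$(paste -s -d ',' "$names")          # alice,bob

echo "$tabbed"
echo "$csv"
echo "$serial"

rm -f "$names" "$scores"
```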
paste file1 file2             # join line by line, tab-separated
paste -d "," file1 file2      # join line by line, comma-separated

wc – Counting Lines, Words, Characters
wc -l file     # lines
wc -w file     # words
wc -c file     # bytes
wc -m file     # characters (differs from -c for multibyte text)

sed – Stream Editing
Perform in‑place or streamed text transformations.
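The streamed (non in-place) forms are easy to check on literal input. This is a minimal sketch with made-up strings; the \{3\} interval uses basic regular expression syntax, which sed applies by default.

```shell
#!/usr/bin/env bash
line='text and more text'

first=$(echo "$line" | sed 's/text/replace_text/')     # only the first match on the line
all=$(echo "$line" | sed 's/text/replace_text/g')      # every match on the line
no_blank=$(printf 'a\n\nb\n' | sed '/^$/d')            # empty line removed

# & stands for the matched text, so this inserts "-" after the first 3 characters.
inserted=$(echo "abcdef" | sed 's/^.\{3\}/&-/')        # abc-def

echo "$first"
echo "$all"
echo "$no_blank"
echo "$inserted"
```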
sed 's/text/replace_text/' file          # first occurrence per line
sed 's/text/replace_text/g' file         # global replacement
sed -i 's/text/replace_text/g' file      # edit the file in place (GNU sed)
sed '/^$/d' file                         # delete empty lines
sed 's/^.\{3\}/&replace_text/' file      # insert replace_text after the first 3 characters

awk – Data‑Stream Processing
An awk program consists of an optional BEGIN block, a main block that runs once per input record, and an optional END block.
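A small sketch of that three-part structure on invented input: BEGIN runs before any line is read, the main block runs once per line, and END runs after the last line. NR is awk's built-in count of records read so far.

```shell
#!/usr/bin/env bash
# Hypothetical input: a quantity followed by an item name.
report=$(printf '%s\n' '3 apples' '5 pears' '2 figs' |
  awk 'BEGIN { print "start" }
       { sum += $1; print NR ": " $2 }
       END { print "total=" sum " lines=" NR }')

echo "$report"

first_line=$(printf '%s\n' "$report" | head -n 1)
last_line=$(printf '%s\n' "$report" | tail -n 1)
```

The variable sum needs no initialization because awk treats an unset variable as 0 in numeric context.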
awk 'BEGIN{print "start"} {print} END{print "end"}' file

Key built-in variables:
NR – current record number (line number)
NF – number of fields in the current record
$0 – entire line; $1, $2, … – individual fields
Common examples:
awk '{print NR":"$0"-"$1"-"$2}' file      # line number, whole line, first two fields
awk 'END{print NR}' file                  # total lines
awk '{sum+=$1} END{print sum}' file       # sum of first column
awk -v var=1000 '{print var}' file        # pass a value in as an awk variable

Row filtering:
awk 'NR<5' file                   # first 4 lines
awk 'NR==1,NR==4 {print}' file    # lines 1-4
awk '/linux/' file                # lines containing "linux"
awk '!/linux/' file               # lines not containing "linux"

Set field separator:
awk -F ':' '{print $NF}' /etc/passwd     # last field of each line

Read command output with getline:
awk 'BEGIN{"grep root /etc/passwd" | getline cmd; print cmd}'    # first line of the command's output

Loops inside awk:
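awk 'BEGIN{for (i=0;i<10;i++) print i}'                          # C-style counting loop
awk '{array[NR]=$0} END{for (i in array) print array[i]}' file   # for-in loop; iteration order is unspecified

Implementations of common utilities: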
# head (first 10 lines)
awk 'NR<=10{print}' file
# tail (last 10 lines)
awk '{buf[NR%10]=$0} END{for(i=(NR>10?NR-9:1);i<=NR;i++) print buf[i%10]}' file

Iterating Over Lines, Words, and Characters
Shell loop to read a file line by line (IFS= and -r preserve leading whitespace and backslashes):
while IFS= read -r line; do echo "$line"; done < file.txt

Awk can iterate over words in a line:
awk '{for (i=1;i<=NF;i++) print $i}' file

Bash substring expansion for character-wise iteration:
word="example"
for ((i=0;i<${#word};i++)); do echo "${word:i:1}"; done

Source: 大CC, www.cnblogs.com/me15/p/3427319.html
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
