Master Linux Text Processing: Find, Grep, Sed, Awk and More
This guide provides a comprehensive overview of essential Linux shell tools for text processing—including find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—complete with practical command examples, common options, and tips for combining these utilities to solve real‑world file‑handling tasks.
Introduction
This article introduces the most frequently used Linux shell utilities for processing text files, such as find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples use the most common and practical parameters, and the scripts follow the principle of keeping commands on a single line whenever possible.
01 find – File Search
Search by name, matching either of two patterns (group the -o alternatives in escaped parentheses so that -print applies to both):
    find . \( -name "*.txt" -o -name "*.pdf" \) -print
Regular‑expression search (GNU find; the pattern must match the whole path, and -iregex instead of -regex makes the match case‑insensitive):
    find . -regextype posix-extended -regex ".*(\.txt|\.pdf)$"
Negate a pattern:
    find . ! -name "*.txt" -print
Limit search depth to 1:
    find . -maxdepth 1 -type f
Search by type (directories only):
    find . -type d -print
Search by modification time (modified within the last 7 days; -atime tests access time instead, and a bare 7 without the minus sign means exactly 7 days ago):
    find . -mtime -7 -type f -print
Search by size (greater than 2 KB):
    find . -type f -size +2k
Search by permission (exactly 644):
    find . -type f -perm 644 -print
Search by owner:
    find . -type f -user weber -print
Delete all *.swp files under the current directory:
    find . -type f -name "*.swp" -delete
Execute a command on each matched file (change ownership to weber):
    find . -type f -user root -exec chown weber {} \;
Copy *.txt files last modified more than 10 days ago into the OLD directory:
    find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
Combine multiple commands by writing a script and invoking it with -exec:
    find . -type f -exec ./commands.sh {} \;
Output delimiter control: -print separates filenames with a newline by default; -print0 separates them with a null byte, which safely handles filenames containing spaces or newlines.
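The grouping and delimiter points above can be tried safely in a throwaway directory. The following is a minimal sketch, assuming a find and xargs that support -maxdepth, -print0, and -0 (GNU and BSD versions do); every file name in it is hypothetical.

```shell
#!/bin/sh
# Sketch: build a temporary directory tree, then search it with find.
# All file names below are hypothetical.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/notes.txt" "$demo_dir/paper.pdf" "$demo_dir/image.png" \
      "$demo_dir/sub/my report.txt"

# Grouped -o: the implicit -print applies to both name tests.
doc_count=$(find "$demo_dir" -type f \( -name "*.txt" -o -name "*.pdf" \) | wc -l)

# -print0 with xargs -0 passes "my report.txt" as one argument despite the space.
txt_count=$(find "$demo_dir" -type f -name "*.txt" -print0 | xargs -0 ls -1 | wc -l)

# -maxdepth 1 skips the file inside sub/.
top_count=$(find "$demo_dir" -maxdepth 1 -type f | wc -l)

echo "docs=$doc_count txt=$txt_count top-level=$top_count"
rm -r "$demo_dir"
```

Working in a directory created by mktemp -d keeps destructive variants such as -delete away from real data while experimenting.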
02 grep – Text Search
Basic pattern matching:
    grep "pattern" file
Common options:
    -o  output only the matching part of each line
    -v  output the lines that do not match
    -c  count matching lines per file
    -n  show line numbers
    -i  ignore case
    -l  list only the names of files that contain a match
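To see how these options differ, here is a small sketch run against a temporary file. The file contents are hypothetical and chosen only so that each option gives a distinct result.

```shell
#!/bin/sh
# Sketch: exercise the common grep options on a small hypothetical file.
sample=$(mktemp)
printf '%s\n' 'class Foo {' 'virtual void run();' 'Class Bar {' '};' > "$sample"

match_lines=$(grep -c "class" "$sample")     # lines containing "class"
nocase_lines=$(grep -ci "class" "$sample")   # -i also matches "Class"
numbered=$(grep -n "virtual" "$sample")      # prefix the line number
only_match=$(grep -o "virtual" "$sample")    # just the matched text
non_matching=$(grep -vc "class" "$sample")   # lines without "class"

echo "$match_lines $nocase_lines $numbered $only_match $non_matching"
rm "$sample"
```

Note that -c counts matching lines, not individual matches, so a line with two hits still counts once.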
Recursive search in a directory tree (the programmer’s favorite):
    grep -R -n "class" .
Match multiple patterns:
    grep -e "class" -e "virtual" file
Use -Z to output null‑terminated filenames and pipe to xargs -0 for safe bulk operations (here, deleting every file in the current directory that contains "test"):
    grep -lZ "test" * | xargs -0 rm
03 sort – Sorting
Key options:
    -n    numeric sort
    -d    dictionary order (only blanks and alphanumeric characters are compared)
    -r    reverse order
    -k N  sort by a key that starts at the N‑th field
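The difference between text and numeric ordering is easiest to see on a small sample. This is a sketch with hypothetical data; LC_ALL=C is set on the plain sort only to make the byte-wise text comparison predictable.

```shell
#!/bin/sh
# Sketch: numeric, reverse, and key-based sorting on hypothetical data.
data=$(mktemp)
printf '%s\n' '10 pear' '9 apple' '100 fig' > "$data"

# Without -n the lines are compared as text, so "10" sorts before "9".
text_first=$(LC_ALL=C sort "$data" | head -n 1)

# -n compares the leading numbers by value; -r puts the largest first.
numeric_first=$(sort -n "$data" | head -n 1)
largest=$(sort -nr "$data" | head -n 1)

# -k 2,2 restricts the key to the second field only.
by_name=$(sort -k 2,2 "$data" | head -n 1)

echo "$text_first | $numeric_first | $largest | $by_name"
rm "$data"
```

Writing the key as -k 2,2 rather than -k 2 stops the key at the end of field 2 instead of letting it run to the end of the line.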
Examples:
    sort -nrk 1 data.txt    # numeric sort on field 1, largest first
    sort -bd data.txt       # dictionary order, ignoring leading blanks
04 uniq – Remove Duplicate Lines
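uniq collapses only adjacent duplicate lines, so it is typically used after sort:
    sort unsort.txt | uniq
Count occurrences:
    sort unsort.txt | uniq -c
Show only duplicate lines:
    sort unsort.txt | uniq -d
Restrict the comparison with -s N (skip the first N characters) and -w N (compare at most N characters).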
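A common use of these options is ranking lines by how often they occur. The sketch below assumes a hypothetical six-line input and pipes the counts from uniq -c back into sort.

```shell
#!/bin/sh
# Sketch: count duplicate lines and rank them; the input is hypothetical.
unsorted=$(mktemp)
printf '%s\n' 'b' 'a' 'b' 'c' 'b' 'a' > "$unsorted"

# uniq only merges adjacent lines, so sort first.
distinct=$(sort "$unsorted" | uniq | wc -l)

# -d prints one copy of each line that occurs more than once.
dup_count=$(sort "$unsorted" | uniq -d | wc -l)

# uniq -c prefixes each line with its count; sort -nr ranks by that count.
most_common=$(sort "$unsorted" | uniq -c | sort -nr | head -n 1 | awk '{print $2}')

echo "distinct=$distinct duplicated=$dup_count most_common=$most_common"
rm "$unsorted"
```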
05 tr – Translate Characters
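General form: tr SET1 SET2 maps each character in SET1 to the character at the same position in SET2. For example, replace each digit d with 9-d:
    echo 12345 | tr '0-9' '9876543210'
Convert tabs to spaces:
    tr '\t' ' ' < text
Delete characters:
    tr -d '0-9' < file
Complement the set (combined with -d, this deletes everything except digits and newlines):
    tr -d -c '0-9\n' < file
Compress repeated characters (e.g., squeeze runs of spaces into one):
    tr -s ' ' < file
Character classes (e.g., [:lower:] to [:upper:] conversion):
    tr '[:lower:]' '[:upper:]'
06 cut – Column Extraction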
Extract fields 2 and 4 (default delimiter is TAB):
    cut -f2,4 filename
Print every column except column 3 (--complement is a GNU extension):
    cut -f3 --complement filename
Specify a custom delimiter:
    cut -f2 -d ";" filename
Field ranges:
    N-   from field N to the end
    -M   the first M fields
    N-M  fields N through M
Selection units:
    -b  bytes
    -c  characters
    -f  fields (separated by the delimiter, TAB by default)
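The range forms are easier to remember once seen on a concrete record. This is a brief sketch using a hypothetical semicolon-separated line.

```shell
#!/bin/sh
# Sketch: field and character selection with cut on a hypothetical record.
record='alice;30;london;engineer'

second=$(echo "$record" | cut -d ';' -f2)         # a single field
tail_fields=$(echo "$record" | cut -d ';' -f3-)   # field 3 to the end
head_fields=$(echo "$record" | cut -d ';' -f-2)   # the first 2 fields
first_chars=$(echo "$record" | cut -c1-5)         # characters 1 through 5

echo "$second | $tail_fields | $head_fields | $first_chars"
```

When several fields are selected, cut prints them joined by the same delimiter it used for splitting.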
07 paste – Merge Columns
Combine two files column‑wise (default delimiter is TAB):
    paste file1 file2
Use a different delimiter, e.g., a comma:
    paste -d "," file1 file2
08 wc – Count Lines, Words, Characters
Examples:
    wc -l file    # line count
    wc -w file    # word count
    wc -c file    # byte count (use -m to count characters)
09 sed – Stream Editor
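Replace the first occurrence on each line:
    sed 's/text/replace_text/' file
Global replacement:
    sed 's/text/replace_text/g' file
Edit the file in place (GNU sed syntax):
    sed -i 's/text/replace_text/g' file
Delete empty lines:
    sed '/^$/d' file
Use captured groups (\1 refers to the text matched by the first \( \) group, so "hello" followed by a digit is replaced by the digit alone):
    sed 's/hello\([0-9]\)/\1/' file
Variable substitution (use double quotes so that the shell expands the variables before sed sees the expression):
    p="pattern"; r="replace"; echo "line with pattern" | sed "s/$p/$r/g"
Insert characters (e.g., add a slash after the third character; in a basic regular expression the interval braces must be escaped as \{3\}, or use sed -E with {3}):
    sed 's/^.\{3\}/&\//' file
10 awk – Data‑Stream Processing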
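Typical script structure:
    awk 'BEGIN{...} { ... } END{...}' file
Common workflow:
Execute the BEGIN block once, before any input is read.
Read the input line by line and execute the main block for each line.
Execute the END block once, after the last line has been read.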
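The three phases can be observed in a single run. The following is a minimal sketch with hypothetical two-column input; the variable name total is illustrative.

```shell
#!/bin/sh
# Sketch: the three awk phases on hypothetical input of three lines.
report=$(printf '%s\n' 'alpha 3' 'beta 5' 'gamma 7' | awk '
  BEGIN { print "start"; total = 0 }          # once, before any input
  { total += $2; print NR ": " $1 }           # once per input line
  END { print "lines=" NR " total=" total }   # once, after the last line
')

echo "$report"
```

NR still holds the number of the last line read when the END block runs, which is why it can serve as a line count there.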
Printing examples:
    echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print} END{print "End"}'
Print specific fields:
    awk '{print $2, $3}' file
Count lines:
    awk 'END{print NR}' file
Sum the first field of every line:
    echo -e "1\n2\n3\n4" | awk 'BEGIN{sum=0} {sum+=$1} END{print sum}'
Pass external variables:
    var=1000; echo | awk -v vara="$var" '{print vara}'
Filter by line number or pattern:
    awk 'NR<5' file                   # lines 1 through 4
    awk 'NR==1,NR==4 {print}' file    # lines 1 through 4, written as a range
    awk '/linux/' file                # lines containing "linux"
    awk '!/linux/' file               # lines not containing "linux"
Set the field delimiter:
    awk -F ':' '{print $NF}' /etc/passwd
Read command output inside awk (each getline call reads one line of the command's output; a BEGIN block is used so that awk does not wait for input):
    awk 'BEGIN{"grep root /etc/passwd" | getline cmdout; print cmdout}'
Loop examples (used inside an action block):
    for(i=0;i<10;i++) print i
    for(i in array) print array[i]
Implement tac (reverse output) in awk:
    seq 9 | awk '{lifo[NR]=$0} END{for(lno=NR;lno>0;lno--) print lifo[lno]}'
Implement head and tail:
    awk 'NR<=10{print}' filename    # head (first 10 lines)
    awk '{buffer[NR%10]=$0} END{for(i=(NR<10?1:NR-9);i<=NR;i++) print buffer[i%10]}' filename    # tail (last 10 lines, in order)
Lines, words, and characters can be iterated over with shell loops or awk constructs.
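One way to iterate at each of the three levels is sketched below: a shell while-read loop for lines, an awk field loop for words, and awk's substr() for characters. The input text is hypothetical, and this is only one of several possible approaches.

```shell
#!/bin/sh
# Sketch: iterate over lines, words, and characters of hypothetical input.
input=$(mktemp)
printf '%s\n' 'one two' 'three' > "$input"

# Lines: a shell loop reads one line per iteration;
# IFS= and -r keep each line unmodified.
line_count=0
while IFS= read -r line; do
  line_count=$((line_count + 1))
done < "$input"

# Words: awk splits each line into fields $1..$NF.
word_count=$(awk '{ for (i = 1; i <= NF; i++) n++ } END { print n }' "$input")

# Characters: substr() walks a string one character at a time,
# here from the end to the start to reverse it.
reversed=$(echo 'three' | awk '{
  for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
  print out
}')

echo "lines=$line_count words=$word_count reversed=$reversed"
rm "$input"
```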
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
