Master Linux Text Processing: Essential Shell Tools and Practical Examples
This guide covers the most commonly used Linux shell utilities for text manipulation (find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk), with the most useful options and practical command examples for each.
find – File Search
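The find expressions in this section are easiest to explore on throwaway data. The following is a minimal sketch, assuming GNU find; the sandbox directory and the file names (a.txt, b.pdf, d.swp, sub/c.txt) are invented for illustration.

```shell
# Hypothetical sandbox; every file name below is an assumption for the demo.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a.txt" "$demo/b.pdf" "$demo/d.swp" "$demo/sub/c.txt"

# Match .txt or .pdf at any depth; sort only to make the order stable.
docs=$(cd "$demo" && find . \( -name "*.txt" -o -name "*.pdf" \) -print | sort)
printf '%s\n' "$docs"

# Restrict the search to the top level and count regular files.
top=$(find "$demo" -maxdepth 1 -type f | wc -l)
echo "top-level files: $top"

# Preview the matches with -print before swapping in -delete.
find "$demo" -type f -name "*.swp" -print
find "$demo" -type f -name "*.swp" -delete
left=$(find "$demo" -type f -name "*.swp" | wc -l)
echo "swap files left: $left"

rm -rf "$demo"
```

Running the -print form first is a cheap safeguard, since -delete acts on every match without confirmation.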
Search for txt and pdf files: find . \( -name "*.txt" -o -name "*.pdf" \) -print
Regex search for .txt or .pdf (GNU find's default regex syntax writes alternation as \|): find . -regex ".*\(\.txt\|\.pdf\)$"
Case-insensitive regex: find . -iregex ".*\.txt$"
Exclude txt files: find . ! -name "*.txt" -print
Limit depth to the current directory (depth 1): find . -maxdepth 1 -type f
Search by type (directories only): find . -type d -print
Search by modification time (within the last 7 days; -atime tests access time instead): find . -type f -mtime -7 -print
Search by size (larger than 2 KiB): find . -type f -size +2k
Search by permission (e.g., 644): find . -type f -perm 644 -print
Search by owner: find . -type f -user weber -print
Delete all .swp files in and below the current directory: find . -type f -name "*.swp" -delete
Execute a command on each match (change ownership): find . -type f -user root -exec chown weber {} \;
Copy .txt files modified more than 10 days ago to the directory OLD: find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
Run a custom script on each match: find . -type f -name "*.log" -exec ./process.sh {} \;
Use \0 as the delimiter for filenames containing spaces: find . -print0
grep – Text Search
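The grep options in this section can be checked against a couple of small files. This is a minimal sketch; the header files a.h and b.h and their contents are made up for the demonstration.

```shell
# Hypothetical input files; names and contents are assumptions.
demo=$(mktemp -d)
printf 'class Foo\nvirtual void run();\nint x;\n' > "$demo/a.h"
printf 'CLASS Bar\n' > "$demo/b.h"

# -c counts matching lines instead of printing them.
count=$(grep -c "class" "$demo/a.h")
echo "lines with 'class' in a.h: $count"

# -i ignores case, -l prints only the names of files that match.
names=$(cd "$demo" && grep -il "class" *.h)
printf '%s\n' "$names"

# -n prefixes each hit with its line number; each -e adds a pattern.
hits=$(grep -n -e "class" -e "virtual" "$demo/a.h")
printf '%s\n' "$hits"

rm -rf "$demo"
```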
Count matching lines: grep -c "text" filename
Show line numbers: grep -n "pattern" file
Case-insensitive search: grep -i "pattern" file
Print only the names of files with matches: grep -l "pattern" *
Recursive search in directories: grep -R -n "class" .
Match multiple patterns: grep -e "class" -e "virtual" file
Delete files whose contents match a pattern (using \0 as delimiter): grep -lZ "test" file* | xargs -0 rm
xargs – Argument Builder
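To see how xargs regroups its input, it helps to feed it literal text and a harmless command. The sketch below assumes GNU xargs and uses echo and basename as stand-ins for a real command; the file names are invented.

```shell
# Regroup six words into lines of three arguments each.
grouped=$(printf 'a b c d e f\n' | xargs -n 3)
printf '%s\n' "$grouped"

# Join several input lines into one line (xargs runs echo by default).
joined=$(printf '1\n2\n3\n' | xargs)
echo "$joined"

# -I {} runs the command once per input line, substituting the line for {}.
labeled=$(printf 'x\ny\n' | xargs -I {} echo "item={}")
printf '%s\n' "$labeled"

# NUL-delimited hand-off keeps a filename containing a space in one piece.
demo=$(mktemp -d)
touch "$demo/a b.txt" "$demo/c.txt"
names=$(find "$demo" -type f -print0 | xargs -0 -n 1 basename | sort)
printf '%s\n' "$names"
rm -rf "$demo"
```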
Convert multiline output to a single line: cat file.txt | xargs
Split a single line into multiple lines (3 arguments per line): cat single.txt | xargs -n 3
Specify a custom delimiter (by default, input is split on blanks and newlines): xargs -d "," command
Replace the placeholder {} in a command: cat file.txt | xargs -I {} ./command.sh -p {} -1
Use \0 as the input delimiter (useful with find -print0): find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
sort – Sorting
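The difference between text ordering and numeric ordering is the most common surprise with sort. A minimal sketch with made-up data follows; LC_ALL=C is set on the plain sort only so that the byte-wise ordering is the same on every system.

```shell
# Hypothetical data: a count followed by a name.
data=$(printf '10 apple\n9 pear\n100 fig\n')

# Plain sort compares text, so "10" and "100" come before "9".
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort)
printf '%s\n' "$lexical"

# -n compares numerically, -r reverses, -k 1 starts the key at field 1.
numeric=$(printf '%s\n' "$data" | sort -nrk 1)
printf '%s\n' "$numeric"
```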
Numeric reverse sort on column 1: sort -nrk 1 data.txt
Ignore leading blanks and sort in dictionary order: sort -bd data
uniq – Remove Duplicate Lines
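uniq only collapses adjacent duplicates, which is why the commands in this section sort first. The sketch below uses invented input and assumes GNU uniq for the -s and -w options.

```shell
# Hypothetical unsorted input with repeats.
words=$(printf 'b\na\nb\nc\na\nb\n')

# Sort first so equal lines become adjacent, then collapse them.
unique=$(printf '%s\n' "$words" | sort | uniq)
printf '%s\n' "$unique"

# -d keeps only the lines that occurred more than once.
dups=$(printf '%s\n' "$words" | sort | uniq -d)
printf '%s\n' "$dups"

# Frequency table, most frequent line first.
top=$(printf '%s\n' "$words" | sort | uniq -c | sort -nr | head -n 1)
echo "most frequent:" $top

# -s 2 skips two characters, -w 5 compares the next five ("hello").
merged=$(printf 'x1hello\ny2hello\n' | uniq -s 2 -w 5)
echo "$merged"
```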
Delete duplicate lines: sort unsort.txt | uniq
Count occurrences of each line: sort unsort.txt | uniq -c
Show only duplicated lines: sort unsort.txt | uniq -d
Compare only part of each line (skip the first 2 characters, then compare at most 5): uniq -s 2 -w 5 file
tr – Translate / Delete Characters
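Because tr works character by character on standard input, each option can be checked with a single echo. A minimal sketch with invented sample strings:

```shell
# Positional mapping: each digit d is replaced by 9-d.
flipped=$(echo 12345 | tr '0-9' '9876543210')
echo "$flipped"

# Character classes translate a whole category at once.
upper=$(echo "hello world" | tr '[:lower:]' '[:upper:]')
echo "$upper"

# -c complements the set and -d deletes it: keep digits and the newline.
digits=$(echo "tel: 555-0199" | tr -cd '0-9\n')
echo "$digits"

# -s squeezes each run of the listed character down to one.
squeezed=$(echo "a    b   c" | tr -s ' ')
echo "$squeezed"
```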
Simple character substitution (maps each digit d to 9-d, so 12345 becomes 87654): echo 12345 | tr '0-9' '9876543210'
Convert tabs to spaces: cat text | tr '\t' ' '
Delete all digits: cat file | tr -d '0-9'
Keep only digits and newlines (-c complements the set, -d deletes it): cat file | tr -d -c '0-9\n'
Compress repeated spaces: cat file | tr -s ' '
Character classes (e.g., lower to upper): tr '[:lower:]' '[:upper:]'
cut – Column Extraction
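The cut options in this section can be tried on a single delimited record. This sketch uses a made-up semicolon-separated row and assumes GNU cut for --complement.

```shell
# Hypothetical record: name;age;city;job
row='alice;30;paris;engineer'

# -d sets the delimiter, -f selects fields by number.
picked=$(echo "$row" | cut -d ';' -f 2,4)
echo "$picked"

# --complement (a GNU extension) keeps every field except the listed ones.
rest=$(echo "$row" | cut -d ';' -f 3 --complement)
echo "$rest"

# -c selects by character position instead of by field.
first5=$(echo "$row" | cut -c 1-5)
echo "$first5"
```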
Extract fields 2 and 4: cut -f2,4 filename
Exclude field 3: cut -f3 --complement filename
Specify delimiter (semicolon): cut -f2 -d ";" filename
Character-wise extraction (first 5 characters; use -b to count bytes): cut -c1-5 file
First two characters: cut -c-2 file
paste – Merge Columns
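paste joins files line by line, which is easiest to see with two tiny inputs. A minimal sketch; the files left and right are hypothetical.

```shell
# Hypothetical two-line input files.
demo=$(mktemp -d)
printf 'a\nb\n' > "$demo/left"
printf '1\n2\n' > "$demo/right"

# Default: corresponding lines joined with a tab.
tabbed=$(paste "$demo/left" "$demo/right")
printf '%s\n' "$tabbed"

# -d replaces the tab with another delimiter.
csv=$(paste -d ',' "$demo/left" "$demo/right")
printf '%s\n' "$csv"

rm -rf "$demo"
```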
Combine two files side by side (default tab delimiter): paste file1 file2
Use a custom delimiter (comma): paste -d "," file1 file2
wc – Count Lines, Words, Bytes
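A quick way to confirm what each wc option counts is to pipe in text whose size is known. A minimal sketch with an invented two-line sample:

```shell
# Hypothetical sample: 2 lines, 5 words, 24 bytes including both newlines.
text='one two three
four five'

lines=$(printf '%s\n' "$text" | wc -l)
words=$(printf '%s\n' "$text" | wc -w)
bytes=$(printf '%s\n' "$text" | wc -c)
echo "lines=$lines words=$words bytes=$bytes"

# For multibyte text in a UTF-8 locale, wc -m (characters) can report
# fewer units than wc -c (bytes); for this ASCII sample they are equal.
```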
Line count: wc -l file
Word count: wc -w file
Byte count (use -m to count characters): wc -c file
sed – Stream Editor
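The sed substitutions in this section can be tested on literal strings before they are pointed at a real file, which is worth doing before using -i. A minimal sketch with invented input:

```shell
line='text and more text'

# Without g, only the first match on each line is replaced.
first=$(echo "$line" | sed 's/text/TEXT/')
echo "$first"

# With g, every match on the line is replaced.
all=$(echo "$line" | sed 's/text/TEXT/g')
echo "$all"

# \( \) captures part of the match; \1 reinserts the captured part.
digit=$(echo 'hello7 world' | sed 's/hello\([0-9]\)/\1/')
echo "$digit"

# In basic regex syntax the interval is written \{3\}; & is the whole match.
slashed=$(echo 'abcdef' | sed 's/^.\{3\}/&\//')
echo "$slashed"

# /^$/d deletes empty lines.
noblank=$(printf 'a\n\nb\n' | sed '/^$/d')
printf '%s\n' "$noblank"
```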
Replace first occurrence on each line: sed 's/text/replace_text/' file
Global replacement: sed 's/text/replace_text/g' file
Edit file in place: sed -i 's/text/replace_text/g' file
Delete empty lines: sed '/^$/d' file
Use captured groups: sed 's/hello\([0-9]\)/\1/' file
Insert a slash after the first three characters: sed 's/^.\{3\}/&\//' file
awk – Data-Stream Processing
Basic script structure: awk 'BEGIN{...} { ... } END{...}' file
Print current line: awk '{print}' file
Print specific fields: awk '{print $2, $3}' file
Count lines: awk 'END{print NR}' file
Sum values in the first column: awk '{sum+=$1} END{print sum}' file
Pass external variables: var=1000; awk '{print v, $1}' v="$var" file
Filter by line number: awk 'NR<5' file
Filter by pattern: awk '/linux/' file
Set field delimiter (e.g., colon): awk -F: '{print $NF}' /etc/passwd
Read command output inside awk: awk 'BEGIN { "grep root /etc/passwd" | getline cmd; print cmd }'
Implement head: awk 'NR<=10{print}' file
Implement tail: awk '{buf[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buf[i%10]}' file
Print a range of lines (4-6): awk 'NR==4,NR==6' file
Print between two patterns: awk '/start_pattern/,/end_pattern/' file
Common built-in functions: index, sub, match, length.
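The awk patterns above combine naturally on a small table. The following is a minimal sketch; the name and score columns and the threshold variable min are invented for illustration.

```shell
# Hypothetical data: a name and a score per line.
data=$(printf 'alice 10\nbob 20\ncarol 30\n')

# Accumulate in the main block, report in END.
total=$(printf '%s\n' "$data" | awk '{ sum += $2 } END { print sum }')
echo "total=$total"

# -v passes a shell value in as an awk variable before any input is read.
big=$(printf '%s\n' "$data" | awk -v min=15 '$2 > min { print $1 }')
printf '%s\n' "$big"

# A range pattern selects lines 2 through 3.
range=$(printf '%s\n' "$data" | awk 'NR==2,NR==3 { print $1 }')
printf '%s\n' "$range"

# -F sets the field separator; $NF is the last field.
last=$(printf 'a:b:c\n' | awk -F: '{ print $NF }')
echo "$last"
```

Using -v is an alternative to the trailing var=value form; the assignment is visible inside BEGIN as well, which the trailing form does not guarantee.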
The article is a concise reference derived from the book “Linux Shell Script Guide”, offering ready‑to‑use command snippets for everyday text‑processing tasks.
ITPUB
