Master Essential Linux Shell Tools for Text Processing
This guide introduces the most frequently used Linux shell utilities—find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—explaining their core options and providing practical command‑line examples to help readers efficiently manipulate and analyze text files.
The Linux shell is a fundamental skill. Its syntax is quirky and shell scripts can be hard to read, but it remains essential for understanding many aspects of a Linux system.
While advanced scripting may be done with Python, mastering simple shell commands for common tasks is still valuable.
Commonly used text‑processing tools in Linux: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk.
1. find – File Search
Search for txt and pdf files:
find . \( -name '*.txt' -o -name '*.pdf' \) -print
Regex search (-regex matches against the whole path; -iregex is the case-insensitive variant):
find . -regex '.*\(\.txt\|\.pdf\)$'
Find all non-txt files:
find . ! -name '*.txt' -print
Limit search depth (depth = 1):
find . -maxdepth 1 -type f
Search by type:
find . -type d -print # list directories only
Search by time:
-atime N : accessed N days ago
-mtime N : modified N days ago
-ctime N : metadata changed N days ago
Use -N for "less than N days ago" and +N for "more than N days ago". Files accessed within the last 7 days:
find . -atime -7 -type f -print
Search by size (e.g., larger than 2 KiB):
find . -type f -size +2k
Search by permission (e.g., exactly 644):
find . -type f -perm 644 -print
Search by owner:
find . -type f -user weber -print
Delete all *.swp files:
find . -type f -name '*.swp' -delete
Execute a command on each match (example: change ownership):
find . -type f -user root -exec chown weber {} \;
{} is replaced by the current file name for each match.
Copy matched files to another directory:
find . -type f -mtime +10 -name '*.txt' -exec cp {} OLD \;
Combine multiple commands by invoking a script:
-exec ./commands.sh {} \;
Output delimiters:
The default is \n (newline). Use -print0 to terminate each name with \0 so that file names containing spaces are handled safely.
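To make the -print0 point concrete, here is a minimal sketch that builds a throwaway directory and counts how many arguments xargs sees with and without NUL-terminated output. The file names are made up for illustration, and GNU find and xargs are assumed.

```shell
# Sketch: a scratch directory with hypothetical file names, one containing a space.
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/my report.txt" "$workdir/paper.pdf"

# Name-based match combined with -o (OR); the parentheses must be escaped.
match_count=$(find "$workdir" -type f \( -name '*.txt' -o -name '*.pdf' \) -print | wc -l | tr -d ' ')

# Newline-delimited output is split again on the space by xargs,
# so "my report.txt" arrives as two separate arguments.
unsafe_args=$(find "$workdir" -type f -name '*.txt' -print | xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-delimited output keeps each file name intact.
safe_args=$(find "$workdir" -type f -name '*.txt' -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "matches=$match_count unsafe=$unsafe_args safe=$safe_args"
rm -rf "$workdir"
```

Provided the temporary directory path itself contains no whitespace, the unsafe count comes out one higher than the safe count, which is exactly the kind of mismatch that makes a later rm or cp act on the wrong paths.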
2. grep – Text Search
Basic usage:
grep 'text' filename
Common options:
-o: output only the matching parts
-v: invert match (print non-matching lines)
-c: count matching lines, e.g. grep -c 'text' filename
-n: show line numbers
-i: case‑insensitive
-l: list matching file names
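The options above can be tried on a small sample. This is only a sketch: the file content is invented for illustration, and the variable names are not part of any tool.

```shell
# Sketch: a small sample file with made-up content.
sample=$(mktemp)
printf '%s\n' 'class Foo {' 'virtual void run();' 'Class Bar {' '};' > "$sample"

# -c counts matching lines (not individual matches).
count_sensitive=$(grep -c 'class' "$sample")
# -i makes the match case-insensitive, so "Class" is counted too.
count_insensitive=$(grep -ci 'class' "$sample")
# -n prefixes each matching line with its line number.
numbered=$(grep -n 'virtual' "$sample")
# -o prints only the matched text instead of the whole line.
only_match=$(grep -o 'F[a-z]*' "$sample")
# -v inverts the match; combined with -c it counts the non-matching lines.
non_class_lines=$(grep -vci 'class' "$sample")

printf '%s\n' "$count_sensitive" "$count_insensitive" "$numbered" "$only_match" "$non_class_lines"
rm -f "$sample"
```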
Recursive search in sub-directories:
grep -Rn 'class' .
Match multiple patterns:
grep -e 'class' -e 'virtual' file
Print the names of matching files terminated with \0 (-Z, combined with -l) and delete those files:
grep -lZ 'test' file* | xargs -0 rm
3. xargs – Build Command Lines from Input
Convert multiline output to a single line:
cat file.txt | xargs
Convert a single line to multiple lines (e.g., three arguments per line):
cat single.txt | xargs -n 3
Key options:
-d delimiter : define the input delimiter (by default, input is split on blanks and newlines)
-n max : number of arguments per command invocation
-I {}: replace {} in the command with each input item
-0: use \0 as delimiter
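A minimal sketch of how these options change the way input is grouped, using made-up input and echo as the command. It assumes GNU xargs, since -d is a GNU extension.

```shell
# -n 3: at most three arguments per echo invocation, so one output line per group.
grouped=$(printf '%s\n' a b c d e f g | xargs -n 3 echo)

# -I {}: run the command once per input line, substituting {} with that line.
labelled=$(printf '%s\n' alpha beta | xargs -I {} echo "item: {}")

# -d ',': split on commas instead of whitespace, so "y z" stays one argument.
split_on_comma=$(printf 'x,y z,w' | xargs -d ',' -n 1 echo)

printf '%s\n---\n%s\n---\n%s\n' "$grouped" "$labelled" "$split_on_comma"
```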
Example – count lines of C++ source files:
find source_dir/ -type f -name '*.cpp' -print0 | xargs -0 wc -l
4. sort – Sorting
-n: numeric sort
-d: dictionary order
-r: reverse
-k N: sort by column N
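The difference between numeric and textual comparison is easiest to see on a tiny data set. The sketch below uses hypothetical "name score" records; LC_ALL=C is set on the textual sort only to make its byte-wise ordering predictable.

```shell
# Sketch: hypothetical "name score" records, whitespace-separated.
scores=$(printf '%s\n' 'carol 9' 'alice 100' 'bob 25' 'dave 3')

# -n -k 2: sort by the second column as numbers (3 < 9 < 25 < 100).
numeric=$(printf '%s\n' "$scores" | sort -n -k 2)

# Without -n the column is compared as text, so "100" sorts before "25" and "3".
lexical=$(printf '%s\n' "$scores" | LC_ALL=C sort -k 2)

# -r reverses the numeric order: highest score first.
descending=$(printf '%s\n' "$scores" | sort -n -r -k 2)

printf '%s\n---\n%s\n---\n%s\n' "$numeric" "$lexical" "$descending"
```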
Examples:
sort -nrk 1 data.txt
sort -bd data # ignore leading blanks
5. uniq – Remove Duplicate Lines
Basic usage (uniq only collapses adjacent duplicates, so sort the input first):
sort unsort.txt | uniq
Count occurrences:
sort unsort.txt | uniq -c
Show only duplicated lines:
sort unsort.txt | uniq -d
Restrict the comparison with -s N (skip the first N characters) and -w N (compare at most N characters).
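A short sketch of why the sort step matters and how uniq -c feeds a frequency table. The word list is made up for illustration.

```shell
# Sketch: a made-up list of words with repeats, none of them adjacent.
words=$(printf '%s\n' pear apple pear fig apple pear)

# uniq alone only merges adjacent repeats, so this unsorted input keeps all its lines.
unsorted_unique=$(printf '%s\n' "$words" | uniq | wc -l | tr -d ' ')

# Sorting first brings equal lines together, so uniq can collapse them.
distinct=$(printf '%s\n' "$words" | sort | uniq)

# -d keeps only the lines that occur more than once.
repeated=$(printf '%s\n' "$words" | sort | uniq -d)

# Frequency table: count each word, sort by count (highest first), take the top entry.
top_word=$(printf '%s\n' "$words" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$unsorted_unique" "$distinct" "$repeated" "$top_word"
```

The sort | uniq -c | sort -nr chain is a common idiom for "most frequent values" in logs and similar line-oriented data.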
6. tr – Translate or Delete Characters
General conversion:
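echo 12345 | tr '0-9' '9876543210' # map each digit to its counterpart in the second set
cat text | tr '\t' ' ' # convert tabs to spaces
Delete characters:
cat file | tr -d '0-9'
Complement set (-c operates on every character not in the set):
cat file | tr -c '0-9\n' ' ' # replace everything except digits and newlines with a space
cat file | tr -d -c '0-9 \n' # delete everything except digits, spaces, and newlines
Compress repeated characters (commonly spaces):
cat file | tr -s ' '
Character classes (e.g., [:lower:] → [:upper:]):
tr '[:lower:]' '[:upper:]'
7. cut – Extract Columns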
Extract fields 2 and 4:
cut -f2,4 filename
Exclude field 3:
cut -f3 --complement filename
Specify delimiter:
cut -d ';' -f2 filename
Range specifications:
N‑ : from field N to end
M‑N : fields M through N
Units:
-b: bytes
-c: characters
-f: fields (delimiter‑based)
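The field and character units can be compared side by side in a small sketch. The colon-separated records below are invented for illustration, and the last command pairs cut with tr -s because cut treats every single delimiter as significant.

```shell
# Sketch: made-up colon-separated records (name:uid:shell).
records=$(printf '%s\n' 'root:0:/bin/bash' 'weber:1000:/bin/zsh')

# -d sets the delimiter, -f picks fields; 1,3 selects the first and third.
names_and_shells=$(printf '%s\n' "$records" | cut -d ':' -f1,3)

# An open-ended range: field 2 through the last field.
from_second=$(printf '%s\n' "$records" | cut -d ':' -f2-)

# -c works on character positions instead of delimiter-based fields.
first_four=$(printf '%s\n' "$records" | cut -c1-4)

# Runs of spaces would yield empty fields, so squeeze them with tr -s first.
second_column=$(printf 'a   b    c\n' | tr -s ' ' | cut -d ' ' -f2)

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$names_and_shells" "$from_second" "$first_four" "$second_column"
```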
cut -c1-5 file # first five characters
cut -c-2 file # first two characters
8. paste – Merge Files Line‑wise
Basic merge (default delimiter is a tab):
paste file1 file2
Specify delimiter (e.g., comma):
paste -d ',' file1 file2
9. wc – Count Lines, Words, Bytes
wc -l file # lines
wc -w file # words
wc -c file # bytes
10. sed – Stream Editor for Text Substitution
Replace first occurrence on each line:
sed 's/text/replace_text/' file
Global replacement:
sed 's/text/replace_text/g' file
Edit file in place:
sed -i 's/text/replace_text/g' file
Delete empty lines:
sed '/^$/d' file
Use captured groups (\1 refers to the text matched by the first \( ... \) group):
sed 's/hello\([0-9]\)/\1/'
11. awk – Powerful Text‑Processing Language
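Basic script structure (BEGIN runs before any input is read, the middle block runs once per line, END runs after the last line):
awk 'BEGIN{...} { ... } END{...}' file
Common built‑in variables: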
NR – record (line) number
NF – number of fields
$0 – entire line
$1, $2 … – individual fields
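A minimal sketch tying the three-part structure to these variables, using hypothetical "item quantity price" records. The names orders, report, and total are chosen only for this example.

```shell
# Sketch: hypothetical "item quantity price" records.
orders=$(printf '%s\n' 'apple 3 0.50' 'pear 2 0.75' 'fig 10 0.20')

# BEGIN prints a header, the main block runs once per line, END prints a summary.
report=$(printf '%s\n' "$orders" | awk '
    BEGIN { print "line item fields" }
          { print NR, $1, NF; total += $2 }   # NR = line number, NF = field count
    END   { print "lines=" NR, "quantity=" total }
')

# $NF is the last field of each record; $0 would be the whole line.
last_fields=$(printf '%s\n' "$orders" | awk '{ print $NF }')

printf '%s\n---\n%s\n' "$report" "$last_fields"
```

In the END block NR still holds the number of the last record read, which is why it doubles as a line count.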
Print specific fields:
awk '{print $2, $3}' file
Count lines:
awk 'END{print NR}' file
Sum first column:
awk '{sum+=$1} END{print sum}' file
Filter by pattern:
awk '/linux/' file # lines containing "linux"
Set field delimiter:
awk -F ':' '{print $NF}' /etc/passwd
Read command output into a variable (placed in BEGIN so no input file is needed; getline reads the first line of the command's output):
awk 'BEGIN{"grep root /etc/passwd" | getline cmdout; print cmdout}'
Loop examples:
for(i=0;i<10;i++){print i}
for(i in array){print array[i]}
Implement head/tail with awk:
awk 'NR<=10{print}' filename # head
awk '{buf[NR%10]=$0} END{for(i=(NR>10?NR-9:1);i<=NR;i++) print buf[i%10]}' filename # tail, printed in original order
12. Iterating Over Files
Line‑by‑line iteration using while:
while IFS= read -r line; do echo "$line"; done < file.txt
Using awk for the same purpose:
cat file.txt | awk '{print}'
Iterate over words in a line (relies on the shell splitting the unquoted $line on whitespace):
for word in $line; do echo "$word"; done
Iterate over characters (bash substring syntax):
for ((i=0;i<${#word};i++)); do echo "${word:i:1}"; done
Source: 大CC – http://www.cnblogs.com/me115/p/3427319.html
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
