Master Essential Linux Shell Text Processing Tools: find, grep, awk, and More
This article provides a comprehensive guide to the most frequently used Linux shell text‑processing utilities—find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—offering practical examples, command‑line options, and tips for efficient one‑ or two‑line scripts.
Shell scripting is a fundamental Linux skill. Its syntax is quirky and its readability is poor, so scripting languages such as Python often replace it for larger jobs. Even so, it remains a basic competency worth mastering, because learning Shell scripting also teaches a great deal about the Linux system itself.
Not everyone needs to become a Shell scripting expert, but everyone should be able to handle common tasks with a few simple Shell commands.
Below is an introduction to the most commonly used text-processing tools on Linux: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples and options shown are the most practical ones. The guiding principle is to write each command on a single line, two at most; for anything more complex, consider Python.
1. find – File Search
Search for txt and pdf files (the parentheses must be escaped so that the shell passes them through to find):
    find . \( -name "*.txt" -o -name "*.pdf" \) -print
Search for .txt and .pdf files using a regular expression (GNU find's default regex syntax requires the backslashes):
    find . -regex ".*\(\.txt\|\.pdf\)$"
Use -iregex to ignore case.
Negate a pattern – find all non-txt files:
    find . ! -name "*.txt" -print
Specify search depth – list files in the current directory only (depth 1):
    find . -maxdepth 1 -type f
Custom Search
Search by type (list only directories):
    find . -type d -print
Search by time:
    -atime – access time (in days; use -amin for minutes)
    -mtime – modification time
    -ctime – change time (metadata or permission changes)
Files accessed within the last 7 days (-7 means less than 7 days ago, +7 means more than 7 days ago, and a bare 7 means exactly 7 days ago):
    find . -atime -7 -type f -print
Search by size (units k, M, G). Find files larger than 2 kB:
    find . -type f -size +2k
Search by permission (for example, files with mode 644):
    find . -type f -perm 644 -print
Search by owner:
    find . -type f -user weber -print
Post‑Search Actions
Delete all *.swp files under the current directory:
    find . -type f -name "*.swp" -delete
Execute a command on each match (the powerful -exec):
    find . -type f -user root -exec chown weber {} \;
Note: {} is replaced by the name of each matched file.
Example – copy the matched files to another directory:
    find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
To run several commands per file, put them in a script and invoke it with -exec:
    -exec ./commands.sh {} \;
Print Delimiters
By default, -print separates file names with a newline. -print0 separates them with a null character instead, which makes it safe to handle file names that contain spaces.
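A minimal sketch of several of the tests and actions above, run against a throwaway directory. The file names (notes.txt, paper.pdf, draft.swp, sub/deep.txt) are invented for illustration, and the sketch assumes a find that supports -maxdepth, as GNU and BSD find do.

```shell
# Build a scratch tree; every name in it is hypothetical.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/notes.txt" "$demo/paper.pdf" "$demo/draft.swp" "$demo/sub/deep.txt"

# Group the alternatives with escaped parentheses so find sees them.
docs=$(find "$demo" \( -name "*.txt" -o -name "*.pdf" \) -type f | sort)
echo "$docs"

# Restrict the search to the top level only.
top=$(find "$demo" -maxdepth 1 -type f | wc -l)
echo "top-level files: $top"

# Negate a test to count everything that is not a .txt file.
not_txt=$(find "$demo" -type f ! -name "*.txt" | wc -l)
echo "non-txt files: $not_txt"

# Hand the matches to another command; "+" batches them into one rm call.
find "$demo" -type f -name "*.swp" -exec rm -- {} +
left=$(find "$demo" -type f | wc -l)
echo "files left after removing *.swp: $left"
```

Ending -exec with + instead of \; passes many file names to a single invocation of the command, which is usually faster when the command accepts multiple arguments.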
2. grep – Text Search
Basic usage:
    grep "text" filename
Common options:
    -o – output only the matching part of each line
    -v – output lines that do NOT match
    -c – count matching lines
    -n – print line numbers
    -i – ignore case
    -l – print only the names of files that contain a match
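To make these options concrete, here is a small sketch on a hypothetical file, sample.txt, written to a temporary directory. The file contents are invented for the example; -o is assumed to be available, as it is in GNU and BSD grep.

```shell
# Hypothetical sample file in a scratch directory.
work=$(mktemp -d)
printf 'class Foo\nvirtual void run()\nCLASS Bar\nint main()\n' > "$work/sample.txt"

# -c counts matching lines; adding -i makes the match case-insensitive.
exact=$(grep -c "class" "$work/sample.txt")
nocase=$(grep -ci "class" "$work/sample.txt")
echo "exact=$exact nocase=$nocase"

# -n prefixes each matching line with its line number.
numbered=$(grep -n "virtual" "$work/sample.txt")
echo "$numbered"

# -v inverts the match: keep the lines that do NOT mention "class".
others=$(grep -vi "class" "$work/sample.txt" | wc -l)
echo "lines without class: $others"

# -o prints only the matched text, one match per line.
calls=$(grep -oE '[a-z]+\(\)' "$work/sample.txt")
echo "$calls"

# -l prints just the names of the files that contain a match.
hits=$(grep -l "main" "$work"/*.txt)
echo "$hits"
```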
Recursive search through nested directories (a programmer’s favorite):
    grep "class" . -R -n
Match multiple patterns:
    grep -e "class" -e "virtual" file
Use -Z to terminate each output file name with a null character, then delete those files with xargs -0:
    grep "test" file* -lZ | xargs -0 rm
3. xargs – Convert Input to Command‑Line Arguments
xargs transforms input data into command-line arguments for other commands, which makes it a natural companion to grep, find, and similar tools.
Convert multi-line output to a single line:
    cat file.txt | xargs
Convert a single line to multiple lines (for example, three arguments per line):
    cat single.txt | xargs -n 3
xargs Options
    -d – define the input delimiter (the default is whitespace; a newline is written as \n)
    -n – specify the number of arguments per command line
    -I {} – replace the placeholder {} with each input item
    -0 – use the null character as the input delimiter
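The options above can be sketched with input generated by printf, so nothing on disk is touched. The words and names fed in are placeholders; -0 is assumed to be available, as it is in GNU and BSD xargs.

```shell
# -n 2 groups the incoming words two per command line.
pairs=$(printf 'a b c d e\n' | xargs -n 2 echo)
echo "$pairs"

# -I {} substitutes each input line into the command template.
greet=$(printf 'alice\nbob\n' | xargs -I {} echo "hello, {}")
echo "$greet"

# -0 reads NUL-separated items, so a name containing a space stays one argument.
nul=$(printf 'one file.txt\0two.txt\0' | xargs -0 -n 1 echo | wc -l)
echo "arguments seen: $nul"
```

The last command reports two arguments even though the input contains three space-separated words, which is why pairing find -print0 with xargs -0 is the safer habit.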
Example – count lines in C++ source files:
    find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
4. sort – Sorting
Key options:
    -n – numeric sort (as opposed to -d, dictionary order)
    -r – reverse order
    -k N – sort by the N‑th column
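A short sketch of how these options change the result, using three made-up "quantity name" lines. LC_ALL=C is set on the plain sort only to make the character-by-character ordering predictable across systems; that setting is an assumption of this example, not something the article requires.

```shell
# Hypothetical data: a number followed by a name.
data=$(printf '10 pear\n9 apple\n100 fig\n')

# Default comparison is character by character, so "10" sorts before "9".
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort | head -n 1)

# -n compares by numeric value, -r reverses, -k 1 selects the first column.
numeric=$(printf '%s\n' "$data" | sort -nrk 1 | head -n 1)

# -k 2 sorts on the second column instead.
by_name=$(printf '%s\n' "$data" | sort -k 2 | head -n 1)

echo "lexical first:   $lexical"
echo "numeric largest: $numeric"
echo "by name first:   $by_name"
```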
Example – sort numerically on the first column, largest first:
    sort -nrk 1 data.txt
Ignore leading blanks (-b) while sorting in dictionary order (-d):
    sort -bd data
5. uniq – Remove Duplicate Lines
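uniq only compares adjacent lines, so sort the input first.
Remove duplicate lines:
    sort unsort.txt | uniq
Count occurrences of each line:
    sort unsort.txt | uniq -c
Show only duplicated lines:
    sort unsort.txt | uniq -d
Use -s to skip a number of leading characters and -w to limit how many characters are compared.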
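As a sketch of why the sort comes first, the example below feeds uniq a made-up word list whose repeats are not adjacent, then builds a small frequency table from it. The word list is purely illustrative.

```shell
# Hypothetical input: one word per line, repeats are never adjacent.
words=$(printf 'pear\napple\npear\nfig\napple\npear\n')

# uniq only collapses adjacent duplicates, so nothing is removed here.
unsorted=$(printf '%s\n' "$words" | uniq | wc -l)

# Sorting first brings identical lines together.
distinct=$(printf '%s\n' "$words" | sort | uniq | wc -l)

# Frequency table: count each line, then order by count, highest first.
top=$(printf '%s\n' "$words" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2 ": " $1}')

# -d keeps only the lines that occur more than once.
dups=$(printf '%s\n' "$words" | sort | uniq -d | paste -s -d ' ' -)

echo "lines after plain uniq: $unsorted"
echo "distinct lines:         $distinct"
echo "most frequent:          $top"
echo "duplicated:             $dups"
```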
6. tr – Translate or Delete Characters
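As a quick orientation, this sketch runs tr in each of its main modes (translate, delete, complement plus delete, and squeeze) on short literal strings chosen only for illustration.

```shell
# Translate: map each lowercase letter to its uppercase counterpart.
upper=$(echo "hello world" | tr '[:lower:]' '[:upper:]')

# Delete: -d removes every character in the set.
letters=$(echo "a1b2c3" | tr -d '0-9')

# Complement + delete: -c inverts the set, so everything except digits is
# removed, including the trailing newline.
digits=$(echo "a1b2c3" | tr -d -c '0-9')

# Squeeze: -s collapses runs of the same character into one.
squeezed=$(echo "too    many   spaces" | tr -s ' ')

echo "$upper"
echo "$letters"
echo "$digits"
echo "$squeezed"
```

Note that -c on its own needs a second set to translate into; it is the combination with -d that keeps only the listed characters.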
General usage:
    echo 12345 | tr '0-9' '9876543210'    # simple substitution
    cat text | tr '\t' ' '                # convert tabs to spaces
Delete characters:
    cat file | tr -d '0-9'                # delete all digits
Complement the set with -c; combined with -d, this keeps only the characters in the set (-c alone, with a single set, is an error):
    cat file | tr -d -c '0-9\n'           # delete everything except digits and newlines
Compress repeated characters (useful for collapsing runs of spaces):
    cat file | tr -s ' '
Character classes such as [:lower:], [:upper:], and [:digit:] can be used as sets:
    tr '[:lower:]' '[:upper:]'
7. cut – Extract Columns
Extract the 2nd and 4th columns:
    cut -f2,4 filename
Exclude the 3rd column:
    cut -f3 --complement filename
Specify the delimiter (for example, a semicolon):
    cut -f2 -d ";" filename
Range specifications:
    N- – from field N to the end
    -M – from the first field to field M
    N-M – fields N through M
Units:
    -b – bytes
    -c – characters
    -f – fields (split on the delimiter)
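The range forms and units above can be sketched on a single semicolon-separated record. The record itself (name;role;city;year) is a made-up example.

```shell
# Hypothetical semicolon-separated record: name;role;city;year
row='weber;admin;berlin;2013'

# A list of fields: the 2nd and the 4th.
second_fourth=$(echo "$row" | cut -d ';' -f 2,4)

# N- : from field 3 to the end.
from_third=$(echo "$row" | cut -d ';' -f 3-)

# -M : from the first field through field 2.
up_to_two=$(echo "$row" | cut -d ';' -f -2)

# -c counts characters instead of delimited fields.
first_five=$(echo "$row" | cut -c 1-5)

echo "$second_fourth"
echo "$from_third"
echo "$up_to_two"
echo "$first_five"
```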
Example – print the first five characters of each line:
    cut -c1-5 file
Example – print the first two characters of each line:
    cut -c-2 file
8. paste – Merge Files Column‑wise
Combine two files column-wise:
    paste file1 file2
The default delimiter is a tab; set a custom delimiter with -d:
    paste file1 file2 -d ","
9. wc – Word, Line, and Byte Count
Count lines:
    wc -l file
Count words:
    wc -w file
Count bytes:
    wc -c file
10. sed – Stream Editor for Text Substitution
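As a short orientation, this sketch exercises the main substitution forms on literal strings. The sample text and the variables p and r are illustrative, and only portable basic-regex syntax is used, so it does not depend on GNU extensions.

```shell
line="this is an example"

# Without the g flag only the first match on each line is replaced.
first=$(echo "$line" | sed 's/is/IS/')

# With g every match on the line is replaced.
all=$(echo "$line" | sed 's/is/IS/g')

# & stands for whatever text the pattern matched.
bracketed=$(echo "$line" | sed 's/[a-z][a-z]*/[&]/g')

# \( \) captures a group and \1 refers back to it.
digit=$(echo "hello7 world" | sed 's/hello\([0-9]\)/\1/')

# Double quotes let the shell expand variables before sed sees the script.
p=an; r=one
swapped=$(echo "$line" | sed "s/ $p / $r /g")

echo "$first"
echo "$all"
echo "$bracketed"
echo "$digit"
echo "$swapped"
```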
Replace the first occurrence on each line:
    sed 's/text/replace_text/' file
Global replacement:
    sed 's/text/replace_text/g' file
Edit the file in place:
    sed -i 's/text/replace_text/g' file
Delete empty lines:
    sed '/^$/d' file
Use & to reference the matched string (\w and \+ are GNU sed extensions):
    echo "this is an example" | sed 's/\w\+/[&]/g'
Capture groups with escaped parentheses and reference them as \1, \2, and so on:
    sed 's/hello\([0-9]\)/\1/'
Double‑quoted expressions allow shell variable expansion:
    p=pattern; r=replace; echo "line contains pattern" | sed "s/$p/$r/g"
11. awk – Data‑Stream Processing Tool
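As a short orientation, this sketch shows the BEGIN, per-line, and END blocks along with NR, NF, -F, and -v on a tiny made-up table of item names and quantities. The data and the threshold variable min are illustrative assumptions.

```shell
# Hypothetical data: item name and quantity.
data=$(printf 'apple 3\npear 5\nfig 2\n')

# BEGIN runs once before input, the middle block once per line, END once after.
total=$(printf '%s\n' "$data" | awk 'BEGIN { sum = 0 } { sum += $2 } END { print sum }')

# NR is the current line number; inside END it holds the total line count.
count=$(printf '%s\n' "$data" | awk 'END { print NR }')

# A pattern in front of a block filters lines; -v passes a shell value in.
min=4
big=$(printf '%s\n' "$data" | awk -v min="$min" '$2 >= min { print $1 }')

# -F sets the field separator; $NF is the last field on the line.
last=$(echo "root:x:0:0:root:/root:/bin/bash" | awk -F: '{ print $NF }')

echo "total=$total count=$count big=$big last=$last"
```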
Basic script structure:
    awk 'BEGIN{ statements } statements END{ statements }' file
The BEGIN block runs once before any input is read, the middle statements run once for each line, and the END block runs once after the last line.
Print the current line:
    awk '{print}' file
Print specific fields:
    awk '{print $2, $3}' file
Count lines:
    awk 'END{print NR}' file
Sum the first column:
    awk 'BEGIN{sum=0} {sum+=$1} END{print sum}' file
Pass external variables:
    var=1000; awk '{print v, $0}' v="$var" file
Filter by line number or pattern:
    awk 'NR<5' file
    awk '/linux/' file
Set the field delimiter with -F:
    awk -F: '{print $NF}' /etc/passwd
Read command output with getline:
    awk 'BEGIN{"grep root /etc/passwd" | getline cmdout; print cmdout}'
Loop constructs:
    for(i=0;i<10;i++){print i}
    for(i in array){print array[i]}
Implement head and tail:
    awk 'NR<=10{print}' filename    # head
    awk '{buffer[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buffer[i%10]}' filename    # tail
Print specific columns using awk or cut (ls pads its columns with runs of spaces, so squeeze them with tr before cutting on a space):
    ls -lrt | awk '{print $6}'
    ls -lrt | tr -s ' ' | cut -d ' ' -f6
12. Iterating Over Lines, Words, and Characters
Iterate Over Each Line
    while IFS= read -r line; do echo "$line"; done < file.txt
    cat file.txt | while IFS= read -r line; do echo "$line"; done
    cat file.txt | awk '{print}'
Iterate Over Each Word in a Line
    for word in $line; do echo "$word"; done    # $line is left unquoted so the shell splits it into words
Iterate Over Each Character
    for ((i=0;i<${#word};i++)); do echo "${word:i:1}"; done    # requires bash
Source: 大CC
Link: http://www.cnblogs.com/me115/p/3427319.html