Master Essential Linux Shell Tools: find, grep, awk, and More
This guide presents a comprehensive overview of the most frequently used Linux shell utilities for text processing—such as find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—offering practical examples, key options, and best‑practice recommendations for efficient command‑line workflows.
The Linux shell is a fundamental skill. Despite its quirky syntax and low readability, and even though it is often replaced by scripting languages such as Python, mastering it is worthwhile: working with shell commands reveals a great deal about how a Linux system operates.
The most commonly used tools for text processing in Linux are: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk.
1. find – File Search
Search for txt and pdf files:
<code>find . \( -name "*.txt" -o -name "*.pdf" \) -print</code>Regex search for .txt and .pdf:
<code>find . -regex ".*\(\.txt\|\.pdf\)$"</code>Case‑insensitive regex:
<code>find . -iregex ".*\.txt$"</code>Find all non‑txt files:
<code>find . ! -name "*.txt" -print</code>Limit search depth (depth 1):
<code>find . -maxdepth 1 -type f</code>Search by type (directories only):
<code>find . -type d -print</code>Search by time:
‑atime: access time (days)
‑mtime: modification time
‑ctime: change time (metadata)
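The numeric argument follows find's N / +N / ‑N convention; a short sketch (shown with ‑mtime, but the same rules apply to ‑atime and ‑ctime):

```shell
# N is measured in 24-hour periods:
#   -mtime -7  matches files modified less than 7 days ago
#   -mtime 7   matches files modified exactly 7 (rounded) days ago
#   -mtime +7  matches files modified more than 7 days ago
find . -type f -mtime -7 -print
```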
Files accessed within the last 7 days (note the leading minus: ‑7 means “less than 7 days ago”):
<code>find . -atime -7 -type f -print</code>Search by size (greater than 2 kB):
<code>find . -type f -size +2k</code>Search by permission (e.g., 644):
<code>find . -type f -perm 644 -print</code>Search by user:
<code>find . -type f -user weber -print</code>Delete all *.swp files in the current directory:
<code>find . -type f -name "*.swp" -delete</code>Execute a command on each matched file (change ownership to user weber):
<code>find . -type f -user root -exec chown weber {} \;</code>Note: {} is a placeholder that is replaced by the current file name for each match.
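When the command accepts many file arguments, terminating -exec with + instead of \; batches the matches into as few invocations as possible. A sketch (the demo directory is hypothetical):

```shell
mkdir -p /tmp/find_demo && touch /tmp/find_demo/a.log /tmp/find_demo/b.log
# "+" passes all matched names to a single wc invocation,
# which also prints a combined "total" line:
find /tmp/find_demo -type f -name "*.log" -exec wc -l {} +
```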
Copy found files to another directory:
<code>find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;</code>Combine multiple commands by invoking a script with -exec:
<code>find . -type f -print -exec ./commands.sh {} \;</code>2. grep – Text Search
Basic usage prints matching lines:
<code>grep "pattern" file</code>Common options:
-o: output only the matching part
-v: invert match (output non‑matching lines)
-c: count matching lines
-n: show line numbers
-i: ignore case
-l: list only file names
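A quick sketch of a few of these options on sample data (the file path is just an example):

```shell
printf 'Alpha\nbeta\nALPHA\n' > /tmp/grep_demo.txt
grep -in 'alpha' /tmp/grep_demo.txt   # -i + -n: prints "1:Alpha" and "3:ALPHA"
grep -c  'beta'  /tmp/grep_demo.txt   # prints "1"
grep -o  'lph'   /tmp/grep_demo.txt   # prints only the matched text: "lph"
```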
Recursive search in multiple directories (favorite for code search):
<code>grep "class" . -R -n</code>Match multiple patterns:
<code>grep -e "class" -e "virtual" file</code>Use null‑terminated output (‑z) for safe piping:
<code>grep "test" file* -lZ | xargs -0 rm</code>3. xargs – Build Command Lines from Input
xargs converts input data into command‑line arguments, allowing combination with other commands such as grep or find.
Convert multiline output to a single line:
<code>cat file.txt | xargs</code>Convert a single line to multiple lines (‑n specifies the number of items per line):
<code>cat single.txt | xargs -n 3</code>Key options:
-d: set the input delimiter (by default, xargs splits input on blanks and newlines)
-n: specify number of arguments per command line
-I {}: replace {} with the input item
-0: use null character as delimiter
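For instance, ‑d lets GNU xargs split comma‑separated input (a sketch on sample data):

```shell
# Split on commas, passing one item at a time to the default command (echo):
printf 'a,b,c' | xargs -d ',' -n 1
```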
Example – run a script for each line:
<code>cat file.txt | xargs -I {} ./command.sh -p {} -1</code>Example – count lines of C++ source files:
<code>find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l</code>4. sort – Sorting
Options:
-n: numeric sort (vs. -d dictionary order)
-r: reverse order
-k N: sort by the N‑th column
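With delimiter‑separated data, ‑t names the field separator so ‑k can address columns; a sketch on sample input:

```shell
printf 'bob:3\nann:1\ncarl:2\n' > /tmp/sort_demo.txt
# Sort numerically on the second colon-separated column:
sort -t ':' -k 2 -n /tmp/sort_demo.txt
```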
Examples:
<code>sort -nrk 1 data.txt</code> <code>sort -bd data # -b ignores leading blanks, -d sorts in dictionary order</code>5. uniq – Remove Duplicate Lines
Basic usage:
<code>sort unsort.txt | uniq</code>Count occurrences:
<code>sort unsort.txt | uniq -c</code>Show only duplicate lines:
<code>sort unsort.txt | uniq -d</code>Limit the comparison (‑f N skips the first N fields, ‑s N skips the first N characters, ‑w N compares at most N characters):
<code>sort unsort.txt | uniq -f 2 -s 5 -w 10</code>6. tr – Translate or Delete Characters
General usage:
<code>echo 12345 | tr '0-9' '9876543210' # simple substitution</code> <code>cat text | tr '\t' ' ' # tab to space</code>Delete characters:
<code>cat file | tr -d '0-9' # remove all digits</code>Complement set (‑c):
<code>cat file | tr -c '0-9' '\n' # replace every non‑digit with a newline</code> <code>cat file | tr -d -c '0-9 \n' # keep only digits, spaces, and newlines</code>Compress repeated characters (‑s):
<code>cat file | tr -s ' '</code>Character classes (e.g., [:lower:], [:upper:]):
<code>tr '[:lower:]' '[:upper:]'</code>7. cut – Extract Columns
Extract columns 2 and 4:
<code>cut -f2,4 filename</code>Exclude column 3:
<code>cut -f3 --complement filename</code>Specify delimiter:
<code>cut -d ";" -f2 filename</code>Field ranges:
N‑: from field N to end
M‑N: fields M through N
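A sketch of the range syntax on tab‑separated sample data:

```shell
printf 'a\tb\tc\td\n' > /tmp/cut_demo.txt
cut -f2-  /tmp/cut_demo.txt   # from field 2 to the end: "b<TAB>c<TAB>d"
cut -f2-3 /tmp/cut_demo.txt   # fields 2 through 3: "b<TAB>c"
```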
Units:
-b: bytes
-c: characters
-f: fields (delimiter‑based)
Examples:
<code>cut -c1-5 file # first five characters</code> <code>cut -c-2 file # first two characters</code>8. paste – Merge Files Linewise
Combine two files column‑wise (default delimiter is tab):
<code>paste file1 file2</code>Specify a different delimiter (e.g., comma):
<code>paste file1 file2 -d ","</code>9. wc – Word, Line, and Byte Count
Count lines:
<code>wc -l file</code>Count words:
<code>wc -w file</code>Count bytes:
<code>wc -c file</code>10. sed – Stream Editor for Text Substitution
Replace first occurrence on each line:
<code>sed 's/text/replace_text/' file</code>Global replacement:
<code>sed 's/text/replace_text/g' file</code>Edit file in place (‑i):
<code>sed -i 's/text/replace_text/g' file</code>Delete empty lines:
<code>sed '/^$/d' file</code>Use captured groups:
<code>sed 's/hello\([0-9]\)/\1/'</code>Variable substitution with double quotes:
<code>p=pattern; r=replace; echo "a line with pattern" | sed "s/$p/$r/g"</code>11. awk – Powerful Text Processing Language
Basic script structure:
<code>awk 'BEGIN{print "start"} {print} END{print "end"}' file</code>Key built‑in variables:
NR – record number (line number)
NF – number of fields
$0 – entire line
$1, $2 … – individual fields
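A short sketch showing these built‑in variables together:

```shell
# For each input line, print its line number, its field count, and its last field:
printf 'a b c\nd e\n' | awk '{print NR, NF, $NF}'
# prints "1 3 c" and "2 2 e"
```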
Print specific fields:
<code>awk '{print $2, $3}' file</code>Count lines:
<code>awk 'END{print NR}' file</code>Sum first column:
<code>awk '{sum+=$1} END{print sum}' file</code>Filter by line number:
<code>awk 'NR<5' file</code>Filter by pattern:
<code>awk '/linux/' file</code>Set field delimiter (‑F):
<code>awk -F: '{print $NF}' /etc/passwd</code>Read command output with getline:
<code>echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout}'</code>Implement head (first 10 lines):
<code>awk 'NR<=10{print}' filename</code>Implement tail (last 10 lines):
<code>awk '{buf[NR%10]=$0} END{n=(NR<10?NR:10); for(i=NR-n+1;i<=NR;i++) print buf[i%10]}' filename</code>Note: buf is a circular buffer keyed by NR%10, so the END loop must start at the oldest stored line to print the last lines in order. Source: 大CC, http://www.cnblogs.com/me115/p/3427319.html (originally from the public account “民工哥技术之路”).