Master Essential Linux Shell Text Tools: Find, Grep, Awk, Sed & More
This guide introduces the most frequently used Linux shell utilities for text processing—including find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—explaining their core options, practical examples, and how to combine them for powerful command‑line workflows.
Linux Shell is a fundamental skill; despite its quirky syntax and lower readability compared to languages like Python, mastering it reveals many aspects of the Linux system and remains essential for everyday scripting.
Key tools for text handling in Linux include find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples and parameters shown are the most common and practical. The author prefers one‑line commands and advises using Python for more complex tasks.
1. find – File Search
Search for *.txt and *.pdf files:
<code>find . \( -name "*.txt" -o -name "*.pdf" \) -print</code>
Regex search:
<code>find . -regex ".*\(\.txt\|\.pdf\)$"</code>
(find's default regex dialect requires \| for alternation.)
Case‑insensitive regex:
<code>find . -iregex ".*\.txt$"</code>
Negate a pattern (exclude txt files):
<code>find . ! -name "*.txt" -print</code>
Limit search depth (depth = 1):
<code>find . -maxdepth 1 -type f</code>
Search by type:
<code>find . -type d -print # directories only</code>
<code>find . -type f -print # regular files</code>
Search by time:
atime – access time (days)
mtime – modification time
ctime – metadata change time
<code>find . -atime -7 -type f -print # accessed within the last 7 days</code>
(Note: -atime 7 matches exactly 7 days ago; -atime -7 matches within the last 7 days.)
Search by size (e.g., larger than 2 KB):
<code>find . -type f -size +2k</code>
Search by permission:
<code>find . -type f -perm 644 -print</code>
Search by owner:
<code>find . -type f -user weber -print</code>
Actions after finding:
Delete:
<code>find . -type f -name "*.swp" -delete</code>
Execute a command on each result (the powerful -exec):
<code>find . -type f -user root -exec chown weber {} \;</code>
Note: {} is replaced by each matched file name.
<code>find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;</code>
Combine multiple commands via a script:
<code>-exec ./commands.sh {} \;</code>
Output delimiters: the default is '\n'; -print0 terminates each name with '\0' to handle spaces in file names.
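As a quick illustration (the temporary directory and file names below are throwaway examples), -print0 keeps a name containing a space intact, where newline-delimited output would be split apart by xargs:

```shell
# Throwaway demo: a file name with a space survives -print0 | xargs -0
dir=$(mktemp -d)
touch "$dir/my notes.txt" "$dir/plain.txt"

# One output line per file; without -print0/-0, "my notes.txt"
# would be split into two separate arguments.
count=$(find "$dir" -type f -name "*.txt" -print0 | xargs -0 -n1 echo | wc -l)
echo "$count"

rm -rf "$dir"
```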
2. grep – Text Search
Basic usage:
<code>grep match_pattern file</code>
Common options:
-o – output only the matching part
-v – invert the match
-c – count matching lines
-n – show line numbers
-i – ignore case
-l – list matching file names
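A short sketch of these options against a throwaway sample file:

```shell
# Throwaway sample file
printf 'Hello\nworld\nhello again\n' > /tmp/grep_demo.txt

grep -i -n "hello" /tmp/grep_demo.txt   # case-insensitive, with line numbers
grep -c "hello" /tmp/grep_demo.txt      # count of matching lines (case-sensitive)
grep -v "hello" /tmp/grep_demo.txt      # lines that do NOT contain "hello"
grep -o "hello" /tmp/grep_demo.txt      # print only the matched text
```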
Recursive search (favorite for code):
<code>grep "class" . -R -n</code>
Match multiple patterns:
<code>grep -e "class" -e "virtual" file</code>
Use -Z to output file names terminated by \0 (handy with xargs -0):
<code>grep "test" file* -lZ | xargs -0 rm</code>
3. xargs – Build Command Lines
xargs converts input lines into command arguments and pairs well with grep and find.
Convert multi‑line input to a single line:
<code>cat file.txt | xargs</code>
Convert a single line into multiple lines (e.g., three arguments per line):
<code>cat single.txt | xargs -n 3</code>
Key options:
-d – define the delimiter (default is whitespace; use '\n' for lines)
-n – number of arguments per invocation
-I {} – replace the {} placeholder with each argument
-0 – use \0 as the delimiter
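-I {} is worth a concrete sketch; the directory and file names below are made up for the demo:

```shell
# Throwaway demo: copy each .txt file to <name>.bak, one cp per argument
mkdir -p /tmp/xargs_demo
cd /tmp/xargs_demo || exit 1
touch a.txt b.txt

# Safe here because these names contain no whitespace
ls *.txt | xargs -I {} cp {} {}.bak
ls
```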
<code>find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l</code>
4. sort – Sorting
Options:
-n – numeric sort
-d – dictionary order
-r – reverse
-k N – sort by column N
<code>sort -nrk 1 data.txt</code>
<code>sort -bd data # ignore leading blanks, dictionary order</code>
5. uniq – Remove Duplicate Lines
<code>sort unsort.txt | uniq</code>
Count occurrences:
<code>sort unsort.txt | uniq -c</code>
Show only duplicated lines:
<code>sort unsort.txt | uniq -d</code>
Restrict the comparison with -s N (skip the first N characters) and -w N (compare at most N characters).
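A short sketch of -s and -w with made-up data:

```shell
# -w 5: compare only the first 5 characters, so the two "apple ..." lines collapse
printf 'apple pie\napple tart\nbanana\n' | sort | uniq -w 5

# -s 4: skip the first 4 characters ("001 " / "002 ") before comparing
printf '001 foo\n002 foo\n' | uniq -s 4
```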
6. tr – Translate/Replace Characters
<code>echo 12345 | tr '0-9' '9876543210' # simple cipher</code>
<code>cat text | tr '\t' ' ' # tabs to spaces</code>
Delete characters:
<code>cat file | tr -d '0-9'</code>
Complement a set with -c (when translating, a replacement set is required):
<code>cat file | tr -c '0-9' ' ' # replace every non‑digit with a space</code>
<code>cat file | tr -d -c '0-9 \n' # delete everything except digits, spaces, and newlines</code>
Compress repeated characters with -s, often used to squeeze spaces:
<code>cat file | tr -s ' '</code>
Character classes are also supported (e.g., [:lower:], [:digit:], [:space:]):
<code>tr '[:lower:]' '[:upper:]'</code>
7. cut – Column Extraction
<code>cut -f2,4 filename</code>
<code>cut -f3 --complement filename # all but column 3</code>
Specify the delimiter with -d:
<code>cut -f2 -d ";" filename</code>
Ranges:
N- – from field N to the end
-M – the first M fields
N-M – fields N through M
Units:
-b – bytes
-c – characters
-f – fields (delimiter‑based)
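The range forms above, sketched against a throwaway semicolon-separated line:

```shell
line="a;b;c;d;e"
echo "$line" | cut -d ";" -f2-4   # fields 2 through 4 -> b;c;d
echo "$line" | cut -d ";" -f-2    # first 2 fields     -> a;b
echo "$line" | cut -d ";" -f3-    # field 3 to the end -> c;d;e
```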
<code>cut -c1-5 file # first 5 characters</code>
<code>cut -c-2 file # first 2 characters</code>
8. paste – Merge Columns
<code>paste file1 file2</code>
Change the delimiter (default is tab) with -d:
<code>paste file1 file2 -d ","</code>
9. wc – Count Lines, Words, Bytes
<code>wc -l file # lines</code>
<code>wc -w file # words</code>
<code>wc -c file # bytes</code>
10. sed – Stream Editing
Replace first occurrence:
<code>sed 's/text/replace_text/' file</code>
Global replace:
<code>sed 's/text/replace_text/g' file</code>
Edit the file in place:
<code>sed -i 's/text/replace_text/g' file</code>
Delete empty lines:
<code>sed '/^$/d' file</code>
Use double quotes so the shell expands variables inside the expression:
<code>p=pattern; r=replace; echo "a line with $p" | sed "s/$p/$r/g"</code>
11. awk – Data‑Stream Processing
Structure:
<code>awk 'BEGIN{...} { ... } END{...}' file</code>
Built‑in variables:
NR – record (line) number
NF – number of fields in the current record
$0 – the whole line
$1, $2, … – individual fields
<code>awk '{print $2, $3}' file</code>
Count lines:
<code>awk 'END{print NR}' file</code>
Sum the first column:
<code>awk '{sum+=$1} END{print sum}' file</code>
Pass an external variable:
<code>var=1000; awk -v v=$var '{print v}' file</code>
Set the field separator:
<code>awk -F: '{print $NF}' /etc/passwd</code>
Read command output:
<code>awk 'BEGIN{"grep root /etc/passwd" | getline out; print out}'</code>
Implement head and tail:
<code>awk 'NR<=10{print}' file # head</code>
<code>awk '{buf[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buf[i%10]}' file # tail</code>
Common functions:
index(), sub(), match(), length(), printf().
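A brief sketch of a few of these functions on a made-up string:

```shell
echo "error: disk full" | awk '{
    print length($0)           # length of the line
    print index($0, "disk")    # 1-based position of the substring (0 if absent)
    sub(/error/, "WARN")       # replace the first match in $0
    printf "%s\n", $0
}'
```

This prints 16, then 8, then the rewritten line "WARN: disk full".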
12. Iterating Lines, Words, and Characters
While‑read loop:
<code>while read -r line; do echo "$line"; done < file.txt</code>
(-r prevents read from mangling backslashes.)
Awk alternative:
<code>cat file.txt | awk '{print}'</code>
Iterate over the words in a line:
<code>for word in $line; do echo $word; done</code>
Iterate over characters using Bash substring syntax:
<code>for ((i=0;i<${#word};i++)); do echo ${word:i:1}; done</code>
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and will accompany you throughout your operations career.