Master Linux Text Processing: Essential Shell Commands and Practical Examples
This guide introduces the most commonly used Linux shell tools for text processing—including find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—providing concise explanations, useful options, and real‑world command examples to help you handle files efficiently.
find – File Search
The find command locates files based on name patterns, depth, type, time, size, permissions, and ownership. Common examples:
Search for *.txt and *.pdf files:
<code>find . \( -name "*.txt" -o -name "*.pdf" \) -print</code>
Regex search for the same extensions (GNU find's default regex syntax writes alternation as \|): <code>find . -regex ".*\(\.txt\|\.pdf\)$"</code>
Find non-.txt files: <code>find . ! -name "*.txt" -print</code>
Limit the search to the current directory: <code>find . -maxdepth 1 -type f</code>
Search by access (-atime), modification (-mtime), or change (-ctime) time. For example, files accessed within the last 7 days (a bare 7 means exactly 7 days ago, +7 means more than 7): <code>find . -atime -7 -type f -print</code>
Find files larger than 2 KB: <code>find . -type f -size +2k</code>
Find files with specific permissions: <code>find . -type f -perm 644 -print</code>
Find files owned by a user: <code>find . -type f -user weber -print</code>
Delete all .swp files: <code>find . -type f -name "*.swp" -delete</code>
Execute a command on each match (e.g., change ownership): <code>find . -type f -user root -exec chown weber {} \;</code>
Copy .txt files last modified more than 10 days ago to the directory OLD: <code>find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;</code>
Run a custom script on each match: <code>find . -type f -exec ./commands.sh {} \;</code>
Use -print0 to terminate each file name with a null character; paired with xargs -0, this safely handles names that contain spaces or newlines.
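The following is a minimal sketch of several find patterns from this section, run in a throwaway directory. The file names are made up for illustration, and GNU or BSD find is assumed (-delete and -print0 are not part of POSIX).

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/notes.txt" "$demo_dir/my report.txt" \
      "$demo_dir/paper.pdf" "$demo_dir/draft.swp"

# Match either extension; the escaped parentheses group the -o (OR) test.
doc_count=$(find "$demo_dir" -type f \( -name "*.txt" -o -name "*.pdf" \) | wc -l)

# -print0 with xargs -0 keeps "my report.txt" as a single argument.
txt_count=$(find "$demo_dir" -type f -name "*.txt" -print0 | xargs -0 ls | wc -l)

# Remove the swap file, then count what is left.
find "$demo_dir" -type f -name "*.swp" -delete
remaining=$(find "$demo_dir" -type f | wc -l)

echo "docs=$doc_count txt=$txt_count remaining=$remaining"
rm -rf "$demo_dir"
```

Without -print0 and -0, xargs would split "my report.txt" into two arguments and ls would report two missing files.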
grep – Text Search
grep searches for patterns in files. Frequently used options:
-o – output only the matching part.
-v – invert the match (show non-matching lines).
-c – count matching lines.
-n – show line numbers.
-i – ignore case.
-l – list the names of files that contain a match.
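The options above can be tried on a small sample file. This is only a sketch: the file contents are invented for illustration.

```shell
# Build a four-line sample file in a temporary location.
sample=$(mktemp)
printf '%s\n' 'class Foo {' 'virtual void run();' 'Class Bar {' 'int x;' > "$sample"

match_count=$(grep -c "class" "$sample")     # lines containing "class"
nocase_count=$(grep -ci "class" "$sample")   # same count, ignoring case
numbered=$(grep -n "virtual" "$sample")      # match prefixed with its line number
only_match=$(grep -o "Foo" "$sample")        # just the matched text
non_matching=$(grep -vc "class" "$sample")   # lines without "class"

echo "$match_count $nocase_count $non_matching"
echo "$numbered"
echo "$only_match"
rm -f "$sample"
```

Note that -c counts matching lines, not individual matches, so a line with two hits still counts once.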
Recursive search in a directory tree: <code>grep -R -n "class" .</code>
Search for multiple patterns: <code>grep -e "class" -e "virtual" file</code>
Delete every file that contains a match, passing null-terminated file names to xargs (-Z is a GNU grep option): <code>grep -lZ "test" file* | xargs -0 rm</code>
xargs – Convert Input to Command-Line Arguments
xargs builds command lines from standard input, so it combines well with other tools such as grep and find.
Convert multi-line output to a single line: <code>cat file.txt | xargs</code>
Convert a single line into multiple lines (e.g., three arguments per line): <code>cat single.txt | xargs -n 3</code>
Define a custom delimiter: by default xargs splits input on blanks and newlines; GNU xargs accepts -d to choose another delimiter.
Replace the placeholder {} in the executed command: <code>cat file.txt | xargs -I {} ./command.sh -p {} -1</code>
Use the null character as the delimiter for safe handling of spaces: <code>find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l</code>
sort – Sorting
Key options:
-n – numeric sort.
-d – dictionary order.
-r – reverse order.
-k N – sort by the key that starts at the N-th field.
-b – ignore leading blanks.
Examples: <code>sort -nrk 1 data.txt</code> <code>sort -bd data</code>
uniq – Remove Duplicate Lines
uniq only collapses adjacent duplicates, so the input is normally sorted first. Typical uses:
Eliminate duplicate lines: <code>sort unsort.txt | uniq</code>
Count occurrences of each line: <code>sort unsort.txt | uniq -c</code>
Show only duplicated lines: <code>sort unsort.txt | uniq -d</code>
Options -f N (skip the first N fields), -s N (skip the first N characters), and -w N (compare at most N characters; GNU uniq) allow fine-grained control.
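A short sketch of how sort, uniq, and xargs combine, using an invented word list. It also shows why uniq needs sorted input.

```shell
# Six lines with repeats, none of them adjacent.
words=$(mktemp)
printf '%s\n' pear apple pear fig apple pear > "$words"

# uniq only collapses adjacent duplicates, so unsorted input keeps every line.
unsorted_lines=$(uniq "$words" | wc -l)
sorted_lines=$(sort "$words" | uniq | wc -l)

# Count each distinct line, rank by count (numeric, descending), keep the top entry.
top=$(sort "$words" | uniq -c | sort -k1,1nr | head -n 1 | awk '{print $2 ":" $1}')

# xargs -n 2 regroups the stream into two arguments per output line.
pairs=$(xargs -n 2 < "$words" | wc -l)

echo "unsorted=$unsorted_lines sorted=$sorted_lines top=$top pairs=$pairs"
rm -f "$words"
```

The sort | uniq -c | sort -nr chain is a common idiom for building a frequency table from any line-oriented input.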
tr – Translate or Delete Characters
Common patterns:
Simple character substitution (here a digit-reversal mapping, in the spirit of ROT13): <code>echo 12345 | tr '0-9' '9876543210'</code>
Convert tabs to spaces: <code>cat text | tr '\t' ' '</code>
Delete all digits: <code>cat file | tr -d '0-9'</code>
Complement the set and delete everything outside it (keep only digits and newlines): <code>cat file | tr -d -c '0-9\n'</code>
Squeeze repeated characters (useful for collapsing runs of spaces): <code>cat file | tr -s ' '</code>
Character classes such as [:lower:], [:upper:], and [:digit:] can be used for broader transformations: <code>tr '[:lower:]' '[:upper:]'</code>
cut – Column Extraction
Extract specific fields from a delimited file:
Columns 2 and 4: <code>cut -f2,4 filename</code>
All columns except 3 (GNU cut): <code>cut -f3 --complement filename</code>
Specify a delimiter (default is TAB): <code>cut -d ";" -f2 filename</code>
Character-wise extraction with -c (use -b for bytes): <code>cut -c1-5 file # first five characters
cut -c-2 file # first two characters</code>
paste – Merge Files Column-wise
Combine files side by side. The default delimiter is a TAB; a custom delimiter can be set with -d: <code>paste -d "," file1 file2</code>
wc – Word, Line, and Byte Count
Useful counters:
Lines: <code>wc -l file</code>
Words: <code>wc -w file</code>
Bytes: <code>wc -c file</code>
sed – Stream Editor for Text Substitution
Key operations:
Replace the first occurrence on each line: <code>sed 's/text/replace_text/' file</code>
Global replacement: <code>sed 's/text/replace_text/g' file</code>
In-place editing (modify the file directly; GNU sed syntax): <code>sed -i 's/text/replace_text/g' file</code>
Delete empty lines: <code>sed '/^$/d' file</code>
Use captured groups with \1, \2, etc., for advanced substitutions.
awk – Powerful Text-Processing Language
Structure of an awk program: <code>awk 'BEGIN { statements } pattern { statements } END { statements }' file</code>
Typical tasks:
Print each line (the default action): <code>awk '{print}' file</code>
Print specific fields: <code>awk '{print $2, $3}' file</code>
Count lines: <code>awk 'END {print NR}' file</code>
Sum a column: <code>awk '{sum+=$1} END {print sum}' file</code>
Pass external variables: <code>var=1000
awk '{print vara, $0}' vara="$var" file</code>
Filter by line number, pattern, or range:
<code>awk 'NR<5' file # first 4 lines
awk '/linux/' file # lines containing "linux"
awk 'NR==4,NR==6' file # lines 4 through 6
awk '/start/,/end/' file # from a line matching "start" through the next line matching "end"</code>
Set the field separator: <code>awk -F ':' '{print $NF}' /etc/passwd</code>
Read command output into a variable (done in BEGIN so that no input file is needed): <code>awk 'BEGIN {"grep root /etc/passwd" | getline cmdout; print cmdout}'</code>
Implement head and tail: <code>awk 'NR<=10{print}' file # head
awk '{buf[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buf[i%10]}' file # tail</code>
Iterating Over Files, Words, and Characters
Common Bash loops:
Read a file line by line: <code>while IFS= read -r line; do echo "$line"; done < file.txt</code>
Process each word in a line (this relies on word splitting of the unquoted variable): <code>for word in $line; do echo "$word"; done</code>
Iterate over characters using parameter expansion: <code>for ((i=0;i<${#word};i++)); do echo "${word:i:1}"; done</code>
This collection of commands forms a practical toolbox for everyday Linux text manipulation and automation tasks.
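As a closing sketch, the snippet below exercises a few of the tr and awk one-liners from this guide on generated data. The input (the numbers 1 through 25 and a sample passwd-style line) is invented for illustration, and seq is assumed to be available.

```shell
# One number per line, 1 through 25.
data=$(mktemp)
seq 1 25 > "$data"

# Sum the first column.
total=$(awk '{sum += $1} END {print sum}' "$data")

# Ring buffer: slot NR % 10 holds the most recent line with that remainder,
# so walking from NR-9 to NR prints the last ten lines in their original order.
awk_tail=$(awk '{buf[NR % 10] = $0}
                END {for (i = NR - 9; i <= NR; i++) if (i > 0) print buf[i % 10]}' "$data")
real_tail=$(tail -n 10 "$data")

# -F sets the field separator; $NF is the last field of each record.
last_field=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F ':' '{print $NF}')

# -c complements the set and -d deletes, so only the digits survive.
digits=$(printf 'a1b2c3\n' | tr -cd '0-9')

echo "total=$total last_field=$last_field digits=$digits"
rm -f "$data"
```

Comparing the awk output against tail -n 10 is a quick way to confirm that the ring buffer is read back in the right order.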
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
