Master Linux Text Processing: Find, Grep, Sed, Awk, and More
This guide covers the most commonly used Linux command-line utilities for text processing (find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk), with common options and practical examples for searching, filtering, and transforming files.
01 find – File Searching
Basic file search by name and pattern:
find . \( -name "*.txt" -o -name "*.pdf" \) -print
Search using regular expressions (use -iregex for a case-insensitive match; the syntax shown is GNU find's default):
find . -regex ".*\(\.txt\|\.pdf\)$"
Negate a pattern:
find . ! -name "*.txt" -print
Limit search depth (depth = 1):
find . -maxdepth 1 -type f
Custom searches:
By type:
find . -type d -print # list directories only
By time:
find . -atime -7 -type f -print # files accessed within the last 7 days
By size:
find . -type f -size +2k # files larger than 2 KiB
By permission:
find . -type f -perm 644 -print
By user:
find . -type f -user weber -print
Post-search actions:
Delete files (e.g., all *.swp files):
find . -type f -name "*.swp" -delete
Execute a command on each match:
find . -type f -user root -exec chown weber {} \;
Copy old files (modified more than 10 days ago) into the OLD directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
Delimiter control: -print separates file names with newlines by default; use -print0 for NUL-delimited output, which is safe for names containing spaces or newlines.
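The grouped -name tests and the -print0 option can be tried safely in a scratch directory. The following is a minimal sketch, assuming GNU or BSD find and an xargs that supports -0; the file names are hypothetical and exist only for illustration.

```shell
# Build a throwaway directory so nothing real is touched (names are hypothetical).
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/report.pdf" "$workdir/draft copy.txt" "$workdir/image.png"

# Grouped -name tests: the parentheses are escaped so the shell passes them to find.
matches=$(find "$workdir" -type f \( -name "*.txt" -o -name "*.pdf" \) | sort)
echo "$matches"

# NUL-delimited output keeps "draft copy.txt" as a single argument for xargs.
count=$(find "$workdir" -type f -name "*.txt" -print0 | xargs -0 -n 1 basename | wc -l | tr -d ' ')
echo "txt files: $count"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Without -print0 and -0, the name containing a space would be split into two arguments by xargs.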
02 grep – Text Searching
Basic usage:
grep match_pattern file
Common options:
-o output only the matching part of each line
-v output the lines that do not match
-c count matching lines
-n show line numbers
-i ignore case
-l list only the names of files that contain a match
Recursive search (programmers' favorite):
grep -R -n "class" .
Search multiple patterns:
grep -e "class" -e "virtual" file
Use -Z to end each output file name with NUL and pair it with xargs -0 (useful for deleting files with spaces in their names):
grep -lZ "test" file* | xargs -0 rm
03 xargs – Argument Construction
xargs converts input data into command-line arguments, allowing powerful combinations with other commands such as grep and find.
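One such combination is deleting every file that contains a given string, even when file names contain spaces. This is a rough sketch, assuming GNU grep (for -Z) and an xargs that supports -0; the log file names are hypothetical.

```shell
# Scratch directory with hypothetical log files.
workdir=$(mktemp -d)
printf 'test data\n' > "$workdir/has space.log"
printf 'test data\n' > "$workdir/plain.log"
printf 'nothing here\n' > "$workdir/keep.log"

# -l lists matching files, -Z ends each name with NUL; xargs -0 splits on NUL only,
# so "has space.log" reaches rm as one argument.
grep -lZ "test" "$workdir"/*.log | xargs -0 rm --

# Only the file without a match should remain.
remaining=$(ls "$workdir")
echo "$remaining"

rm -rf "$workdir"
```

With newline- or space-delimited output, rm would instead receive "has" and "space.log" as two separate, nonexistent names.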
Simple conversion (joins all input onto one line):
cat file.txt | xargs
Specify delimiter:
cat file.txt | xargs -d ";"
Replace a placeholder with -I {}:
cat file.txt | xargs -I {} ./command.sh -p {} -1
NUL-delimited input:
cat file.txt | xargs -0
Limit arguments per command:
cat file.txt | xargs -n 3
04 sort – Sorting
Key options:
-n numeric sort
-d dictionary order
-r reverse order
-k N sort by a key that starts at the N-th field (use -k N,N to restrict the key to that field alone)
Example:
sort -nrk 1 data.txt
Ignore leading blanks:
sort -bd data
05 uniq – Removing Duplicate Lines
Typical usage after sorting:
sort unsort.txt | uniq
Count occurrences:
sort unsort.txt | uniq -c
Show only duplicated lines:
sort unsort.txt | uniq -d
Limit the comparison range: -s N skips the first N characters of each line, and -w N compares at most N characters.
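A common use of sort together with uniq is a frequency table. The sketch below uses made-up sample data (one status code per line) rather than a real file.

```shell
# Hypothetical sample data: one HTTP status code per line.
codes=$(printf '%s\n' 200 404 200 500 200 404)

# sort groups identical lines so uniq -c can count them;
# the second sort ranks the counts numerically, largest first;
# awk swaps the columns into "value: count".
top=$(printf '%s\n' "$codes" | sort | uniq -c | sort -nr | awk '{print $2 ": " $1}')
echo "$top"

# uniq -d keeps one copy of each line that occurs more than once.
dupes=$(printf '%s\n' "$codes" | sort | uniq -d)
echo "$dupes"
```

uniq only compares adjacent lines, which is why the input is sorted first.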
06 tr – Translating Characters
General form:
echo 12345 | tr '0-9' '9876543210'
Convert tabs to spaces:
cat text | tr '\t' ' '
Delete characters:
cat file | tr -d '0-9'
Complement set (combined with -d, delete everything except digits and newlines):
cat file | tr -d -c '0-9\n'
Squeeze repeated characters (commonly spaces):
cat file | tr -s ' '
Character classes (e.g., lower-to-upper):
tr '[:lower:]' '[:upper:]'
07 cut – Column Extraction
Extract columns 2 and 4:
cut -f2,4 filename
Remove column 3 (GNU cut):
cut -f3 --complement filename
Specify delimiter:
cut -f2 -d ";" filename
Extract character ranges:
cut -c1-5 file # first to fifth character
cut -c-2 file # first two characters
08 paste – Merging Columns
Combine two files column-wise (default tab delimiter):
paste file1 file2
Use a custom delimiter, e.g., comma:
paste -d "," file1 file2
09 wc – Counting
Line count:
wc -l file
Word count:
wc -w file
Byte count (use -m to count characters instead):
wc -c file
10 sed – Stream Editing
Replace first occurrence on each line:
sed 's/text/replace_text/' file
Global replacement:
sed 's/text/replace_text/g' file
Edit file in place:
sed -i 's/text/replace_text/g' file
Delete empty lines:
sed '/^$/d' file
Reference the matched text with & (back-references such as \1 refer to captured groups; \w and \+ are GNU sed syntax):
echo "this is an example" | sed 's/\w\+/[&]/g'
Variable substitution (double quotes let the shell expand the variables):
p=pattern
r=replace
echo "line with a pattern" | sed "s/$p/$r/g"
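The substitution forms above can be checked on literal input. This is a small sketch using only POSIX sed features; the variable values and the sample path are illustrative assumptions.

```shell
p="pattern"
r="replacement"

# Double quotes let the shell expand $p and $r before sed parses the script.
out1=$(echo "line with a pattern" | sed "s/$p/$r/g")
echo "$out1"

# & stands for the whole match; here every run of lowercase letters is bracketed.
out2=$(echo "this is an example" | sed 's/[a-z][a-z]*/[&]/g')
echo "$out2"

# Any character can delimit the s command; | avoids escaping slashes in paths.
out3=$(echo "/usr/local/bin" | sed 's|/usr/local|/opt|')
echo "$out3"
```

Shell variables are pasted into the sed script as-is, so a value containing the delimiter or regex metacharacters would change the meaning of the command.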
11 awk – Data‑Stream Processing
Typical script structure:
awk 'BEGIN{ statements } statements2 END{ statements }' file
Printing and field handling:
# Print header, each line, and footer
awk 'BEGIN{print "start"} {print} END{print "End"}' file
# Print specific fields
awk '{print $2, $3}' file
# Print line number and fields
awk '{print NR":"$0"-"$1"-"$2}' file
# Count lines
awk 'END{print NR}' file
# Sum first column
awk 'BEGIN{sum=0} {sum+=$1} END{print sum}' file
Built-in variables:
NR – current record number (line number)
NF – number of fields in the current line
$0 – entire line
$1, $2 … – individual fields
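To see NR, NF, and the field variables working together, here is a minimal sketch over made-up two-column data (a name and a score); the data is an assumption for illustration only.

```shell
# Hypothetical sample: name and score columns.
data=$(printf '%s\n' 'alice 90' 'bob 70' 'carol 80')

# In the END block NR holds the total number of records read.
summary=$(printf '%s\n' "$data" | awk '{sum += $2} END {printf "%d lines, sum=%d, avg=%.1f\n", NR, sum, sum/NR}')
echo "$summary"

# NF is the field count, so $NF is always the last field of the line.
last=$(printf '%s\n' "$data" | awk '{print NR ":" $NF}')
echo "$last"
```

The average divides by NR, so this sketch assumes the input has at least one line.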
Filtering examples:
# Lines with line number < 5
awk 'NR < 5'
# Range of lines
awk 'NR==1,NR==4 {print}' file
# Lines containing "linux"
awk '/linux/'
# Lines NOT containing "linux"
awk '!/linux/'
Set field separator:
awk -F: '{print $NF}' /etc/passwd
Read command output:
awk 'BEGIN{"grep root /etc/passwd" | getline cmdout; print cmdout}' # prints the first line of the command's output
Pass external shell variables:
var=1000
awk -v vara="$var" 'BEGIN{print vara}'
Loops and control structures:
# For loop
awk 'BEGIN{for(i=0;i<10;i++) print i}'
# Loop over array
awk 'BEGIN{array[1]="a"; array[2]="b"; for(i in array) print array[i]}'
# Read lines with getline (run in BEGIN so the main rule does not consume the first line)
awk 'BEGIN{while ((getline line) > 0) print line}' < file.txt
# Implementing tac (reverse output)
seq 9 | awk '{lifo[NR]=$0} END{for(lno=NR;lno>0;lno--) print lifo[lno]}'
# head and tail equivalents
awk 'NR<=10' file # head
awk '{buffer[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buffer[i%10]}' file # tail
Formatting output with printf:
seq 10 | awk '{printf "->%4s ", $1}'
These commands together form a powerful toolbox for processing and analyzing text data directly from the Linux command line.
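As a closing illustration of chaining these tools, the sketch below builds a small word-frequency ranking from tr, sort, uniq, and awk. The sample sentence is made up, and the pipeline is one possible arrangement rather than the only one.

```shell
# Hypothetical input text.
text='the quick brown fox jumps over the lazy dog The fox'

# tr -cs turns every run of non-letters into a single newline (one word per line),
# the second tr lower-cases, sort | uniq -c counts each word,
# the final sort ranks by count (descending) and then by word,
# and awk keeps the top two entries as "word count".
freq=$(printf '%s\n' "$text" \
  | tr -cs '[:alpha:]' '\n' \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c | sort -k1,1nr -k2,2 \
  | awk 'NR <= 2 {print $2, $1}')
echo "$freq"
```

Each stage reads standard input and writes standard output, which is what makes the tools easy to recombine.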
