Master Linux Text Processing: Essential Shell Commands Explained
This article introduces the most commonly used Linux shell utilities for text processing—find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—providing practical examples, key options, and tips for efficient one‑line scripting while suggesting Python for more complex tasks.
Table of Contents
find – File Search
grep – Text Search
xargs – Command Line Argument Conversion
sort – Sorting
uniq – Removing Duplicate Lines
tr – Translating and Transforming
cut – Column Cutting
paste – Column Pasting
wc – Counting Lines and Characters
sed – Text Replacement
awk – Data Stream Processing
Iterating Over Lines, Words, and Characters
This article presents the most frequently used Linux shell tools for processing text: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples and options shown are the most practical ones. As a rule of thumb, keep a shell script to one line, two at most; for anything more complex, consider Python.
find – File Search
Search for txt and pdf files:
find . \( -name "*.txt" -o -name "*.pdf" \) -print
Regex search for .txt and .pdf files (GNU find's default regex syntax writes alternation as \|):
find . -regex ".*\(\.txt\|\.pdf\)$"
-iregex: case-insensitive regex
Negate a pattern
Find all non-txt files:
find . ! -name "*.txt" -print
Specify search depth (a depth of 1 lists only the files in the current directory):
find . -maxdepth 1 -type f
Custom Search
Search by type:
find . -type d -print # list directories only
Use -type f for regular files and -type l for symbolic links.
Search by time
Files accessed within the last 7 days (-atime 7 would match files accessed exactly 7 days ago, +7 more than 7 days ago):
find . -atime -7 -type f -print
Search by size (units include w, k, M, G). Find files larger than 2k:
find . -type f -size +2k
Search by permission:
find . -type f -perm 644 -print # find files with permission 644
Search by user:
find . -type f -user weber -print # files owned by user weber
Post-Search Actions
Delete all *.swp files under the current directory:
find . -type f -name "*.swp" -delete
Execute actions with -exec:
find . -type f -user root -exec chown weber {} \; # change ownership to weber
Note: {} is replaced by each matched file name.
Copy all matched files to another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
Combine multiple commands (put them in a script and call it with -exec):
-exec ./commands.sh {} \;
-print delimiter
The default delimiter is '\n'. Use -print0 to emit '\0' instead, so that file names containing spaces or newlines are handled correctly.
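To see why the delimiter matters, here is a minimal sketch that counts the arguments xargs receives with each delimiter. The scratch directory and the file names are made up for the demo, and it assumes the temporary directory path itself contains no spaces.

```shell
# Hypothetical scratch directory and file names, created only for this demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt" "$demo_dir/notes.pdf"

# Newline-delimited output: xargs splits "with space.txt" into two arguments,
# so the pipeline sees more "files" than actually exist.
naive_count=$(find "$demo_dir" -type f -name "*.txt" -print | xargs -n 1 echo | wc -l)

# NUL-delimited output: each file name arrives as exactly one argument.
safe_count=$(find "$demo_dir" -type f -name "*.txt" -print0 | xargs -0 -n 1 echo | wc -l)

echo "naive: $naive_count, safe: $safe_count"
rm -rf "$demo_dir"
```

Only two .txt files exist, yet the newline-delimited pipeline reports three arguments. With a destructive command such as rm in place of echo, that mismatch is how the wrong files get touched.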
grep – Text Search
grep match_pattern file # prints matching lines by default
Common options
-o prints only the matching part of each line; -v prints the non-matching lines; -c counts matching lines:
grep -c "text" filename
-n prints line numbers; -i ignores case; -l prints only the names of files that contain a match.
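The options above are easier to compare side by side on one small input. The following is a sketch only; the sample file and its contents are invented for illustration.

```shell
# Hypothetical sample file, created only for this demo.
sample=$(mktemp)
printf '%s\n' 'class Foo' 'virtual void run()' 'Class Bar' 'int x' > "$sample"

match_lines=$(grep -c "class" "$sample")     # lines containing "class" (case-sensitive)
nocase_lines=$(grep -ci "class" "$sample")   # same count, ignoring case
non_matching=$(grep -vc "class" "$sample")   # lines that do NOT contain "class"
numbered=$(grep -n "virtual" "$sample")      # matching line prefixed with its line number
only_match=$(grep -o "F[a-z]*" "$sample")    # only the matched text, not the whole line

echo "$match_lines $nocase_lines $non_matching"
echo "$numbered"
echo "$only_match"
rm -f "$sample"
```

Note that -c counts lines, not individual matches: a line containing the pattern twice still counts once.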
Recursive search across directories (a favorite for searching code):
grep "class" . -R -n
Match multiple patterns:
grep -e "class" -e "virtual" file
Print file names terminated by \0 (combine -l with -Z) so they can be passed safely to xargs -0:
grep "test" file* -lZ | xargs -0 rm
xargs – Command Line Argument Conversion
xargs converts input data into command‑line arguments, allowing combination with many commands such as grep and find.
Convert multi-line input to a single line (\n is the line delimiter):
cat file.txt | xargs
Convert a single line to multiple lines (-n sets the number of fields per output line):
cat single.txt | xargs -n 3
xargs options
-d defines the input delimiter (by default, input is split on blanks and newlines); -n sets the maximum number of arguments passed per command invocation; -I {} defines a replace string, for commands that need the argument placed at a specific position.
Example:
cat file.txt | xargs -I {} ./command.sh -p {} -1
-0 sets \0 as the input delimiter.
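A small sketch of how -n and -I change what xargs runs. It uses echo (the command xargs runs when none is given) in place of the article's command.sh, and the three-line input is made up.

```shell
# Hypothetical input: three lines of text.
input=$(printf '%s\n' a b c)

# No options: all items are passed to a single echo.
one_line=$(printf '%s\n' "$input" | xargs)

# -n 2: at most two arguments per echo invocation, so two output lines.
two_per_line=$(printf '%s\n' "$input" | xargs -n 2)

# -I {}: one invocation per input line, with {} replaced by that line.
templated=$(printf '%s\n' "$input" | xargs -I {} echo "item={} done")

echo "$one_line"
echo "$two_per_line"
echo "$templated"
```

With -I, xargs runs the command once per input line, which is convenient but slower than batching many arguments into one invocation.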
Count lines of code:
find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
sort – Sorting
Options:
-n sorts numerically; -d sorts in dictionary order
-r reverse order
-k N sort by column N
Examples:
sort -nrk 1 data.txt
sort -bd data # ignore leading blanks
uniq – Removing Duplicate Lines
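uniq compares adjacent lines only, so sort the input first.
Remove duplicate lines:
sort unsort.txt | uniq
Count occurrences of each line:
sort unsort.txt | uniq -c
Show only duplicated lines:
sort unsort.txt | uniq -d
Restrict the comparison with -s N (skip the first N characters) and -w N (compare at most N characters).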
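sort and uniq are most often combined into a frequency count. The sketch below shows that pipeline on an invented word list; the variable names are illustrative only.

```shell
# Hypothetical word list, one word per line.
words=$(printf '%s\n' apple pear apple fig pear apple)

# sort groups identical lines, uniq -c counts each group,
# and sort -nr puts the highest count first.
freq=$(printf '%s\n' "$words" | sort | uniq -c | sort -nr)
echo "$freq"

# The most frequent word is the second field of the first line.
top_word=$(printf '%s\n' "$freq" | head -n 1 | awk '{print $2}')
echo "most frequent: $top_word"
```

The first sort is what makes uniq -c correct here: without it, repeated words that are not adjacent would be counted as separate groups.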
tr – Translating and Transforming
General usage
echo 12345 | tr '0-9' '9876543210' # simple substitution cipher
cat text | tr '\t' ' ' # convert tabs to spaces
Delete characters
cat file | tr -d '0-9' # delete all digits
cat file | tr -cd '0-9\n' # complement: delete everything except digits (and newlines)
Compress repeated characters
tr -s squeezes runs of a repeated character into one, often used to collapse spaces:
cat file | tr -s ' '
Character classes (e.g., alnum, alpha, digit, space, lower, upper, cntrl, print)
tr '[:lower:]' '[:upper:]'
cut – Column Cutting
Extract columns 2 and 4:
cut -f2,4 filename
Exclude column 3 (--complement is a GNU extension):
cut -f3 --complement filename
Specify the delimiter with -d:
cut -f2 -d ";" filename
Field ranges: N- (from field N to the end), -M (the first M fields), N-M (fields N through M).
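The range notations can be checked quickly against a single record. This is an illustrative sketch; the semicolon-separated record is made up.

```shell
# Hypothetical semicolon-separated record with five fields.
record='a;b;c;d;e'

from_third=$(printf '%s\n' "$record" | cut -d ';' -f3-)   # N-  : field 3 to the end
first_two=$(printf '%s\n' "$record" | cut -d ';' -f-2)    # -M  : the first 2 fields
middle=$(printf '%s\n' "$record" | cut -d ';' -f2-4)      # N-M : fields 2 through 4
picked=$(printf '%s\n' "$record" | cut -d ';' -f2,4)      # list: fields 2 and 4 only

echo "$from_third | $first_two | $middle | $picked"
```

cut keeps the original delimiter between the fields it outputs, so the results stay semicolon-separated.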
Units: -b bytes, -c characters, -f fields (with delimiter)
cut -c1-5 file # first 5 characters
cut -c-2 file # first 2 characters
paste – Column Pasting
Combine two files column‑wise:
cat file1
1
2
cat file2
colin
book
paste file1 file2
1 colin
2 book
The default delimiter is a tab; use -d to specify another:
paste file1 file2 -d ","
1,colin
2,book
wc – Counting Lines and Characters
Examples:
wc -l file # line count
wc -w file # word count
wc -c file # byte count
sed – Text Replacement
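Replace the first occurrence in each line:
sed 's/text/replace_text/' file
Global replacement:
sed 's/text/replace_text/g' file
Use -i to edit the file in place:
sed -i 's/text/replace_text/g' file
Delete empty lines:
sed '/^$/d' file
Matched-text references: & stands for the whole match, and \1 for the first captured group.
echo this is an example | sed 's/\w\+/[&]/g'
# => [this] [is] [an] [example]
echo "seven EIGHT" | sed 's/\([a-z]\+\) \([A-Z]\+\)/\2 \1/'
# => EIGHT seven
Use double quotes for variable evaluation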
p=pattern
r=replace
echo "line contains pattern" | sed "s/$p/$r/g"
# => line contains replace
awk – Data Stream Processing
Script structure
awk 'BEGIN{ statements } { statements } END{ statements }'
Workflow: the BEGIN block runs once before any input is read, the middle block then runs for each input line, and the END block runs once after the last line.
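As a rough illustration of that workflow, the sketch below totals and averages a column of numbers. The three input values are made up, and the comments mark when each block runs.

```shell
# Hypothetical input: one number per line.
report=$(printf '%s\n' 10 20 30 | awk '
    BEGIN { print "start"; sum = 0 }   # runs once, before any input is read
    { sum += $1 }                      # runs for every input line
    END { print "lines=" NR, "sum=" sum, "avg=" sum / NR }   # runs once, after the last line
')
echo "$report"
```

Anything that must be known before the first line (headers, initial values) belongs in BEGIN, and anything that depends on the whole input (totals, averages) belongs in END.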
Printing
print without arguments prints the current line:
echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print} END{print "end"}'
print with comma-separated arguments joins them with spaces:
echo | awk '{var1="v1"; var2="V2"; var3="v3"; print var1, var2, var3}'
# v1 V2 v3
Adjacent values with no comma are concatenated, here with "-" in between:
echo | awk '{var1="v1"; var2="V2"; var3="v3"; print var1"-"var2"-"var3}'
# v1-V2-v3
Special Variables
NR – record number (line number); NF – number of fields; $0 – entire line; $1, $2 – first and second fields.
echo -e "line1 f2 f3\nline2" | awk '{print NR":"$0"-"$1"-"$2}'
Print specific fields:
awk '{print $2, $3}' file
Count lines:
awk 'END{print NR}' file
Sum the values of the first field:
awk 'BEGIN{sum=0} {sum+=$1} END{print sum}' file
Passing External Variables
var=1000
echo | awk '{print var}' var=$var # input comes from stdin
awk '{print var}' var=$var file # input comes from a file
Filtering with Patterns
awk 'NR<5' # lines with number < 5
awk 'NR==1,NR==4' # lines 1 through 4
awk '/linux/' # lines containing "linux"
awk '!/linux/' # lines not containing "linux"
Set the field separator with -F (here, print the last field of each line):
awk -F: '{print $NF}' /etc/passwd
Reading Command Output
echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout}'
Loops in awk
for(i=0;i<10;i++){print i}
for(i in array){print array[i]}
Implementing head and tail
head (first 10 lines):
awk 'NR<=10{print}' filename
tail (last 10 lines), using a 10-slot circular buffer printed from oldest to newest:
awk '{buffer[NR%10]=$0} END{for(i=(NR>10?NR-9:1);i<=NR;i++) print buffer[i%10]}' filename
Printing Specific Columns
awk example:
ls -lrt | awk '{print $6}'
cut example (ls pads columns with spaces, so squeeze them and split on a space):
ls -lrt | tr -s ' ' | cut -d ' ' -f6
Printing a Text Range
By line numbers:
seq 100 | awk 'NR==4,NR==6{print}'
By patterns (from start_pattern to end_pattern):
awk '/start_pattern/,/end_pattern/' filename
Common awk Built-in Functions
index(string,search) – position of search in string
sub(regex,repl,string) – replace the first match
match(string,regex) – test whether string matches regex
length(string) – string length
awk '{print length($0)}'
printf formats output much like C's printf:
seq 10 | awk '{printf "->%4s\n", $1}'
Iterating Over Lines, Words, and Characters
1. Iterate over each line
While-loop method:
while IFS= read -r line; do
echo "$line"
done < file.txt
Subshell version:
cat file.txt | (while IFS= read -r line; do echo "$line"; done)
awk method:
cat file.txt | awk '{print}'
2. Iterate over each word in a line:
for word in $line; do echo "$word"; done
3. Iterate over each character (bash string slicing):
for ((i=0;i<${#word};i++))
do
echo "${word:i:1}"
done
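The three loops can be nested to walk a file line by line, word by word, and character by character. The sketch below assumes bash (the C-style for loop and the ${word:i:1} slice are not POSIX sh), and the input file is made up for the demo.

```shell
# Hypothetical input file with two short lines.
demo_file=$(mktemp)
printf '%s\n' 'ab cd' 'ef' > "$demo_file"

chars=""
word_count=0
# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r line; do
    for word in $line; do                  # unquoted on purpose: split the line into words
        word_count=$((word_count + 1))
        for ((i = 0; i < ${#word}; i++)); do
            chars+="${word:i:1},"          # one character at a time
        done
    done
done < "$demo_file"

echo "words: $word_count, chars: $chars"
rm -f "$demo_file"
```

Reading with a redirect (done < file) rather than a pipe keeps the loop in the current shell, so word_count and chars are still set after the loop ends.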
MaGe Linux Operations
