
Master Linux Text Processing: Essential Shell Commands Explained

This article introduces the most commonly used Linux shell utilities for text processing—find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—providing practical examples, key options, and tips for efficient one‑line scripting while suggesting Python for more complex tasks.

MaGe Linux Operations

May 26, 2016


find – File Search

grep – Text Search

xargs – Command Line Argument Conversion

sort – Sorting

uniq – Removing Duplicate Lines

tr – Translating and Transforming

cut – Column Cutting

paste – Column Pasting

wc – Counting Lines and Characters

sed – Text Replacement

awk – Data Stream Processing

Iterating Over Lines, Words, and Characters

This article presents the most frequently used Linux shell tools for processing text: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples and options shown are the most practical, and the author recommends keeping scripts to a single line (no more than two lines) and using Python for more complex tasks.

find – File Search

Search for txt and pdf files

find . \( -name "*.txt" -o -name "*.pdf" \) -print

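Regex search for .txt and .pdf files (GNU find's default regex syntax uses \( \) for grouping and \| for alternation, and the pattern must match the whole path):

find . -regex ".*\(\.txt\|\.pdf\)$"

-iregex works the same way but ignores case.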

Negate pattern

Find all non‑txt files:

find . ! -name "*.txt" -print

Specify search depth (-maxdepth 1 searches only the current directory):

find . -maxdepth 1 -type f
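The name, regex, negation, and depth searches above can be tried safely in a scratch directory. This is a minimal sketch assuming GNU findutils; the file names (a.txt, b.pdf, c.log, sub/d.txt) are hypothetical, and every result is sorted because find's traversal order is unspecified.

```shell
#!/usr/bin/env bash
# Scratch tree for trying find's name, regex, negation and depth tests.
# Assumes GNU findutils; all file names here are hypothetical.
set -eu
export LC_ALL=C                      # deterministic sort order

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/sub"
touch "$dir/a.txt" "$dir/b.pdf" "$dir/c.log" "$dir/sub/d.txt"
cd "$dir"

# -o is a logical OR; the escaped parentheses group the two -name tests.
txt_or_pdf=$(find . \( -name "*.txt" -o -name "*.pdf" \) -print | sort)

# Same selection by regex: GNU find's default syntax needs \| and \( \),
# and the pattern has to match the whole path.
by_regex=$(find . -regex '.*\(\.txt\|\.pdf\)$' | sort)

# ! negates the following test; -type f keeps "." and "./sub" out.
not_txt=$(find . -type f ! -name "*.txt" | sort)

# -maxdepth 1 stops find from descending into ./sub.
top_level=$(find . -maxdepth 1 -type f | sort)

printf '%s\n' "$txt_or_pdf" "--" "$not_txt" "--" "$top_level"
```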

Custom Search

Search by type:

find . -type d -print   # list directories only

Use -type f for regular files and -type l for symbolic links.

Search by time

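Files accessed within the last 7 days (times are counted in days: -7 means less than 7 days ago, 7 exactly 7 days ago, +7 more than 7 days ago; -mtime and -ctime work the same way for modification and status-change times):

find . -type f -atime -7 -print

Search by size (units: c bytes, w two-byte words, k KiB, M MiB, G GiB; without a suffix the unit is 512-byte blocks). Find files larger than 2k:

find . -type f -size +2k

Search by permission: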

find . -type f -perm 644 -print   # files whose mode is exactly 644

Search by user:

find . -type f -user weber -print   # files owned by user weber
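A minimal sketch of the type, size, permission, and time tests on throwaway files, assuming GNU find and GNU touch (for `touch -d "20 days ago"`); the file names are hypothetical. The -user test is left out because every scratch file belongs to the current user, and -mtime is used instead of -atime because many filesystems update access times lazily.

```shell
#!/usr/bin/env bash
# Type, size, permission and time tests on throwaway files.
# Assumes GNU find and GNU touch; file names are hypothetical.
set -eu
export LC_ALL=C

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

mkdir docs
head -c 4096 /dev/zero > big.bin      # 4 KiB of zero bytes
echo hi > small.txt                   # 3 bytes
chmod 600 big.bin
chmod 644 small.txt
touch -m -d "20 days ago" small.txt   # backdate the modification time

dirs=$(find . -type d | sort)         # "." and "./docs"
large=$(find . -type f -size +2k)     # size rounded up to KiB, more than 2
perm644=$(find . -type f -perm 644)   # mode exactly 644
recent=$(find . -type f -mtime -7)    # modified less than 7 days ago
old=$(find . -type f -mtime +10)      # modified more than 10 days ago

printf '%s\n' "$dirs" "--" "$large" "$perm644" "$recent" "$old"
```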

Post‑Search Actions

Delete all *.swp files in the current directory tree:

find . -type f -name "*.swp" -delete

Execute actions with -exec:

find . -type f -user root -exec chown weber {} \;   # change ownership to weber

Note: {} is replaced by each matched file name.

Copy .txt files modified more than 10 days ago into the OLD directory:

find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;

Combine multiple commands: -exec runs a single command, so put the commands in a script and call that script with -exec:

-exec ./commands.sh {} \;

-print delimiter

-print separates file names with '\n' by default; -print0 uses '\0' instead, which keeps names containing spaces or newlines intact when they are piped to xargs -0.
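-print0 matters as soon as a file name contains a space. This sketch, assuming GNU findutils and a hypothetical file named "my notes.txt", counts the arguments xargs receives in each mode. It also shows the standard `-exec ... {} +` form, which passes many file names to one command invocation instead of running the command once per file.

```shell
#!/usr/bin/env bash
# Why -print0 matters: one hypothetical file name contains a space.
# Assumes GNU findutils.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
printf 'one\n' > "my notes.txt"
printf 'two\nthree\n' > plain.txt

# Newline-separated names: xargs splits "./my notes.txt" into two arguments.
split_args=$(find . -name "*.txt" -print | xargs printf '%s\n' | wc -l)

# NUL-separated names survive intact.
safe_args=$(find . -name "*.txt" -print0 | xargs -0 printf '%s\n' | wc -l)

# -exec ... {} + hands many names to a single cat call, also space-safe.
total_lines=$(find . -name "*.txt" -exec cat {} + | wc -l)

echo "split=$split_args safe=$safe_args lines=$total_lines"
```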

grep – Text Search

grep match_pattern file   # prints matching lines by default

Common options

-o prints only the matched part of each line; -v prints non‑matching lines; -c counts matching lines (not individual matches):

grep -c "text" filename

-n prints line numbers; -i ignores case; -l prints only the names of files that contain a match.

Recursive search in multiple directories (a favorite for searching code):

grep "class" . -R -n

Match multiple patterns:

grep -e "class" -e "virtual" file

Print matching file names terminated by \0 instead of a newline (-Z, used together with -l) and pass them to xargs -0:

grep "test" file* -lZ | xargs -0 rm
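The grep options above can be compared side by side on a tiny source tree. This is a minimal sketch assuming GNU grep (`\|` alternation in basic regexes and -Z are GNU extensions); the files src/a.cpp and src/b.cpp and their contents are hypothetical.

```shell
#!/usr/bin/env bash
# grep options on a small hypothetical source tree. Assumes GNU grep.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
mkdir src
printf 'class Foo\nint x;\nvirtual void f();\nClass Bar\n' > src/a.cpp
printf 'no match here\n' > src/b.cpp

count=$(grep -c "class" src/a.cpp)                    # matching lines: 1
icount=$(grep -ci "class" src/a.cpp)                  # ignoring case: 2
inverted=$(grep -vc "class" src/a.cpp)                # non-matching lines: 3
numbered=$(grep -n "virtual" src/a.cpp)               # prefixed with line number
only=$(grep -o "Foo\|Bar" src/a.cpp)                  # just the matched text
either=$(grep -c -e "class" -e "virtual" src/a.cpp)   # lines matching either: 2
files=$(grep -Rl "class" .)                           # recursive, names only

# -Z ends each file name with \0 so xargs -0 can pass it on safely.
grep -RlZ "class" . | xargs -0 rm --
remaining=$(find . -type f)

printf '%s\n' "$count $icount $inverted" "$numbered" "$only" "$files" "$remaining"
```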

xargs – Command Line Argument Conversion

xargs converts input data into command‑line arguments, allowing combination with many commands such as grep and find.

Convert multi‑line output to a single line (xargs treats newlines and spaces as separators and, with no command given, passes everything to echo):

cat file.txt | xargs

Convert a single line to multiple lines (-n 3 passes at most 3 arguments per command, so each output line holds 3 fields):

cat single.txt | xargs -n 3
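Both reshaping directions can be checked on inline input. A minimal sketch; the numbers are arbitrary sample data.

```shell
#!/usr/bin/env bash
# Reshaping lines with xargs; with no command given, xargs runs echo.
set -eu

# Many lines in, one line out.
joined=$(printf '1 2 3\n4 5\n6\n' | xargs)

# -n 2: at most two arguments per echo call, so two fields per line.
pairs=$(printf '1 2 3 4 5 6\n' | xargs -n 2)

printf '%s\n' "$joined" "--" "$pairs"
```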

xargs options

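-d sets the input delimiter (a GNU extension; by default xargs splits on blanks and newlines); -n sets the maximum number of arguments per command invocation; -I {} defines a replacement string, so the command runs once per input line with {} replaced by that line.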

Example:

cat file.txt | xargs -I {} ./command.sh -p {} -1

-0 sets \0 as input delimiter.

Count lines of code:

find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
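The -I and -d options described above can be seen on inline input. A minimal sketch assuming GNU xargs (-d is a GNU extension); echo stands in for the hypothetical ./command.sh from the example.

```shell
#!/usr/bin/env bash
# -I {} and -d. Assumes GNU xargs; echo replaces the hypothetical ./command.sh.
set -eu

# -I {}: one command per input line, with {} replaced by that line.
labelled=$(printf 'alpha\nbeta\n' | xargs -I {} echo "item: {}")

# -d ,: split on commas instead of blanks; -n 1 prints one item per line.
items=$(printf 'a,b,c' | xargs -d , -n 1)

printf '%s\n' "$labelled" "--" "$items"
```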

sort – Sorting

Options:

-n numeric sort; -d dictionary order (only blanks and alphanumeric characters are compared)

-r reverse order

-k N sort by column N

Examples:

sort -nrk 1 data.txt   # numeric, reverse, keyed on field 1
sort -bd data          # ignore leading blanks, dictionary order
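Why -n matters is easiest to see next to a plain text sort. A minimal sketch on hypothetical two-column data; LC_ALL=C is set so the collation is byte-wise and reproducible.

```shell
#!/usr/bin/env bash
# sort on a hypothetical two-column data set.
set -eu
export LC_ALL=C   # byte-wise collation for reproducible output

data=$(printf '10 apple\n2 cherry\n33 banana\n')

plain=$(printf '%s\n' "$data" | sort)           # text order: 10, 2, 33
numeric=$(printf '%s\n' "$data" | sort -nrk 1)  # numeric, descending
by_name=$(printf '%s\n' "$data" | sort -k 2)    # by the second field

printf '%s\n' "$plain" "--" "$numeric" "--" "$by_name"
```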

uniq – Removing Duplicate Lines

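Remove duplicate lines (uniq only collapses adjacent duplicates, so sort first):

sort unsort.txt | uniq

Count occurrences of each line:

sort unsort.txt | uniq -c

Show only duplicated lines:

sort unsort.txt | uniq -d

Restrict the comparison with -s N (skip the first N characters) and -w N (compare at most N characters).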
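A minimal sketch on hypothetical input showing why sorting comes first, plus -c, -d, and -u (which prints only lines that occur exactly once).

```shell
#!/usr/bin/env bash
# uniq only compares adjacent lines, which is why the examples sort first.
set -eu
export LC_ALL=C

input=$(printf 'b\na\nb\nc\na\nb\n')

unsorted=$(printf '%s\n' "$input" | uniq | wc -l)   # no adjacent repeats: 6
deduped=$(printf '%s\n' "$input" | sort | uniq)
counts=$(printf '%s\n' "$input" | sort | uniq -c | awk '{print $2 "=" $1}')
dupes=$(printf '%s\n' "$input" | sort | uniq -d)    # lines seen more than once
singles=$(printf '%s\n' "$input" | sort | uniq -u)  # lines seen exactly once

printf '%s\n' "$deduped" "--" "$counts" "--" "$dupes" "--" "$singles"
```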

tr – Translating and Transforming

General usage

echo 12345 | tr '0-9' '9876543210'   # simple substitution cipher: prints 87654
cat text | tr '\t' ' '               # convert tabs to spaces

Delete characters

cat file | tr -d '0-9'   # delete all digits

cat file | tr -dc '0-9\n'   # -c complements the set: delete everything except digits and newlines

Compress repeated characters

tr -s squeezes each run of a repeated character into one, often used for spaces:

cat file | tr -s ' '

Character classes (alnum, alpha, digit, space, lower, upper, cntrl, print, among others) can be used as sets, for example to convert to uppercase:

tr '[:lower:]' '[:upper:]'
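The tr examples above can be run on inline input instead of a file. A minimal sketch; the sample strings are arbitrary.

```shell
#!/usr/bin/env bash
# The tr examples above on inline input.
set -eu
export LC_ALL=C

cipher=$(echo 12345 | tr '0-9' '9876543210')           # each digit d becomes 9-d
no_digits=$(echo 'a1b22c333' | tr -d '0-9')             # delete digits
only_digits=$(echo 'a1b22c333' | tr -dc '0-9\n')        # delete everything else
squeezed=$(echo 'a    b  c' | tr -s ' ')                # collapse runs of spaces
upper=$(echo 'Hello, tr' | tr '[:lower:]' '[:upper:]')  # character classes

printf '%s\n' "$cipher" "$no_digits" "$only_digits" "$squeezed" "$upper"
```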

cut – Column Cutting

Extract columns 2 and 4 (fields are tab-separated by default):

cut -f2,4 filename

Exclude column 3 (--complement is a GNU option):

cut -f3 --complement filename

Specify the delimiter with -d:

cut -f2 -d ";" filename

Field ranges: N- (from field N to the end), -M (fields 1 through M), N-M (fields N through M).

Units: -b bytes, -c characters, -f fields (with delimiter)

cut -c1-5 file   # first 5 characters
cut -c-2 file    # first 2 characters
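The field and character selections above, checked on inline input. A minimal sketch assuming GNU coreutils for --complement; the sample rows are hypothetical.

```shell
#!/usr/bin/env bash
# cut on inline input. Tab is the default field delimiter.
set -eu

row=$(printf 'a\tb\tc\td')

cols24=$(printf '%s\n' "$row" | cut -f2,4)            # b<TAB>d
not3=$(printf '%s\n' "$row" | cut -f3 --complement)   # a<TAB>b<TAB>d
second=$(echo 'x;y;z' | cut -d ';' -f2)               # y
from2=$(echo 'x;y;z' | cut -d ';' -f2-)               # y;z (delimiter kept)
first5=$(echo 'abcdefgh' | cut -c1-5)                 # abcde
first2=$(echo 'abcdefgh' | cut -c-2)                  # ab

printf '%s\n' "$cols24" "$not3" "$second" "$from2" "$first5" "$first2"
```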

paste – Column Pasting

Combine two files column‑wise:

cat file1
1
2

cat file2
colin
book

paste file1 file2
1 colin
2 book

Default delimiter is a tab; use -d to specify another delimiter:

paste file1 file2 -d ","
1,colin
2,book

wc – Counting Lines and Characters

Examples:

wc -l file   # line count
wc -w file   # word count
wc -c file   # byte count
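A minimal sketch on a scratch file. Reading through `<` makes wc print only the number, without the file name (some implementations pad it with spaces, hence the arithmetic in the checks). It also shows that wc -l counts newline characters, so an unterminated last line is not counted.

```shell
#!/usr/bin/env bash
# wc on a scratch file.
set -eu

f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'one two\nthree\n' > "$f"

lines=$(wc -l < "$f")   # 2
words=$(wc -w < "$f")   # 3
bytes=$(wc -c < "$f")   # 14: "one two\n" is 8 bytes, "three\n" is 6

# wc -l counts newline characters, so a last line without one is not counted.
unterminated=$(printf 'a\nb' | wc -l)   # 1

echo "$lines $words $bytes $unterminated"
```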

sed – Text Replacement

Replace the first occurrence on each line:

sed 's/text/replace_text/' file

Replace globally:

sed 's/text/replace_text/g' file

Edit the file in place with -i:

sed -i 's/text/replace_text/g' file

Delete empty lines:

sed '/^$/d' file

Matched-string references: & stands for the whole match, and \1 for the text captured by the first \( \) group:

echo this is an example | sed 's/\w\+/[&]/g'

Use double quotes so the shell expands variables inside the sed expression:

p=pattern
r=replace
echo "line contains pattern" | sed "s/$p/$r/g"
# => line contains replace
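A minimal sketch of & and \1 in replacements, plus in-place editing of a scratch file, assuming GNU sed (`\w`, `\+`, and a bare -i with no backup suffix are GNU features; BSD sed needs `-i ''`).

```shell
#!/usr/bin/env bash
# & and \1 in replacements, plus in-place editing. Assumes GNU sed.
set -eu

# & stands for the whole match.
bracketed=$(echo this is an example | sed 's/\w\+/[&]/g')

# \1 is the text captured by the first \( \) group.
digit=$(echo this is digit 7 in a number | sed 's/digit \([0-9]\)/\1/')

# Delete empty lines and replace globally, editing the file itself.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'foo\n\nfoo bar foo\n' > "$f"
sed -i '/^$/d; s/foo/baz/g' "$f"
edited=$(cat "$f")

printf '%s\n' "$bracketed" "$digit" "$edited"
```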

awk – Data Stream Processing

Script structure

awk 'BEGIN{ statements } statements END{ statements }'

Workflow: the BEGIN block runs once before any input is read, the main statements run once for each input line, and the END block runs once after all input has been processed.

Printing

print without arguments prints the current line

echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print} END{print "end"}'

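print with commas separates its arguments with the output field separator (a space by default); placing strings side by side concatenates them

echo | awk '{var1="v1"; var2="V2"; var3="v3"; print var1, var2, var3}'
# v1 V2 v3
echo | awk '{var1="v1"; var2="V2"; var3="v3"; print var1"-"var2"-"var3}'
# v1-V2-v3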
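The BEGIN/main/END order and the comma versus concatenation rule can be checked together. A minimal sketch; setting OFS is an extra illustration of what the comma inserts.

```shell
#!/usr/bin/env bash
# BEGIN/main/END order, and comma vs. concatenation in print.
set -eu

framed=$(printf 'line1\nline2\n' | awk 'BEGIN{print "start"} {print} END{print "end"}')

# A comma inserts the output field separator OFS (a space by default).
comma=$(echo | awk '{v1="v1"; v2="V2"; v3="v3"; print v1, v2, v3}')

# Adjacent expressions are concatenated with nothing in between.
joined=$(echo | awk '{v1="v1"; v2="V2"; v3="v3"; print v1 "-" v2 "-" v3}')

# Setting OFS changes what the comma inserts.
ofs=$(echo | awk 'BEGIN{OFS="-"} {print "v1", "V2", "v3"}')

printf '%s\n' "$framed" "$comma" "$joined" "$ofs"
```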

Special Variables

NR – record number (line number); NF – number of fields; $0 – entire line; $1, $2 – first and second fields.

echo -e "line1 f2 f3\nline2" | awk '{print NR":"$0"-"$1"-"$2}'

Print specific fields:

awk '{print $2, $3}' file

Count lines:

awk 'END{print NR}' file

Sum the values in the first field:

awk 'BEGIN{sum=0} {sum+=$1} END{print sum}' file
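A minimal sketch of NR, NF, and field access on hypothetical three-column input. awk variables start out as 0 or the empty string, so the BEGIN{sum=0} initialization above is optional and is omitted here.

```shell
#!/usr/bin/env bash
# NR, NF and field access on a hypothetical three-column input.
set -eu

data=$(printf '3 a x\n4 b y\n5 c z\n')

cols=$(printf '%s\n' "$data" | awk '{print $2, $3}')             # fields 2 and 3
rows=$(printf '%s\n' "$data" | awk 'END{print NR}')              # line count: 3
nf=$(printf '%s\n' "$data" | awk 'NR==1{print NF}')              # fields on line 1: 3
sum=$(printf '%s\n' "$data" | awk '{sum += $1} END{print sum}')  # 3+4+5 = 12
last=$(printf '%s\n' "$data" | awk '{print $NF}')                # last field of each line

printf '%s\n' "$cols" "$rows $nf $sum" "$last"
```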

Passing External Variables

var=1000
echo | awk '{print var}' var="$var"
awk '{print var}' var="$var" file
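The assignment-operand form above has a timing catch: the assignment takes effect only when awk reaches it in the argument list, after BEGIN has already run. The standard -v option sets the variable before BEGIN. A minimal sketch contrasting the two:

```shell
#!/usr/bin/env bash
# -v versus an assignment operand: only -v is visible inside BEGIN.
set -eu

var=1000

with_v=$(awk -v v="$var" 'BEGIN{print v}')                         # 1000
operand=$(echo | awk 'BEGIN{print "[" v "]"} {print v}' v="$var")  # [] then 1000

printf '%s\n' "$with_v" "--" "$operand"
```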

Filtering with Patterns

awk 'NR<5'            # lines with number < 5
awk 'NR==1,NR==4'    # lines 1 through 4
awk '/linux/'         # lines containing "linux"
awk '!/linux/'        # lines not containing "linux"
awk -F: '{print $NF}' /etc/passwd   # -F: splits on colons; $NF is the last field
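A program made of only a pattern prints every line for which the pattern is true. A minimal sketch of the filters above on generated input; the passwd-style lines for alice and bob are hypothetical sample data.

```shell
#!/usr/bin/env bash
# Pattern-only awk programs print the lines for which the pattern is true.
set -eu

lt5=$(seq 10 | awk 'NR<5' | paste -s -d ' ' -)           # 1 2 3 4
range=$(seq 10 | awk 'NR==2,NR==4' | paste -s -d ' ' -)  # 2 3 4

log=$(printf 'linux kernel\nwindows\nlinux shell\n')
has_linux=$(printf '%s\n' "$log" | awk '/linux/' | wc -l)  # 2
no_linux=$(printf '%s\n' "$log" | awk '!/linux/')          # windows

# Hypothetical passwd-style records; $NF is the login shell.
shells=$(printf 'alice:x:1000:1000::/home/alice:/bin/bash\nbob:x:1001:1001::/home/bob:/bin/sh\n' \
  | awk -F: '{print $NF}' | paste -s -d ' ' -)

printf '%s\n' "$lt5" "$range" "$has_linux" "$no_linux" "$shells"
```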

Reading Command Output

echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout}'

Loops in awk

for(i=0;i<10;i++){print i}
for(i in array){print array[i]}
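The two loop fragments above only run inside a complete awk program. A minimal sketch embedding each one; the word-count input is arbitrary. The for-in iteration order is unspecified, so its output is sorted.

```shell
#!/usr/bin/env bash
# The loop fragments above inside complete awk programs.
set -eu

# C-style for loop in a BEGIN block (no input is read).
counted=$(awk 'BEGIN{for(i=0;i<3;i++) print i}' | paste -s -d ' ' -)   # 0 1 2

# for-in over an associative array of word counts; order is unspecified,
# so the result is sorted afterwards.
freq=$(printf 'a b a\nc a b\n' \
  | awk '{for(i=1;i<=NF;i++) n[$i]++} END{for(w in n) print w, n[w]}' \
  | LC_ALL=C sort)

printf '%s\n' "$counted" "--" "$freq"
```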

Implementing head and tail

head (first 10 lines):

awk 'NR<=10{print}' filename

tail (last 10 lines, kept in a 10-slot ring buffer and printed oldest first):

awk '{buffer[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buffer[i%10]}' filename
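The ring-buffer tail has to replay the slots starting from the oldest line, not from slot 0; otherwise the output comes out rotated whenever the line count is not a multiple of 10. A minimal sketch that checks the awk versions against head(1) and tail(1), including an input shorter than 10 lines.

```shell
#!/usr/bin/env bash
# Checks the awk head and ring-buffer tail against head(1) and tail(1).
set -eu

awk_head=$(seq 25 | awk 'NR<=10{print}')

# buffer[NR%10] keeps the last 10 lines; END replays them oldest first.
awk_tail=$(seq 25 | awk '{buffer[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buffer[i%10]}')

# With fewer than 10 lines, the i>0 guard skips the unused slots.
short_tail=$(seq 3 | awk '{buffer[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buffer[i%10]}')

printf '%s\n' "$awk_tail" "--" "$short_tail"
```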

Printing Specific Columns

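awk example:

ls -lrt | awk '{print $6}'

cut example (cut splits on single tabs by default, so squeeze the runs of spaces in the ls output first and use a space as the delimiter):

ls -lrt | tr -s ' ' | cut -d ' ' -f6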

Printing a Text Range

By line numbers:

seq 100 | awk 'NR==4,NR==6{print}'

By patterns (from a line matching start_pattern through the next line matching end_pattern):

awk '/start_pattern/,/end_pattern/' filename
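A minimal sketch of both range forms; START and STOP are hypothetical marker lines standing in for start_pattern and end_pattern.

```shell
#!/usr/bin/env bash
# Range patterns: print from the first line matching the start condition
# through the next line matching the end condition.
set -eu

by_nr=$(seq 100 | awk 'NR==4,NR==6{print}' | paste -s -d ' ' -)   # 4 5 6

# START/STOP are hypothetical marker lines.
text=$(printf 'intro\nSTART here\nbody\nSTOP here\noutro\n')
by_pattern=$(printf '%s\n' "$text" | awk '/^START/,/^STOP/')

printf '%s\n' "$by_nr" "--" "$by_pattern"
```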

Common awk Built‑in Functions

index(string,search) – position of search in string

sub(regex,repl,string) – replace first match

match(string,regex) – position of the first match of regex in string (0 if none); also sets RSTART and RLENGTH

length(string) – string length:

awk '{print length($0)}'

printf formats output like C's printf:

seq 10 | awk '{printf "->%4s\n", $1}'
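A minimal sketch exercising index, sub, match, length, and printf on one input line; the string "hello world" is arbitrary. Positions in awk are 1-based.

```shell
#!/usr/bin/env bash
# index, sub, match, length and printf on one input line.
set -eu

out=$(echo 'hello world' | awk '{
  pos = index($0, "world")   # 1-based position of the substring: 7
  s = $0
  sub(/o/, "0", s)           # replaces only the first "o", in s
  start = match($0, /wor/)   # position of the match; also sets RLENGTH
  print pos, s, start, RLENGTH, length($0)
}')

# %4s right-aligns each value in a 4-character field after the "->" prefix.
padded=$(seq 3 | awk '{printf "->%4s\n", $1}')

printf '%s\n' "$out" "$padded"
```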

Iterating Over Lines, Words, and Characters

1. Iterate over each line

While‑loop method

# IFS= keeps leading/trailing blanks; -r keeps backslashes literal
while IFS= read -r line; do
  echo "$line"
done < file.txt

Subshell version:

cat file.txt | (while IFS= read -r line; do echo "$line"; done)

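awk method:

cat file.txt | awk '{print}'

2. Iterate over each word in a line (this relies on the shell's word splitting, so $line is deliberately left unquoted):

for word in $line; do echo "$word"; done

3. Iterate over each character (bash substring expansion, ${var:offset:length}):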

for ((i=0;i<${#word};i++))
do
  echo "${word:i:1}"
done
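The three iteration patterns can be checked together. A minimal bash sketch on a scratch file; the sample variables sentence and word are hypothetical. The second input line contains a backslash to show what -r preserves.

```shell
#!/usr/bin/env bash
# Line, word and character iteration in bash on a scratch file.
set -eu

f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '  indented line\nback\\slash\n' > "$f"

# IFS= preserves leading blanks; -r keeps backslashes literal.
lines=()
while IFS= read -r line; do
  lines+=("$line")
done < "$f"

# Unquoted $sentence is split on whitespace into words.
sentence="one two  three"
words=()
for word in $sentence; do
  words+=("$word")
done

# ${word:i:1} extracts one character at offset i.
word="abc"
chars=""
for ((i = 0; i < ${#word}; i++)); do
  chars+="${word:i:1},"
done

printf '[%s]\n' "${lines[@]}"
echo "${#words[@]} words; chars: $chars"
```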


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
