
Master Essential Linux Shell Tools for Text Processing

This guide introduces the most frequently used Linux shell utilities—find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—explaining their core options and providing practical command‑line examples to help readers efficiently manipulate and analyze text files.

Open Source Linux

Aug 7, 2024


Working in the Linux shell is a fundamental skill. Despite its quirky syntax and poor readability, it remains essential for understanding many aspects of the Linux system.

While advanced scripting may be done with Python, mastering simple shell commands for common tasks is still valuable.

Commonly used text‑processing tools in Linux: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk.

1. find – File Search

Search for txt and pdf files:

find . \( -name '*.txt' -o -name '*.pdf' \) -print

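Regex search (GNU find's default Emacs-style syntax writes grouping as \( \) and alternation as \|, and the pattern must match the whole path):

find . -regex '.*\(\.txt\|\.pdf\)$'

-iregex: same as -regex, but case-insensitive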
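The name and regex patterns above can be tried safely on a throwaway directory. This is a minimal sketch, assuming bash and GNU find (whose default -regex dialect is Emacs-style); the file names are hypothetical sample data.

```shell
#!/usr/bin/env bash
# Hypothetical sample tree: two .txt files, one .pdf, one .log, one upper-case .TXT
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/sub"
touch "$demo/a.txt" "$demo/b.pdf" "$demo/c.log" "$demo/sub/d.txt" "$demo/E.TXT"

# Name-based: -o (OR) inside escaped parentheses groups the two tests
find "$demo" \( -name '*.txt' -o -name '*.pdf' \) -print | sort

# Regex-based: -regex matches the WHOLE path, hence the leading .*
find "$demo" -regex '.*\(\.txt\|\.pdf\)$' | sort

# Case-insensitive variant also picks up E.TXT
find "$demo" -iregex '.*\(\.txt\|\.pdf\)$' | sort
```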

Find all non-txt files:

find . ! -name '*.txt' -print

Limit search depth (depth = 1):

find . -maxdepth 1 -type f

Search by type:

find . -type d -print   # list directories only

Search by time:

-atime N : accessed N days ago

-mtime N : modified N days ago

-ctime N : metadata changed N days ago

Files accessed within the last 7 days (N counts whole 24-hour periods; -N means less than N, +N means more than N, and a bare N means exactly N):

find . -atime -7 -type f -print

Search by size (e.g., larger than 2 KiB):

find . -type f -size +2k

Search by permission (e.g., mode exactly 644):

find . -type f -perm 644 -print

Search by owner:

find . -type f -user weber -print

Delete all *.swp files:

find . -type f -name '*.swp' -delete

Execute a command on each match (example: change ownership):

find . -type f -user root -exec chown weber {} \;

{} is replaced by the current file name for each match, and \; (escaped so the shell passes it through) marks the end of the command.

Copy matched files (here, .txt files modified more than 10 days ago) into the existing directory OLD:

find . -type f -mtime +10 -name '*.txt' -exec cp {} OLD \;
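To see how the sign on a time test changes the result, here is a minimal sketch that backdates one hypothetical file with GNU touch -d and then applies -mtime -7, -mtime +10 and the -exec cp pattern. It assumes bash and GNU coreutils/findutils.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# Hypothetical files: one touched now, one backdated 20 days (GNU touch -d)
touch "$demo/fresh.txt"
touch -d '20 days ago' "$demo/stale.txt"

# Modified less than 7 days ago
find "$demo" -type f -mtime -7 -print      # .../fresh.txt

# Modified more than 10 days ago
find "$demo" -type f -mtime +10 -print     # .../stale.txt

# -exec runs cp once per match; {} is the path, \; ends the command.
# -maxdepth 1 keeps find from descending into OLD itself.
mkdir "$demo/OLD"
find "$demo" -maxdepth 1 -type f -mtime +10 -name '*.txt' -exec cp {} "$demo/OLD" \;
ls "$demo/OLD"                             # stale.txt
```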

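Combine multiple commands by invoking a script once per match:

find . -type f -exec ./commands.sh {} \;

Output delimiters: by default, -print ends each file name with a newline (\n). Use -print0 to end each name with a null character (\0) instead; this safely handles names that contain spaces or newlines.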
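The difference between -print and -print0 shows up as soon as a file name contains a space. A minimal bash sketch, with hypothetical file names, counting how many items the consumer actually receives:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/plain.txt" "$demo/my notes.txt"   # note the space

# Unsafe: xargs splits "my notes.txt" into two arguments
find "$demo" -name '*.txt' -print | xargs -n 1 echo | wc -l      # 3 items

# Safe: NUL-terminated names, read back with xargs -0
find "$demo" -name '*.txt' -print0 | xargs -0 -n 1 echo | wc -l  # 2 items

# Also safe in a bash loop: read -d '' stops at each NUL
count=0
while IFS= read -r -d '' f; do
  count=$((count + 1))
done < <(find "$demo" -name '*.txt' -print0)
echo "$count"                                                    # 2
```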

2. grep – Text Search

Basic usage (prints every line that matches):

grep 'text' filename

Common options:

-o: output only the matching parts of each line

-v: invert match

-c: count matching lines (not individual matches)

-n: show line numbers

-i: case‑insensitive

-l: list matching file names

Recursive search in subdirectories, with line numbers:

grep -R -n 'class' .

Match multiple patterns:

grep -e 'class' -e 'virtual' file

Delete every file that contains a match (-l lists the matching file names, and -Z ends each name with \0 so that xargs -0 can read it safely):

grep 'test' file* -lZ | xargs -0 rm
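The grep options listed above can be checked against a small, hypothetical header file; the expected outputs in the comments follow from the sample text. A minimal sketch assuming bash and GNU grep:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
# Hypothetical sample files
printf 'class A\nint x;\nvirtual void f();\nclass B\n' > "$demo/one.h"
printf 'no match here\n' > "$demo/two.h"

grep -c 'class' "$demo/one.h"             # 2 (matching lines)
grep -n 'virtual' "$demo/one.h"           # 3:virtual void f();
grep -v 'class' "$demo/one.h" | wc -l     # 2 lines without "class"
grep -o 'class [A-Z]' "$demo/one.h"       # class A, then class B
grep -l 'class' "$demo"/*.h               # .../one.h only
grep -R -n -e 'class' -e 'virtual' "$demo" | wc -l   # 3 matching lines in total
```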

3. xargs – Build Command Lines from Input

Convert multiline output to a single line:

cat file.txt | xargs

Convert a single line to multiple lines (e.g., three arguments per line):

cat single.txt | xargs -n 3

Key options:

-d delimiter: set the input delimiter (GNU xargs; by default, input is split on blanks and newlines)

-n max: maximum number of arguments per command invocation

-I {}: replace {} in the command with each input item

-0: use \0 as delimiter

Example – count lines of C++ source files:

find source_dir/ -type f -name '*.cpp' -print0 | xargs -0 wc -l
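The xargs conversions and options can be seen on inline input, with no files involved. A minimal sketch; -d is a GNU xargs option and may be missing on BSD/macOS.

```shell
#!/usr/bin/env bash
# Multiline -> one line (xargs runs echo when no command is given)
printf '1\n2\n3\n4\n5\n6\n7\n' | xargs          # 1 2 3 4 5 6 7

# One line -> at most three arguments per invocation
echo 1 2 3 4 5 6 7 | xargs -n 3                 # "1 2 3", "4 5 6", "7"

# -I {}: one command per input line, {} substituted inside the argument
printf 'alpha\nbeta\n' | xargs -I {} echo 'item: {}'

# -d (GNU): split on a custom delimiter instead of whitespace
printf 'a,b,c' | xargs -d ',' -n 1 echo         # a, b, c on separate lines
```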

4. sort – Sorting

-n: numeric sort

-d: dictionary order

-r: reverse

-k N: sort by column N

Examples:

sort -nrk 1 data.txt   # numeric, descending, by column 1

sort -bd data   # dictionary order, ignoring leading blanks
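The effect of -n is easiest to see on numbers with different digit counts. A minimal sketch using a hypothetical name/score table:

```shell
#!/usr/bin/env bash
# Hypothetical two-column sample: name and score
sample() { printf 'carol 9\nalice 100\nbob 25\n'; }

# Lexical sort on column 2 compares characters, so "100" < "25" < "9"
sample | sort -k 2
# Numeric sort on column 2 compares values: 9 < 25 < 100
sample | sort -n -k 2
# Numeric and descending; -k 2,2 limits the key to column 2 alone
sample | sort -nr -k 2,2
```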

5. uniq – Remove Duplicate Lines

Basic usage (uniq removes only adjacent duplicate lines, so sort the input first):

sort unsort.txt | uniq

Count occurrences:

sort unsort.txt | uniq -c

Show only duplicated lines:

sort unsort.txt | uniq -d

Compare only part of each line: -s N skips the first N characters, and -w N compares at most N characters.
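Because uniq only compares each line with the one before it, the sort step is what lets it find every duplicate. A minimal sketch on a hypothetical word list, including the common sort | uniq -c | sort -nr frequency pipeline; -w is a GNU uniq option.

```shell
#!/usr/bin/env bash
words() { printf 'apple\nbanana\napple\ncherry\nbanana\napple\n'; }

# Without sort, nothing is adjacent, so nothing is removed
words | uniq | wc -l                     # 6

# Sorted first: one line per distinct word
words | sort | uniq                      # apple, banana, cherry

# Frequency table, most common first
words | sort | uniq -c | sort -nr

# Only the words that occur more than once
words | sort | uniq -d                   # apple, banana

# -w 2 compares the first 2 characters; -s 2 skips the first 2
printf 'ab1\nab2\ncd3\n' | uniq -w 2     # ab1, cd3
printf '1-x\n2-x\n3-y\n' | uniq -s 2     # 1-x, 3-y
```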

6. tr – Translate or Delete Characters

General conversion:

echo 12345 | tr '0-9' '9876543210'

cat text | tr '\t' ' '

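Delete characters:

cat file | tr -d '0-9'   # remove all digits

Complement set (-c applies the operation to every character NOT in the set):

cat file | tr -c '0-9\n' ' '   # replace every non-digit (except newline) with a space

cat file | tr -d -c '0-9 \n'   # delete everything except digits, spaces and newlines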

Compress (squeeze) repeated characters, most commonly spaces:

cat file | tr -s ' '

Character classes (e.g., convert lowercase to uppercase):

tr '[:lower:]' '[:upper:]'
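The tr examples above, run on inline strings so the results are visible. A minimal sketch; tr understands escapes such as \n and \t inside a set.

```shell
#!/usr/bin/env bash
echo 12345 | tr '0-9' '9876543210'          # 87654 (each digit d becomes 9-d)
echo 'a1b2c3' | tr -d '0-9'                 # abc
echo 'a1b2c3' | tr -d -c '0-9\n'            # 123 (keep digits and the newline)
echo 'too    many   spaces' | tr -s ' '     # too many spaces
echo 'Hello' | tr '[:lower:]' '[:upper:]'   # HELLO
printf 'a\tb\n' | tr '\t' ' '               # a b
```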

7. cut – Extract Columns

Extract fields 2 and 4 (fields are tab-separated by default):

cut -f2,4 filename

Exclude field 3 (GNU cut):

cut -f3 --complement filename

Specify a delimiter:

cut -d ';' -f2 filename

Range specifications:

N-: from field N to the end of the line

M-N: fields M through N

-N: from the first field through field N

Units:

-b: bytes

-c: characters

-f: fields (delimiter‑based)

cut -c1-5 file   # first five characters

cut -c-2 file   # first two characters
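cut is most often applied to colon-separated records such as /etc/passwd. A minimal sketch on a single hypothetical passwd-style line (the user weber is only an example); --complement is a GNU cut option.

```shell
#!/usr/bin/env bash
# Hypothetical passwd-style record: 7 fields separated by ':'
line='weber:x:1000:1000:Weber:/home/weber:/bin/bash'

echo "$line" | cut -d ':' -f1                 # weber
echo "$line" | cut -d ':' -f1,6               # weber:/home/weber
echo "$line" | cut -d ':' -f6-                # /home/weber:/bin/bash
echo "$line" | cut -d ':' -f2 --complement    # every field except the 2nd
echo "$line" | cut -c1-5                      # weber

# Without -d, cut splits on TAB
printf 'a\tb\tc\n' | cut -f2                  # b
```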

8. paste – Merge Files Line‑wise

Basic merge (default delimiter is a tab):

paste file1 file2

Specify a delimiter (e.g., comma):

paste -d ',' file1 file2
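A minimal sketch of paste on two hypothetical one-column files, showing the default tab and a custom delimiter:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'alice\nbob\n' > "$demo/names"     # hypothetical sample files
printf '30\n25\n' > "$demo/ages"

paste "$demo/names" "$demo/ages"          # alice<TAB>30, bob<TAB>25
paste -d ',' "$demo/names" "$demo/ages"   # alice,30 and bob,25
```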

9. wc – Count Lines, Words, Bytes

wc -l file   # lines

wc -w file   # words

wc -c file   # bytes
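Two details of wc are easy to trip over: given a file name, it prints the name after the count, and -l counts newline characters rather than lines. A minimal sketch on a hypothetical file:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'one two\nthree\n' > "$demo/sample.txt"   # hypothetical file

wc -l "$demo/sample.txt"      # the count followed by the file name
wc -l < "$demo/sample.txt"    # 2: reading stdin prints the number alone
wc -w < "$demo/sample.txt"    # 3
wc -c < "$demo/sample.txt"    # 14 bytes, including both newlines

# An unterminated last line is not counted by -l
printf 'a\nb' | wc -l         # 1
```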

10. sed – Stream Editor for Text Substitution

Replace the first occurrence on each line:

sed 's/text/replace_text/' file

Replace all occurrences (global):

sed 's/text/replace_text/g' file

Edit the file in place:

sed -i 's/text/replace_text/g' file

Delete empty lines:

sed '/^$/d' file

Use captured groups (\( \) captures text, \1 refers back to it):

sed 's/hello\([0-9]\)/\1/' file   # hello7 becomes 7
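The sed commands above, applied to a small hypothetical input file. A minimal sketch assuming GNU sed; on BSD/macOS, -i needs an explicit (possibly empty) backup suffix, as in sed -i '' 's/cat/dog/g' file.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'cat cat\n\nhello7 world\n' > "$demo/in.txt"   # hypothetical input

sed 's/cat/dog/' "$demo/in.txt"             # first line: dog cat
sed 's/cat/dog/g' "$demo/in.txt"            # first line: dog dog
sed '/^$/d' "$demo/in.txt"                  # the empty second line is gone
sed 's/hello\([0-9]\)/\1/' "$demo/in.txt"   # third line: 7 world

# Two capture groups, swapped in the replacement
echo 'john smith' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/'   # smith john

# In-place edit (GNU syntax)
sed -i 's/cat/dog/g' "$demo/in.txt"
```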

11. awk – Powerful Text‑Processing Language

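Basic script structure: the BEGIN block runs once before any input is read, the middle block runs for every line, and the END block runs once after the last line:

awk 'BEGIN{...} { ... } END{...}' file

Common built-in variables: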

NR – record (line) number

NF – number of fields

$0 – entire line

$1, $2 … – individual fields

Print specific fields:

awk '{print $2, $3}' file

Count lines:

awk 'END{print NR}' file

Sum the first column:

awk '{sum+=$1} END{print sum}' file

Filter by pattern:

awk '/linux/' file   # lines containing "linux"

Set the field delimiter (here, print the last field of each line):

awk -F ':' '{print $NF}' /etc/passwd

Read command output into a variable:

awk 'BEGIN{cmd = "grep root /etc/passwd"; cmd | getline cmdout; close(cmd); print cmdout}'
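The built-in variables and one-liners above, run on a hypothetical two-column input. The group-by line uses an awk associative array (the structure iterated by for(i in array)) to total a column per key; since for-in order is unspecified, the output is piped through sort. A minimal sketch:

```shell
#!/usr/bin/env bash
# Hypothetical whitespace-separated input: a name and a count per line
sample() { printf 'linux 10\nbsd 5\nlinux 7\n'; }

sample | awk '{print $2, $1}'                # swap the two fields
sample | awk 'END{print NR}'                 # 3
sample | awk '{sum+=$2} END{print sum}'      # 22
sample | awk '/linux/' | wc -l               # 2
sample | awk '{print NR": "NF" fields"}'     # 1: 2 fields, 2: 2 fields, ...

# Total per name with an associative array, then sort for a stable order
sample | awk '{total[$1]+=$2} END{for (k in total) print k, total[k]}' | sort
# bsd 5
# linux 17

# -F sets the field separator; $NF is the last field
echo 'root:x:0:0:root:/root:/bin/bash' | awk -F ':' '{print $NF}'   # /bin/bash
```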

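Loop examples (a C-style loop, and a loop over array indices, whose order is unspecified):

awk 'BEGIN{for(i=0;i<10;i++) print i}'

awk 'BEGIN{split("a b c", array); for(i in array) print array[i]}'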

Implement head/tail with awk:

awk 'NR<=10{print}' filename   # head

awk '{buf[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buf[i%10]}' filename   # tail
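A ring-buffer tail is easy to get subtly wrong (wrong order, or blank lines for short input), so it is worth comparing both one-liners with the real head and tail on numbered input from seq. A minimal bash sketch; awk_head and awk_tail are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the one-liners; with no file argument, awk reads stdin
awk_head() { awk 'NR<=10{print}' "$@"; }
awk_tail() {
  # buf keeps the 10 most recent lines, indexed by line number modulo 10;
  # END walks line numbers NR-9..NR, skipping numbers below 1 for short input
  awk '{buf[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buf[i%10]}' "$@"
}

seq 1 25 | awk_tail | tr '\n' ' '; echo     # 16 17 18 19 20 21 22 23 24 25
seq 1 3 | awk_tail | tr '\n' ' '; echo      # 1 2 3

# Compare with the real tools; diff prints nothing when the outputs match
diff <(seq 1 25 | awk_head) <(seq 1 25 | head -n 10) && echo 'head OK'
diff <(seq 1 25 | awk_tail) <(seq 1 25 | tail -n 10) && echo 'tail OK'
```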

12. Iterating Over Files

Line‑by‑line iteration using while:

while IFS= read -r line; do echo "$line"; done < file.txt   # IFS= keeps leading spaces, -r keeps backslashes

Using awk for the same purpose:

awk '{print}' file.txt

Iterate over words in a line (the unquoted $line is split on whitespace):

for word in $line; do echo "$word"; done

Iterate over characters (bash substring syntax):

for ((i=0;i<${#word};i++)); do echo "${word:i:1}"; done
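The loop patterns above, on a hypothetical file chosen to show why IFS= and -r matter: without them, read strips leading whitespace and treats backslashes as escapes. A minimal bash sketch:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
# Hypothetical input: a line with leading spaces and a line with a backslash
printf '  indented line\nback\\slash\n' > "$demo/file.txt"

# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal
while IFS= read -r line; do
  printf '[%s]\n' "$line"
done < "$demo/file.txt"
# [  indented line]
# [back\slash]

# Words: the unquoted $line is deliberately split on whitespace
line='one two three'
for word in $line; do echo "$word"; done

# Characters: ${word:i:1} is bash substring syntax (offset i, length 1)
word='abc'
for ((i = 0; i < ${#word}; i++)); do echo "${word:i:1}"; done
```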

Source: 大CC – http://www.cnblogs.com/me115/p/3427319.html


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

