
Master Essential Linux Shell Text Processing Tools: find, grep, awk, and More

This article provides a comprehensive guide to the most frequently used Linux shell text‑processing utilities—find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—offering practical examples, command‑line options, and tips for efficient one‑ or two‑line scripts.


The Linux shell is a fundamental skill. Its syntax can be quirky and hard to read, and for larger jobs it is often replaced by scripting languages such as Python; even so, it remains a core competency worth mastering, and learning shell scripting also reveals a great deal about how the Linux system works.

Not everyone needs to become a shell-scripting master, but everyone benefits from being able to accomplish common, basic tasks with simple shell commands.

Below is an introduction to the most commonly used text-processing tools in Linux: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples and options shown are the most practical ones; the guiding principle is to keep each command to a single line, or at most two. For anything more complex, consider Python.

1. find – File Search

Search for txt and pdf files:

<code>find . \( -name "*.txt" -o -name "*.pdf" \) -print</code>

Search using regular expressions for .txt and .pdf:

<code>find . -regex ".*\(\.txt\|\.pdf\)$"</code>

Use -iregex to ignore case.

Negate pattern – find all non‑txt files:

<code>find . ! -name "*.txt" -print</code>

Specify search depth – list files in the current directory (depth 1):

<code>find . -maxdepth 1 -type f</code>

Custom Search

Search by type (list only directories):

<code>find . -type d -print</code>

Search by time:

-atime – access time (in days; use -amin for minutes)

-mtime – modification time

-ctime – change time (metadata or permission changes)

Files accessed within the last 7 days (a bare 7 means "exactly 7 days ago"; -7 means "fewer than 7 days"):

<code>find . -atime -7 -type f -print</code>

Search by size (k, M, G). Find files larger than 2 kB:

<code>find . -type f -size +2k</code>

Search by permission (e.g., find files with permission 644):

<code>find . -type f -perm 644 -print</code>

Search by user:

<code>find . -type f -user weber -print</code>

Post‑Search Actions

Delete all *.swp files in the current directory:

<code>find . -type f -name "*.swp" -delete</code>

Execute a command on each match (the powerful -exec):

<code>find . -type f -user root -exec chown weber {} \;</code>

Note: {} is replaced by the current file name. Example – copy the found files to another directory:

<code>find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;</code>

To combine multiple commands, write them into a script and invoke it with -exec:

<code>-exec ./commands.sh {} \;</code>

Print Delimiters

By default -print separates results with a newline; -print0 uses a null character instead, which makes filenames containing spaces safe to process.
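As a quick sketch (with hypothetical throwaway file names under a temp directory), the null-delimited form passes a space-containing filename through intact:

```shell
# Sketch: null-delimited output survives filenames with spaces.
# The directory and file names here are throwaway examples.
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt"
# -print0 / xargs -0 pass each name as exactly one argument
count=$(find "$dir" -type f -name "*.txt" -print0 | xargs -0 -n1 echo | wc -l)
echo "$count"
```

With a newline-delimited pipeline, "with space.txt" would be split into two bogus arguments.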

2. grep – Text Search

Basic usage:

<code>grep "match_pattern" file</code>

Common options:

-o – output only the matching part

-v – invert the match: output lines that do NOT match

-c – count matching lines

-n – print line numbers

-i – ignore case

-l – print only the names of matching files

Recursive search in multi‑level directories (a programmer’s favorite):

<code>grep "class" . -R -n</code>

Match multiple patterns:

<code>grep -e "class" -e "virtual" file</code>

Use -Z to output file names terminated by a null character, then hand them to xargs -0 for deletion:

<code>grep "test" file* -lZ | xargs -0 rm</code>
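A small sketch of the -o option from the list above: because -o prints each match on its own line, piping into wc -l counts occurrences rather than matching lines (the file content here is made up for the demo):

```shell
# Sketch: count occurrences (not lines) of a pattern with grep -o.
f=$(mktemp)
printf 'one two two\ntwo three\n' > "$f"
matches=$(grep -o "two" "$f" | wc -l)
echo "$matches"
```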

xargs transforms its input into arguments for other commands, which makes it a natural companion to grep, find, etc.

Convert multi‑line output to a single line:

<code>cat file.txt | xargs</code>

Convert a single line to multiple lines (e.g., three arguments per line):

<code>cat single.txt | xargs -n 3</code>

xargs Options

-d – set the input delimiter (by default xargs splits on blanks and newlines)

-n – number of arguments passed per command invocation

-I {} – replace the {} placeholder with each input item

-0 – use the null character as the input delimiter

Example – count lines in C source files:

<code>find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l</code>
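The -I {} option from the list above deserves its own sketch: each input line is substituted wherever {} appears in the command (the input here is made up):

```shell
# Sketch of -I {}: every input item is placed where {} occurs.
out=$(printf 'a\nb\n' | xargs -I {} echo "item: {}")
first=$(printf '%s\n' "$out" | head -n1)
echo "$first"
```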

4. sort – Sorting

Key options:

-n – numeric sort (vs. -d for dictionary order)

-r – reverse order

-k N – sort by the N-th column

Example:

<code>sort -nrk 1 data.txt</code>

Ignore leading blanks:

<code>sort -bd data</code>
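A runnable sketch of -n and -k together, sorting made-up two-column data numerically by its second column:

```shell
# Sketch: numeric sort keyed on the 2nd column (sample data is invented).
f=$(mktemp)
printf 'apple 3\nbanana 1\ncherry 2\n' > "$f"
smallest=$(sort -n -k 2 "$f" | head -n1)
echo "$smallest"
```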

5. uniq – Remove Duplicate Lines

Remove duplicate lines:

<code>sort unsort.txt | uniq</code>

Count occurrences of each line:

<code>sort unsort.txt | uniq -c</code>

Show only duplicated lines:

<code>sort unsort.txt | uniq -d</code>

Specify the comparison start position and width with -s and -w.
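A sketch of -w (a GNU uniq option): only the first N characters are compared when deciding whether lines are duplicates, so below "ab 1" and "ab 2" collapse into one line (sample data is invented):

```shell
# Sketch of uniq -w (GNU): compare only the first 2 characters.
f=$(mktemp)
printf 'ab 1\nab 2\ncd 3\n' > "$f"
kept=$(uniq -w 2 "$f" | wc -l)
echo "$kept"
```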

6. tr – Translate or Delete Characters

General usage:

<code>echo 12345 | tr '0-9' '9876543210'   # simple substitution</code>
<code>cat text | tr '\t' ' '</code>

Delete characters:

<code>cat file | tr -d '0-9'   # delete all digits</code>

Complement set (-c) – operate on every character NOT in the set:

<code>cat file | tr -c '0-9' ' '      # replace every non-digit with a space</code>
<code>cat file | tr -d -c '0-9\n'    # delete everything except digits and newlines</code>

Compress repeated characters (useful for collapsing spaces):

<code>cat file | tr -s ' '</code>

Character classes (e.g., [:lower:], [:upper:], [:digit:]) can be used as:

<code>tr '[:lower:]' '[:upper:]'</code>
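Combining -s with a character-class translation gives a handy normalization one-liner (the input string is invented for the demo):

```shell
# Sketch: squeeze runs of spaces, then map upper case to lower case.
out=$(echo "Hello    World" | tr -s ' ' | tr '[:upper:]' '[:lower:]')
echo "$out"
```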

7. cut – Extract Columns

Extract the 2nd and 4th columns:

<code>cut -f2,4 filename</code>

Exclude the 3rd column:

<code>cut -f3 --complement filename</code>

Specify delimiter (e.g., semicolon):

<code>cut -f2 -d ";" filename</code>

Range specifications:

N- – from field N to the end

-M – from the first field through M

N-M – fields N through M

Units:

-b – bytes

-c – characters

-f – fields (split on the delimiter)

Example – print first five characters:

<code>cut -c1-5 file</code>

Example – print first two characters:

<code>cut -c-2 file</code>
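A self-contained sketch of -d with -f, pulling the login and shell fields from a passwd-style record (the record itself is a typical example, not read from a real /etc/passwd):

```shell
# Sketch: extract fields 1 and 7 from a colon-delimited record.
line='root:x:0:0:root:/root:/bin/bash'
fields=$(echo "$line" | cut -d ':' -f1,7)
echo "$fields"
```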

8. paste – Merge Files Column‑wise

Combine two files column‑wise:

<code>paste file1 file2</code>

The default delimiter is a tab; set a custom one with -d:

<code>paste file1 file2 -d ","</code>
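A runnable sketch of the comma-delimited paste, using two invented single-column files:

```shell
# Sketch: paste two single-column files with a comma delimiter.
f1=$(mktemp); f2=$(mktemp)
printf 'a\nb\n' > "$f1"
printf '1\n2\n' > "$f2"
joined=$(paste -d ',' "$f1" "$f2" | head -n1)
echo "$joined"
```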

9. wc – Word, Line, and Byte Count

Count lines:

<code>wc -l file</code>

Count words:

<code>wc -w file</code>

Count bytes:

<code>wc -c file</code>

10. sed – Stream Editor for Text Substitution

Replace the first occurrence on each line:

<code>sed 's/text/replace_text/' file</code>

Global replacement:

<code>sed 's/text/replace_text/g' file</code>

Edit file in place:

<code>sed -i 's/text/replace_text/g' file</code>

Delete empty lines:

<code>sed '/^$/d' file</code>

Use & to reference the matched string (GNU sed understands \w as a word character):

<code>echo "this is an example" | sed -E 's/\w+/[&]/g'   # [this] [is] [an] [example]</code>

Capture groups use escaped parentheses and are referenced as \1, \2, and so on:

<code>echo "hello 7" | sed 's/hello \([0-9]\)/\1/'   # prints 7</code>

Double‑quoted expressions allow variable expansion:

<code>p=pattern; r=replace; echo "line contains pattern" | sed "s/$p/$r/g"</code>
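A classic capture-group sketch: two groups swap a pair of words (the names are invented):

```shell
# Sketch: two capture groups swap the order of two words.
out=$(echo "john smith" | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')
echo "$out"
```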

11. awk – Data‑Stream Processing Tool

Basic script structure:

<code>awk 'BEGIN{ statements } statements END{ statements }' file</code>

Print the current line:

<code>awk '{print}' file</code>

Print specific fields:

<code>awk '{print $2, $3}' file</code>

Count lines:

<code>awk 'END{print NR}' file</code>

Sum the first column:

<code>awk 'BEGIN{sum=0} {sum+=$1} END{print sum}' file</code>
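The sum idiom above works just as well on piped data; a minimal runnable check (input invented):

```shell
# Sketch: sum a column fed through a pipe.
total=$(printf '1\n2\n3\n' | awk '{sum+=$1} END{print sum}')
echo "$total"
```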

Pass in external shell variables with -v:

<code>var=1000; echo | awk -v vara="$var" '{print vara}'</code>

Filter by line number or pattern:

<code>awk 'NR<5' file</code>
<code>awk '/linux/' file</code>

Set the field delimiter with -F:

<code>awk -F: '{print $NF}' /etc/passwd</code>

Read the output of another command with getline (a rule needs at least one input line to fire, hence the echo):

<code>echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout}'</code>

Loop constructs:

<code>for(i=0;i<10;i++){print i}</code>
<code>for(i in array){print array[i]}</code>
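As a sketch of the for(key in array) loop, here is a word-frequency count over invented input; sort makes the (otherwise arbitrary) key order deterministic:

```shell
# Sketch: associative-array iteration driving a word-frequency count.
freq=$(printf 'a b a\nb a\n' \
  | awk '{for(i=1;i<=NF;i++) c[$i]++} END{for(w in c) print w, c[w]}' \
  | sort)
echo "$freq"
```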

Implement

head

and

tail

:

<code>awk 'NR<=10{print}' filename   # head</code>
<code>awk '{b[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print b[i%10]}' filename   # tail (last 10 lines, in order)</code>

Print a specific column with awk or cut (ls pads columns with spaces, so squeeze them first for cut):

<code>ls -lrt | awk '{print $6}'</code>
<code>ls -lrt | tr -s ' ' | cut -d ' ' -f6</code>

12. Iterating Over Lines, Words, and Characters

Iterate Over Each Line

<code>while read -r line; do echo "$line"; done < file.txt</code>
<code>cat file.txt | while read -r line; do echo "$line"; done</code>
<code>cat file.txt | awk '{print}'</code>

Iterate Over Each Word in a Line

<code>for word in $line; do echo $word; done</code>

Iterate Over Each Character

<code>for ((i=0;i<${#word};i++)); do echo ${word:i:1}; done</code>
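The character loop above can do more than echo; a small sketch (using the same bash substring syntax, with an invented word) reverses a string:

```shell
# Sketch: reverse a word by walking its characters with ${word:i:1}.
word="shell"
rev=""
for ((i=0; i<${#word}; i++)); do rev="${word:i:1}$rev"; done
echo "$rev"
```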
Source: 大CC Link: http://www.cnblogs.com/me115/p/3427319.html
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
