
30 Essential One‑Liners to Transform and Analyze Text Files with Linux

This guide presents a collection of practical Linux command‑line one‑liners that demonstrate how to join filenames, reverse lines, strip comments, trim characters, compute column statistics, generate DNA reverse complements, merge files, deduplicate, and perform other common text‑processing tasks using tools such as ls, paste, xargs, awk, sed, cut and sort.

ITPUB

Jul 31, 2017

1. Join filenames into a single line

List all entries in the current directory and output them as a single comma‑separated line. Three common pipelines are shown:

ls | paste -s -d ','

paste -s joins all input lines into a single line, and -d ',' uses a comma as the separator between them.

ls | xargs | sed 's/ /,/g'

With no command given, xargs echoes its input as one space-separated line; sed then replaces every space with a comma. Filenames that contain spaces end up with commas inside them in this variant.

ls | awk '{printf "%s,",$0}'

The awk command prints each filename followed by a comma, which leaves a trailing comma and no final newline. If that matters, strip it afterwards, for example by piping the output through sed 's/,$//'.
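To compare the variants side by side, here is a minimal sketch that runs them in a scratch directory. The directory and the three file names are made up for illustration, and the explicit - operand to paste is an addition for portability to non-GNU implementations.

```shell
# Scratch directory with three hypothetical files.
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt" "$tmp/c.txt"

# paste joins the lines with commas and adds no trailing separator.
joined_paste=$(ls "$tmp" | paste -s -d ',' -)

# The awk variant leaves a trailing comma, stripped here with sed.
joined_awk=$(ls "$tmp" | awk '{printf "%s,",$0}' | sed 's/,$//')

echo "$joined_paste"   # a.txt,b.txt,c.txt
echo "$joined_awk"     # a.txt,b.txt,c.txt
rm -r "$tmp"
```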

2. Reverse the order of lines in a file

Use sed to print a file with its lines in reverse order. On every line except the first, G appends the hold space (the lines seen so far, already reversed) to the pattern space; h then copies the pattern space back into the hold space; $!d deletes the pattern space on every line except the last, so only the final, fully accumulated result is printed.

sed '1!G;h;$!d' file
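A quick way to see the effect is to feed the command three sample lines instead of a file. The input here is invented for the demonstration.

```shell
# Three sample lines stand in for a real file.
reversed=$(printf 'one\ntwo\nthree\n' | sed '1!G;h;$!d')
echo "$reversed"
# three
# two
# one
```

On systems with GNU coreutils, tac file gives the same result and does not keep the whole file in sed's hold space.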

3. Delete comment lines that start with #

sed '/^#.*/d' test.txt

This command removes any line whose first character is #.
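The sketch below, on invented sample input, shows what this pattern does and does not remove. The second command is a possible variant, not part of the original one-liner, that also drops comments preceded by whitespace.

```shell
input='# header comment
alpha
  # indented comment
beta # trailing note'

# Only lines whose very first character is # are deleted.
cleaned=$(printf '%s\n' "$input" | sed '/^#/d')

# Variant: allow leading whitespace before the #.
cleaned_indented=$(printf '%s\n' "$input" | sed '/^[[:space:]]*#/d')

echo "$cleaned"
echo "---"
echo "$cleaned_indented"
```

Neither form touches a trailing comment such as the one on the last line.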

4. Remove the first three characters of each line

cut -c 4- test.csv

cut -c 4- prints each line starting from the fourth character, effectively discarding the first three characters.
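Because cut -c counts characters from 1, the range N- keeps character N onward and therefore drops N-1 characters. A small check on a made-up string makes the off-by-one easy to see:

```shell
line='ABCDEFG'
from_fourth=$(printf '%s\n' "$line" | cut -c 4-)   # drops 3 characters
from_fifth=$(printf '%s\n' "$line" | cut -c 5-)    # drops 4 characters
echo "$from_fourth"   # DEFG
echo "$from_fifth"    # EFG
```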

5. Count occurrences of values in the first column

awk -F ',' '{count[$1]++} END{for (v in count) print v, count[v]}' test.csv

The -F ',' option sets the field separator to a comma. An associative array count tallies each distinct value in column 1, and the END block prints the value together with its frequency.
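As a small illustration on invented data, the same command can be run on three rows piped from printf. awk does not guarantee any order for for (v in count), so the sketch adds sort to make the output stable.

```shell
counts=$(printf 'chr1,10\nchr2,20\nchr1,30\n' |
  awk -F ',' '{count[$1]++} END{for (v in count) print v, count[v]}' |
  sort)
echo "$counts"
# chr1 2
# chr2 1
```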

6. Split the fourth column on : and increment the second part

awk -F ',' '{split($4,a,":"); print $1,$2,$3,a[1],a[2]+1}' test.csv

split($4,a,":") breaks column 4 into an array a using : as delimiter; the script then prints the first three columns, the first part of column 4, and the second part incremented by one.
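For a concrete picture, here is the command applied to a single made-up row whose fourth column is chr1:99. The output is space-separated because print uses awk's default output separator.

```shell
row='s1,A,B,chr1:99'
result=$(printf '%s\n' "$row" |
  awk -F ',' '{split($4,a,":"); print $1,$2,$3,a[1],a[2]+1}')
echo "$result"   # s1 A B chr1 100
```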

7. Compute the average of the second column

awk -F ',' '{sum+=$2} END{print "Average =", sum/NR}' test.csv

The script accumulates the sum of column 2 in sum and divides by NR (the number of records read) to obtain the arithmetic mean. Because NR counts every line, a header row or blank line would be included in the divisor.
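The effect of a header row is easy to demonstrate. In this sketch the sample data and the header-skipping variant (with its own counter n) are illustrative additions, not part of the original one-liner.

```shell
data='id,score
1,10
2,20
3,30'

# Original form: the header counts as a record, so 60 is divided by 4.
naive=$(printf '%s\n' "$data" |
  awk -F ',' '{sum+=$2} END{print "Average =", sum/NR}')

# Variant: skip line 1 and count only the data rows.
skip_header=$(printf '%s\n' "$data" |
  awk -F ',' 'NR>1{sum+=$2; n++} END{if(n) print "Average =", sum/n}')

echo "$naive"         # Average = 15
echo "$skip_header"   # Average = 20
```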

8. Generate the reverse‑complement of a DNA sequence

cat seq.txt | sed 'y/ATGC/TACG/' | rev

sed 'y/ATGC/TACG/' performs a transliteration that swaps each nucleotide with its complement; rev then reverses the string, yielding the reverse‑complement.
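A short check on an invented sequence confirms the result. The second command is a possible extension, not from the original, that also maps lowercase bases, which the uppercase-only map leaves untouched.

```shell
seq='ATGCC'
revcomp=$(printf '%s\n' "$seq" | sed 'y/ATGC/TACG/' | rev)

# Extended map that also complements lowercase bases.
revcomp_mixed=$(printf 'atGCC\n' | sed 'y/ATGCatgc/TACGtacg/' | rev)

echo "$revcomp"         # GGCAT
echo "$revcomp_mixed"   # GGCat
```

rev reverses each line on its own, so a sequence wrapped over several lines would need to be joined into one line first.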

9. Insert the contents of another file after a specific line

sed '2 r a.txt' test.csv

The r command reads the entire file a.txt and inserts it after line 2 of test.csv.
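The following sketch recreates the situation with two tiny files in a scratch directory. The file contents are invented; the file names follow the example above.

```shell
tmp=$(mktemp -d)
printf 'row1\nrow2\nrow3\n' > "$tmp/test.csv"
printf 'inserted\n' > "$tmp/a.txt"

# Run in the scratch directory so the relative name a.txt resolves.
merged=$(cd "$tmp" && sed '2 r a.txt' test.csv)
echo "$merged"
# row1
# row2
# inserted
# row3
rm -r "$tmp"
```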

10. Filter rows whose first‑column value appears in another file

awk -F ',' '{if(NR==FNR){seen[$1]=1}else if(seen[$1]) print}' chr.txt test.csv

The first pass (when NR==FNR) builds an associative array seen of keys from chr.txt. The second pass prints only those rows of test.csv whose first column matches a key.
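To see the two passes in action, the sketch below builds small versions of both files in a scratch directory. The contents are invented for the demonstration.

```shell
tmp=$(mktemp -d)
printf 'chr1\nchr3\n' > "$tmp/chr.txt"
printf 'chr1,100\nchr2,200\nchr3,300\n' > "$tmp/test.csv"

# NR==FNR holds only while the first file is being read.
kept=$(awk -F ',' '{if(NR==FNR){seen[$1]=1}else if(seen[$1]) print}' \
  "$tmp/chr.txt" "$tmp/test.csv")
echo "$kept"
# chr1,100
# chr3,300
rm -r "$tmp"
```

If the first file is empty, NR==FNR is also true throughout the second file, so nothing is printed.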

11. Expand a range defined by the second and third columns into separate rows

Given a file with columns id,start,end,value, the following awk script expands the inclusive range start‑end into individual rows, each containing id, the current position, and value:

awk -F ',' '{for(i=$2;i<=$3;i++) print $1,i,$4}' test.csv

Before expansion the file has four columns; after expansion it has three columns (id, position, value).
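As an illustration, one made-up row with the range 3 to 5 expands into three rows. The -v OFS=',' option is an addition in this sketch to keep the output comma-separated; without it the fields are joined by spaces.

```shell
expanded=$(printf 'a1,3,5,x\n' |
  awk -F ',' -v OFS=',' '{for(i=$2;i<=$3;i++) print $1,i,$4}')
echo "$expanded"
# a1,3,x
# a1,4,x
# a1,5,x
```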

12. Merge three files to replace old chromosome coordinates with new ones

The three input files have the same number of lines. As the script uses them:

Oldpanel_start_end.sort.bed – columns: old_chr, old_start, old_end, ..., ampl_id (column 5).
amplGChg19.txt – five columns; the first three are the same old coordinates, and columns 4 and 5 carry the annotation to transfer.
hg38amplicon_start_end.bed – the new coordinates in columns 1-3 and the ampl_id in column 4.

The awk program builds two associative arrays. From the first file:

ampl[old_chr,old_start,old_end] = ampl_id

From the second file, keyed by the ampl_id found through the old coordinates:

Ampl[ampl_id] = sprintf("%s,%d,%d,%s,%s", col1, col2, col3, col4, col5)

During the third pass the script looks up the stored string with the ampl_id in column 4, splits it on commas, and prints the new chromosome, start and end, followed by the two annotation fields and the ampl_id. File boundaries are detected with NR<=2*N, which is why the files must have equal line counts.

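awk 'BEGIN{FS="\t";OFS="\t"}
{
  if(NR==FNR){
    # File 1: map old coordinates to ampl_id; N ends as its line count
    ampl[$1,$2,$3]=$5; N=NR
  } else if(NR<=2*N){
    # File 2: store its five columns under the matching ampl_id
    Ampl[ampl[$1,$2,$3]]=sprintf("%s,%d,%d,%s,%s",$1,$2,$3,$4,$5)
  } else {
    # File 3: print new coordinates, stored columns 4-5, and ampl_id
    split(Ampl[$4],a,",");
    print $1,$2,$3,a[4],a[5],$4
  }
}' Oldpanel_start_end.sort.bed amplGChg19.txt hg38amplicon_start_end.bed | sort -k1 > hg38amplicon_Gene_GC.txt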
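The data flow is easier to follow on a two-line toy version of each file. Everything in this sketch is invented for illustration: the coordinates, the amplicon IDs, and the assumption that columns 4 and 5 of amplGChg19.txt hold a gene name and a GC fraction (suggested only by the output file name). The result is captured in a variable instead of being written to a file.

```shell
tmp=$(mktemp -d)
# File 1: old coordinates, a placeholder column, ampl_id in column 5.
printf 'chr1\t100\t200\tx\tAMPL1\nchr2\t300\t400\ty\tAMPL2\n' > "$tmp/old.bed"
# File 2: old coordinates plus two annotation columns (assumed gene, GC).
printf 'chr1\t100\t200\tGENEA\t0.45\nchr2\t300\t400\tGENEB\t0.60\n' > "$tmp/gc.txt"
# File 3: new coordinates with ampl_id in column 4.
printf 'chr1\t150\t250\tAMPL1\nchr2\t350\t450\tAMPL2\n' > "$tmp/new.bed"

merged=$(awk 'BEGIN{FS="\t";OFS="\t"}
{
  if(NR==FNR){
    ampl[$1,$2,$3]=$5; N=NR
  } else if(NR<=2*N){
    Ampl[ampl[$1,$2,$3]]=sprintf("%s,%d,%d,%s,%s",$1,$2,$3,$4,$5)
  } else {
    split(Ampl[$4],a,",");
    print $1,$2,$3,a[4],a[5],$4
  }
}' "$tmp/old.bed" "$tmp/gc.txt" "$tmp/new.bed" | sort -k1)

printf '%s\n' "$merged"
rm -r "$tmp"
```

Each output row carries the new coordinates from the third file, the two annotation fields from the second, and the ampl_id that links them.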

13. Remove duplicates and create a union of two BED files

cat NewpanelGene.bed Oldpanel.gene.bed | sort -u > merge.gene.bed

sort -u sorts the combined input and eliminates identical lines, producing the union of the two BED sets.
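A small sketch with two invented BED files that share one line shows the union. It also runs sort -u directly on both files, which gives the same result without cat.

```shell
tmp=$(mktemp -d)
printf 'chr1\t1\t10\nchr2\t5\t20\n' > "$tmp/NewpanelGene.bed"
printf 'chr2\t5\t20\nchr3\t7\t9\n'  > "$tmp/Oldpanel.gene.bed"

union=$(cat "$tmp/NewpanelGene.bed" "$tmp/Oldpanel.gene.bed" | sort -u)
# sort accepts several input files, so cat is optional.
union_direct=$(sort -u "$tmp/NewpanelGene.bed" "$tmp/Oldpanel.gene.bed")

printf '%s\n' "$union"   # three lines: chr1, chr2, chr3
rm -r "$tmp"
```

Only byte-identical lines are merged; two intervals that overlap but differ in any column are both kept.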

14. Split a FASTA file into separate files based on header lines

Each sequence header (e.g., >chr1) becomes the filename for a new file that contains the corresponding sequence.

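awk '/^>/{split($0,a,">"); out=a[2]} {print > out}' test.fa

The script detects lines beginning with >, extracts the identifier after the > character, assigns it to out, and writes the header and all following lines to that file until the next header is encountered. The whole header text after > becomes the filename, so headers containing spaces or slashes need cleaning first.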
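To try the split safely, the sketch below works in a scratch directory on an invented two-record FASTA file. The ^ in the pattern anchors the match to the start of the line.

```shell
tmp=$(mktemp -d)
printf '>chr1\nACGT\nTTGA\n>chr2\nGGCC\n' > "$tmp/test.fa"

# Run inside the scratch directory so the output files land there.
(cd "$tmp" && awk '/^>/{split($0,a,">"); out=a[2]} {print > out}' test.fa)

chr1=$(cat "$tmp/chr1")
chr2=$(cat "$tmp/chr2")
echo "$chr1"   # >chr1, ACGT, TTGA on separate lines
echo "$chr2"   # >chr2, GGCC on separate lines
rm -r "$tmp"
```

Each output file starts with its own header line, because the print rule also runs on the header.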
