30 Essential One‑Liners to Transform and Analyze Text Files with Linux
This guide presents a collection of practical Linux command‑line one‑liners that demonstrate how to join filenames, reverse lines, strip comments, trim characters, compute column statistics, generate DNA reverse complements, merge files, deduplicate, and perform other common text‑processing tasks using tools such as ls, paste, xargs, awk, sed, cut and sort.
1. Join filenames into a single line
List all entries in the current directory and output them as a single comma‑separated line. Three common pipelines are shown:
ls | paste -s -d ','
paste -s joins all input lines into a single line, and -d ',' places a comma between them.
ls | xargs | sed 's/ /,/g'
xargs with no command echoes its input as one space-separated line; sed then replaces the spaces with commas. File names that contain spaces are not handled correctly by this variant.
ls | awk '{printf "%s,",$0}'
The awk command prints each filename followed by a comma, which leaves a trailing comma and no final newline; append | sed 's/,$//' to strip the comma if required.
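As a quick check of the first and third pipelines, the sketch below runs them on a throwaway directory. The directory and the file names a.txt, b.txt and c.txt are made up for illustration, and the trailing comma from the awk variant is stripped with sed.

```shell
# Hypothetical demo directory; the file names are invented for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.txt"

# paste -s joins all input lines into one; -d ',' sets the joiner.
# The trailing "-" names standard input explicitly.
joined=$(ls "$demo_dir" | paste -s -d ',' -)
echo "$joined"

# The awk variant leaves a trailing comma, removed here with sed.
joined_awk=$(ls "$demo_dir" | awk '{printf "%s,", $0}' | sed 's/,$//')
echo "$joined_awk"

rm -rf "$demo_dir"
```

Both variables should hold the same comma-separated list.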
2. Reverse the order of lines in a file
Use sed to print a file with its lines in reverse order. For every line except the first, G appends the hold space (the lines seen so far, already reversed) to the pattern space; h then copies the result back into the hold space; $!d deletes the pattern space without printing for every line except the last, so only the fully accumulated, reversed text is printed at the end.
sed '1!G;h;$!d' file
3. Delete comment lines that start with #
sed '/^#.*/d' test.txt
This command removes any line whose first character is #.
4. Remove the first four characters of each line
cut -c 5- test.csv
cut -c 5- prints each line starting from the fifth character, discarding the first four. (cut -c 4- would discard only the first three.)
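Because cut -c numbers characters from 1, the start position is easy to get wrong by one. The sketch below, using a made-up eight-character line, shows how many characters each start position drops.

```shell
# Made-up sample line; cut numbers characters starting at 1.
line='abcdefgh'

from_fourth=$(printf '%s\n' "$line" | cut -c 4-)   # drops 3 characters
from_fifth=$(printf '%s\n' "$line" | cut -c 5-)    # drops 4 characters

echo "$from_fourth"
echo "$from_fifth"
```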
5. Count occurrences of values in the first column
awk -F ',' '{count[$1]++} END{for (v in count) print v, count[v]}' test.csv
The -F ',' option sets the field separator to a comma. An associative array count tallies each distinct value in column 1, and the END block prints the value together with its frequency.
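awk does not guarantee any particular order when iterating with for (v in count), so the output order can differ between runs or implementations. A minimal sketch, with invented sample rows, pipes the result through sort to make it stable:

```shell
# Invented sample data: chr1 appears twice, chr2 once.
counts=$(printf 'chr1,10\nchr2,20\nchr1,30\n' |
  awk -F ',' '{count[$1]++} END{for (v in count) print v, count[v]}' |
  sort)

echo "$counts"
```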
6. Split the fourth column on : and increment the second part
awk -F ',' '{split($4,a,":"); print $1,$2,$3,a[1],a[2]+1}' test.csv
split($4,a,":") breaks column 4 into an array a using : as delimiter; the script then prints the first three columns, the first part of column 4, and the second part incremented by one.
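Note that print separates its arguments with the output field separator, which defaults to a space, so the result is no longer comma-separated. The sketch below uses a made-up row to show both the default and a variant that sets OFS to a comma.

```shell
# Made-up sample row; column 4 holds a "label:number" pair.
row='s1,chr1,100,depth:41'

# Default output separator is a single space.
spaced=$(printf '%s\n' "$row" |
  awk -F ',' '{split($4,a,":"); print $1,$2,$3,a[1],a[2]+1}')

# -v OFS=',' keeps the output comma-separated.
csv=$(printf '%s\n' "$row" |
  awk -F ',' -v OFS=',' '{split($4,a,":"); print $1,$2,$3,a[1],a[2]+1}')

echo "$spaced"
echo "$csv"
```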
7. Compute the average of the second column
awk -F ',' '{sum+=$2} END{print "Average =", sum/NR}' test.csv
The script accumulates the sum of column 2 in sum and divides by NR (the number of processed records) to obtain the arithmetic mean.
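NR counts every record, so a header row or blank line would be included in the divisor. A sketch under the assumption that the file has exactly one header row (the sample data is invented) counts the data rows itself and guards against dividing by zero:

```shell
# Invented sample with a header row; only rows after the header are averaged.
avg=$(printf 'name,score\na,10\nb,20\nc,30\n' |
  awk -F ',' 'NR > 1 {sum += $2; n++} END {if (n > 0) print "Average =", sum / n}')

echo "$avg"
```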
8. Generate the reverse‑complement of a DNA sequence
cat seq.txt | sed 'y/ATGC/TACG/' | rev
sed 'y/ATGC/TACG/' performs a transliteration that swaps each nucleotide with its complement; rev then reverses the string, yielding the reverse‑complement.
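A short check on a made-up four-base sequence shows the result, together with an equivalent tr form. Keep in mind that rev reverses each line on its own, so a sequence wrapped over several lines would need to be joined first, and that lowercase bases or N are left unchanged by this transliteration.

```shell
# Made-up test sequence.
seq='AACG'

# Complement each base, then reverse the line.
revcomp_sed=$(printf '%s\n' "$seq" | sed 'y/ATGC/TACG/' | rev)

# Equivalent form using tr for the complement step.
revcomp_tr=$(printf '%s\n' "$seq" | tr 'ATGC' 'TACG' | rev)

echo "$revcomp_sed"
echo "$revcomp_tr"
```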
9. Insert the contents of another file after a specific line
sed '2 r a.txt' test.csv
The r command reads the entire file a.txt and inserts it after line 2 of test.csv.
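The effect can be seen on two small files. In this sketch the temporary directory and the file contents are made up; only the file names follow the example above.

```shell
# Throwaway working directory with invented file contents.
work_r=$(mktemp -d)
printf 'row1\nrow2\nrow3\n' > "$work_r/test.csv"
printf 'inserted\n' > "$work_r/a.txt"

# The contents of a.txt appear after line 2; test.csv itself is not modified.
merged=$(cd "$work_r" && sed '2 r a.txt' test.csv)
echo "$merged"

rm -rf "$work_r"
```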
10. Filter rows whose first‑column value appears in another file
awk -F ',' '{if(NR==FNR){seen[$1]=1}else if(seen[$1]) print}' chr.txt test.csv
The first pass (when NR==FNR) builds an associative array seen of keys from chr.txt. The second pass prints only those rows of test.csv whose first column matches a key.
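The same two-file idiom is often written with next and the in operator, which tests membership without creating new array entries. The sketch below uses invented file contents. One caveat applies to both forms: if chr.txt is empty, NR==FNR is also true while reading test.csv, so nothing is filtered.

```shell
# Throwaway working directory with invented file contents.
work_f=$(mktemp -d)
printf 'chr1\nchr3\n' > "$work_f/chr.txt"
printf 'chr1,100\nchr2,200\nchr3,300\n' > "$work_f/test.csv"

# First file: remember keys and skip to the next record.
# Second file: print rows whose first column is a remembered key.
kept=$(awk -F ',' 'NR==FNR {seen[$1]=1; next} ($1 in seen)' \
  "$work_f/chr.txt" "$work_f/test.csv")
echo "$kept"

rm -rf "$work_f"
```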
11. Expand a range defined by the second and third columns into separate rows
Given a file with columns id,start,end,value, the following awk script expands the inclusive range start‑end into individual rows, each containing id, the current position, and value:
awk -F ',' '{for(i=$2;i<=$3;i++) print $1,i,$4}' test.csv
Before expansion the file has four columns; after expansion it has three columns (id, position, value).
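As an illustration, one made-up row with the range 9 to 11 expands into three rows. This sketch adds +0 to both bounds to force a numeric comparison explicitly, which is a defensive choice rather than part of the original one-liner.

```shell
# Made-up input row: id=x, start=9, end=11, value=v.
expanded=$(printf 'x,9,11,v\n' |
  awk -F ',' '{for (i = $2 + 0; i <= $3 + 0; i++) print $1, i, $4}')

echo "$expanded"
```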
12. Merge three files to replace old chromosome coordinates with new ones
The three input files have the same number of lines, which the script relies on (NR<=2*N) to tell the second file from the third:
Oldpanel_start_end.sort.bed – columns: old_chr, old_start, old_end, ..., ampl_id (column 5).
amplGChg19.txt – amplicon information in columns 1‑5; the script matches its columns 1‑3 against the coordinates in the first file.
hg38amplicon_start_end.bed – contains the new coordinates that will replace the old ones, with the ampl_id in column 4.
The awk program builds two associative arrays:
ampl[old_chr,old_start,old_end] = ampl_id, from the first file.
Ampl[ampl_id] = sprintf("%s,%d,%d,%s,%s", $1, $2, $3, $4, $5), from the second file; the ampl_id is found by looking up that file's first three columns in ampl, and the formatted string stores its first five columns.
During the third pass the script looks up the stored string using the ampl_id in column 4, splits it on commas, and prints the third file's first three columns (the new coordinates), followed by fields 4 and 5 of the stored string and the ampl_id. The output is then sorted and written to hg38amplicon_Gene_GC.txt.
awk 'BEGIN{FS="\t";OFS="\t"}
{
if(NR==FNR){
ampl[$1,$2,$3]=$5; N=NR
} else if(NR<=2*N){
Ampl[ampl[$1,$2,$3]]=sprintf("%s,%d,%d,%s,%s",$1,$2,$3,$4,$5)
} else {
split(Ampl[$4],a,",");
print $1,$2,$3,a[4],a[5],$4
}
}' Oldpanel_start_end.sort.bed amplGChg19.txt hg38amplicon_start_end.bed | sort -k1 > hg38amplicon_Gene_GC.txt
13. Remove duplicates and create a union of two BED files
cat NewpanelGene.bed Oldpanel.gene.bed | sort -u > merge.gene.bed
sort -u sorts the combined input and eliminates identical lines, producing the union of the two BED sets.
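Note that sort -u only collapses lines that are identical; intervals that merely overlap are kept as separate records. A small sketch with two invented BED files (file names and contents are hypothetical) shows one shared record being collapsed:

```shell
# Throwaway working directory; both BED files are invented for illustration.
work_u=$(mktemp -d)
printf 'chr1\t100\t200\tGENE_A\nchr2\t300\t400\tGENE_B\n' > "$work_u/new.bed"
printf 'chr2\t300\t400\tGENE_B\nchr3\t500\t600\tGENE_C\n' > "$work_u/old.bed"

# Four input lines, one of them duplicated across the two files.
union=$(cat "$work_u/new.bed" "$work_u/old.bed" | sort -u)
union_lines=$(printf '%s\n' "$union" | wc -l | tr -d ' ')

echo "$union"
echo "$union_lines"

rm -rf "$work_u"
```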
14. Split a FASTA file into separate files based on header lines
Each sequence header (e.g., >chr1) becomes the filename for a new file that contains the corresponding sequence.
awk '/^>/{split($0,a,">"); out=a[2]}; {print > out}' test.fa
The script detects lines beginning with > (the ^ anchor keeps a > elsewhere in a line from being treated as a header), extracts the identifier after the > character, assigns it to out, and redirects subsequent lines to that file until the next header is encountered.
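A variant of the same idea, sketched below on an invented two-record FASTA file, anchors the match at the start of the line, takes the name with substr, and closes each output file before opening the next so that inputs with many records do not exhaust open file handles. It assumes the first line of the input is a header and that header names are unique and valid file names.

```shell
# Throwaway working directory with an invented two-record FASTA file.
work_fa=$(mktemp -d)
printf '>chr1\nACGT\nGGCC\n>chr2\nTTAA\n' > "$work_fa/test.fa"

# On each header: close the previous output file, then switch to the new name.
# Every line, header included, is written to the current output file.
(cd "$work_fa" &&
  awk '/^>/ {if (out != "") close(out); out = substr($0, 2)} {print > out}' test.fa)

chr1_content=$(cat "$work_fa/chr1")
chr2_content=$(cat "$work_fa/chr2")
echo "$chr1_content"
echo "$chr2_content"

rm -rf "$work_fa"
```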
ITPUB
