
Master Powerful Unix One-Liners for File Merging, Stats, and Automation

This guide presents a collection of concise Unix one‑liners that act as Swiss‑army‑knife tools for merging files, computing unions, intersections and differences, summing columns, listing file metadata, and leveraging xargs for flexible batch processing, all with minimal code.

ITPUB

Set Operations on Pre‑deduplicated Text Files

Given two files a and b that already contain unique lines, the following pipelines compute classic set operations efficiently using sort and uniq. sort streams the combined input in lexical order; uniq then collapses adjacent duplicates and can report only repeated or only unique lines.

Union (all lines from both files):

cat a b | sort | uniq > c

Intersection (lines present in both files):

cat a b | sort | uniq -d > c

Difference (lines in a that are not in b):

cat a b b | sort | uniq -u > c

Listing b twice guarantees that every line of b appears at least twice in the combined input, so uniq -u keeps only the lines found in a alone. These commands work on very large files because sort spills to temporary files and does not require the whole dataset in memory. Its -T option can specify an alternative temporary directory if needed.
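As a quick check of these pipelines, here is a minimal sketch run against two tiny sample files. The file contents and the output names union, intersection and difference are made up for illustration. Note that the cat a b b form yields the lines of a that are absent from b.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical sample files; each already contains unique lines.
printf '%s\n' apple banana cherry > a
printf '%s\n' banana cherry date > b

cat a b   | sort | uniq    > union         # every line from either file
cat a b   | sort | uniq -d > intersection  # lines present in both files
cat a b b | sort | uniq -u > difference    # lines in a that are not in b

# Print each result on a single line for easy inspection.
for f in union intersection difference; do
  printf '%s: %s\n' "$f" "$(paste -sd' ' "$f")"
done
```

Swapping the roles of the files (cat b a a) gives the lines of b that are not in a.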

Summing a Numeric Column with awk

To add the values in the third whitespace‑separated column of a text file myfile:

awk '{ sum += $3 } END { print sum }' myfile

The script accumulates each field $3 into the variable sum and prints the total after the last record. This one‑liner is typically faster and shorter than equivalent Python code.
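The following sketch runs the summing script on a small made-up file and adds one variant that totals column 3 per value of column 2. The sample rows and the names total and per_region are assumptions for illustration only.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: name, region, amount.
cat > myfile <<'EOF'
alice east 10
bob west 20.5
carol east 12
EOF

# Sum the third whitespace-separated column.
total=$(awk '{ sum += $3 } END { print sum }' myfile)
echo "total: $total"

# Variant: keep one running sum per distinct value of column 2.
# The order of "for (k in sum)" is unspecified, so sort the output.
per_region=$(awk '{ sum[$2] += $3 } END { for (k in sum) print k, sum[k] }' myfile | sort)
echo "$per_region"
```

On an empty input file, print sum emits an empty line because sum was never assigned. Writing print sum + 0 forces a numeric 0 instead.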

Listing File Sizes and Modification Times

Instead of a recursive ls -lR, the find command can produce a concise, sortable listing of every regular file together with its size, modification timestamp and other metadata:

find . -type f -ls

The output resembles that of ls -dils (the ls -l columns preceded by the inode number and block count), with one line per file showing its full path. That makes it easier to pipe into further processing tools than the per-directory blocks printed by ls -lR.
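Because each file occupies one line, the listing can be sorted directly. The sketch below assumes the column layout used by GNU and BSD find, where the size in bytes is the seventh field, and assumes paths and owner names contain no whitespace. The file names are hypothetical.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical files of known sizes.
mkdir -p logs
printf 'x'          > small.txt     # 1 byte
printf 'xxxxxxxxxx' > logs/big.txt  # 10 bytes
printf 'xxxxx'      > medium.txt    # 5 bytes

# List regular files, sort numerically on the size column (field 7),
# largest first, then print only the size and the path.
largest_first=$(find . -type f -ls | sort -k7,7nr | awk '{ print $7, $NF }')
echo "$largest_first"
```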

Batch Execution with xargs

xargs reads items from standard input and builds command lines from them, allowing you to apply a command to many arguments without exceeding the system’s argument length limit.

Dry‑run test (verify the arguments that would be passed):

cat list.txt | xargs echo

Search for a function in all Python files (combine find with xargs to avoid spawning a separate grep for each file):

find . -name "*.py" | xargs grep some_function

Run a command on each host listed in a file (use the -I{} placeholder for per‑item substitution):

cat hosts | xargs -I{} ssh root@{} hostname

The placeholder {} can be replaced by any token (e.g., -I@@). With -I, xargs runs the command once per input line. Without -I, the -n option controls how many items are passed per invocation.
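Two details are worth trying by hand: batching with -n, and file names that contain spaces, which the plain find | xargs form splits into separate arguments. The sketch below uses -print0 and -0, which are GNU and BSD extensions and not part of POSIX. The file names are hypothetical.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# -n 2 passes at most two items to each echo invocation.
batched=$(printf '%s\n' a b c d e | xargs -n 2 echo)
echo "$batched"

# Hypothetical Python files, one with a space in its name.
printf 'def some_function():\n    pass\n' > 'my module.py'
printf 'print("hello")\n' > other.py

# NUL-delimited names survive whitespace in paths.
# grep -l prints only the names of files that contain a match.
matches=$(find . -name '*.py' -print0 | xargs -0 grep -l some_function)
echo "$matches"
```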

Counting Parameter Occurrences in Web Logs

To count how many times each acct_id value appears in an access log, chain grep -E (the current spelling of the obsolescent egrep), cut, sort and uniq -c:

grep -oE 'acct_id=[0-9]+' access.log \
| cut -d= -f2 \
| sort \
| uniq -c \
| sort -rn

The pipeline extracts the acct_id=… substrings, isolates the numeric identifier, sorts the identifiers, counts duplicate occurrences with uniq -c, and finally sorts the result numerically in reverse order to show the most frequent IDs first.
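To see the pipeline end to end, the sketch below runs it on a small invented log. The log lines are not a real access-log format, grep -E stands in for egrep, and a final awk step (an addition for readability) prints each ID followed by its count.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical log lines; only the acct_id parameter matters here.
cat > access.log <<'EOF'
GET /home?acct_id=42&page=1 200
GET /cart?acct_id=7 200
GET /home?acct_id=42 200
POST /pay?acct_id=42&amt=5 302
GET /cart?acct_id=7 200
GET /about 200
EOF

# Extract the IDs, count them, and list the most frequent first.
counts=$(grep -oE 'acct_id=[0-9]+' access.log \
  | cut -d= -f2 \
  | sort \
  | uniq -c \
  | sort -rn \
  | awk '{ print $2, $1 }')
echo "$counts"
```

Lines without an acct_id parameter, such as the /about request, simply produce no match and are not counted.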

Source: http://www.vaikan.com/what-are-the-most-useful-swiss-army-knife-one-liners-on-unix/ (originally summarized from a Quora post by Joshua Levy).
Written by

ITPUB
