Master Powerful Unix One-Liners for File Merging, Stats, and Automation
This guide presents a collection of concise Unix one‑liners that act as Swiss‑army‑knife tools for merging files, computing unions, intersections and differences, summing columns, listing file metadata, and leveraging xargs for flexible batch processing, all with minimal code.
Set Operations on Pre‑deduplicated Text Files
Given two files a and b that each contain only unique lines, the following pipelines compute classic set operations using sort and uniq. sort orders the combined input lexically, which places identical lines next to each other; uniq then collapses adjacent duplicates, and with -d or -u it reports only the repeated or only the unrepeated lines.
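Union (all lines from either file):
cat a b | sort | uniq > c
Intersection (lines present in both files):
cat a b | sort | uniq -d > c
Difference (lines in a that are not in b):
cat a b b | sort | uniq -u > c
Listing b twice guarantees that every line of b occurs at least twice, so uniq -u drops it and only the lines unique to a remain. The symmetric difference (lines that appear in exactly one file) uses a single copy of each file: cat a b | sort | uniq -u > c
These commands work on arbitrarily large files because sort spills to temporary files and does not require the whole dataset in memory. The -T option can specify an alternative temporary directory if needed.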
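As a quick check of the pipelines above, the following sketch runs them on two small sample files. The file contents, the scratch directory and the output file names are made up for illustration. Note that with b listed twice, uniq -u returns the lines of a that are absent from b, while a single copy of each file gives the symmetric difference.

```shell
# Work in a scratch directory so no existing files are overwritten.
tmp=$(mktemp -d)

# Hypothetical sample inputs; each file already contains unique lines.
printf '%s\n' apple banana cherry > "$tmp/a"
printf '%s\n' banana cherry date  > "$tmp/b"

# LC_ALL=C makes sort and uniq agree on a plain byte-wise ordering.
export LC_ALL=C

cat "$tmp/a" "$tmp/b"          | sort | uniq    > "$tmp/union"         # apple banana cherry date
cat "$tmp/a" "$tmp/b"          | sort | uniq -d > "$tmp/intersection"  # banana cherry
cat "$tmp/a" "$tmp/b" "$tmp/b" | sort | uniq -u > "$tmp/a_minus_b"     # apple
cat "$tmp/a" "$tmp/b"          | sort | uniq -u > "$tmp/symmetric"     # apple date
```

If either input might contain repeated lines, deduplicating it first (for example with sort -u) keeps the uniq -d and uniq -u results meaningful.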
Summing a Numeric Column with awk
To add the values in the third whitespace‑separated column of a text file myfile:
awk '{ sum += $3 } END { print sum }' myfile
The script adds field $3 of every record to the variable sum and prints the total after the last record; awk treats non‑numeric values as 0. This one‑liner is shorter than an equivalent Python script and is often faster as well.
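A small worked example of the column sum follows. The sample rows and the CSV variant are illustrative assumptions, not part of the original tip. Printing sum + 0 is a minor tweak so that an empty input yields 0 instead of a blank line.

```shell
tmp=$(mktemp -d)

# Hypothetical data: name, quantity, price.
printf '%s\n' 'widget 2 10' 'gadget 1 25.5' 'gizmo 4 7' > "$tmp/myfile"

# Sum of the third whitespace-separated column.
total=$(awk '{ sum += $3 } END { print sum + 0 }' "$tmp/myfile")
echo "$total"       # 42.5

# The same idea for comma-separated input, selecting the delimiter with -F.
printf '%s\n' 'widget,2,10' 'gadget,1,25.5' > "$tmp/myfile.csv"
csv_total=$(awk -F, '{ sum += $3 } END { print sum + 0 }' "$tmp/myfile.csv")
echo "$csv_total"   # 35.5
```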
Listing File Sizes and Modification Times
As an alternative to a recursive ls -lR, the find command can list every regular file together with its size, modification timestamp and other metadata:
find . -type f -ls
The output resembles that of ls -dils (the ls -l columns preceded by the inode number and block count), but every file appears on a single line with its full path, which makes the listing easier to sort and to pipe into further processing tools.
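One way to post-process the listing is sketched below: it sorts files by size. The file names and sizes are invented for the example, and the field positions assume the column layout printed by GNU and BSD find, with no whitespace in paths or owner names.

```shell
tmp=$(mktemp -d)

# Hypothetical files of known sizes: 5, 300 and 20 bytes.
printf 'hello'     > "$tmp/small.txt"
printf '%300s' ''  > "$tmp/large.txt"
mkdir "$tmp/sub"
printf '%20s' ''   > "$tmp/sub/medium.txt"

# One line per regular file, with full metadata.
find "$tmp" -type f -ls

# Field 7 holds the size in bytes and the last field the path
# (assumed layout; breaks if paths or owner names contain spaces).
by_size=$(find "$tmp" -type f -ls | awk '{ print $7, $NF }' | sort -n)
echo "$by_size"
```

The result lists the smallest file first; sort -rn would put the largest first instead.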
Batch Execution with xargs
xargs reads items from standard input and builds command lines from them, so a command can be applied to many arguments without exceeding the system's argument length limit.
Dry‑run test (verify the arguments that would be passed):
cat list.txt | xargs echo
Search for a function in all Python files (combining find with xargs avoids spawning a separate grep for each file):
find . -name "*.py" | xargs grep some_function
Run a command on each host listed in a file (the -I{} placeholder gives per‑item substitution):
cat hosts | xargs -I{} ssh root@{} hostname
The placeholder {} can be replaced by any token (e.g., -I@@). Without -I, the -n option controls how many items are passed per invocation.
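The sketch below exercises these xargs patterns on throwaway data. The directory layout, file contents and host names are hypothetical, and echo stands in for ssh so nothing is actually contacted. It also uses -print0 with -0, an extension available in GNU and BSD find and xargs, because plain xargs splits file names that contain whitespace.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/my project"

# Hypothetical source files; one path contains a space.
printf 'def some_function():\n    pass\n' > "$tmp/my project/util.py"
printf 'print("unrelated")\n'             > "$tmp/other.py"

# NUL-separated names survive whitespace in paths.
# grep -l prints only the names of files that match.
matches=$(find "$tmp" -name '*.py' -print0 | xargs -0 grep -l some_function)
echo "$matches"

# -n 2 passes at most two items to each echo invocation.
batches=$(printf '%s\n' h1 h2 h3 h4 h5 | xargs -n 2 echo)
echo "$batches"

# -I{} runs the command once per input line; echo makes this a dry run.
dry_run=$(printf '%s\n' web1 web2 | xargs -I{} echo ssh root@{} hostname)
echo "$dry_run"
```

Prefixing the real command with echo, as in the last step, is a cheap way to inspect the generated command lines before running them.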
Counting Parameter Occurrences in Web Logs
To count how many times each acct_id value appears in an access log, chain egrep, cut, sort and uniq -c:
cat access.log \
| egrep -o 'acct_id=[0-9]+' \
| cut -d= -f2 \
| sort \
| uniq -c \
| sort -rn
The pipeline extracts the acct_id=… substrings, isolates the numeric identifier, sorts the identifiers, counts duplicate occurrences with uniq -c, and finally sorts the result numerically in reverse order to show the most frequent IDs first.
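To see the counting pipeline end to end, here is a sketch run against a tiny invented log. The log lines are hypothetical, grep -E is used as the current spelling of egrep, and the trailing awk is an addition that only strips the padding uniq -c places before each count.

```shell
tmp=$(mktemp -d)

# Hypothetical log lines; only the acct_id=... parameter matters here.
cat > "$tmp/access.log" <<'EOF'
GET /view?acct_id=42&page=1 200
GET /view?acct_id=7&page=3 200
GET /edit?acct_id=42 302
GET /health 200
GET /view?acct_id=42&page=2 200
GET /view?acct_id=7 404
GET /view?acct_id=1001 200
EOF

# Extract, isolate, sort, count, then order by descending count.
counts=$(grep -E -o 'acct_id=[0-9]+' "$tmp/access.log" \
  | cut -d= -f2 \
  | sort \
  | uniq -c \
  | sort -rn \
  | awk '{ print $1, $2 }')
echo "$counts"
```

For this input the most frequent ID is 42 with three requests, followed by 7 with two and 1001 with one.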
Source: http://www.vaikan.com/what-are-the-most-useful-swiss-army-knife-one-liners-on-unix/ (originally summarized from a Quora post by Joshua Levy).
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
