
Quickly Locate Duplicate Files on Linux with Find and dupeGuru

This guide shows practical ways to identify duplicate files on Linux: an advanced find pipeline, the cross‑platform dupeGuru utility, and a step‑by‑step breakdown of each command in the pipeline, with code examples and explanations.

Liangxu Linux

Sep 12, 2021

Method 1: Using the find command

The find utility can be combined with other core Linux commands (such as xargs) to produce a powerful one‑liner that lists duplicate files by comparing their MD5 hashes.

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | \
  xargs -I{} -n1 find -type f -size {}c -print0 | \
  xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

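find -not -empty -type f -printf "%s\n" enumerates all non‑empty regular files and prints their sizes in bytes, one per line. sort -rn sorts the sizes in descending numeric order. uniq -d keeps only sizes that appear more than once, i.e., potential duplicates. xargs -I{} -n1 find -type f -size {}c -print0 then lists every file that has one of those sizes, and xargs -0 md5sum hashes each candidate. After a final sort brings equal hashes together, uniq -w32 --all-repeated=separate compares the first 32 characters of each line (the full MD5 hash) and prints each group of identical hashes, separated by a blank line.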
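The two stages described above can be tried safely in a throwaway directory. The following is a minimal sketch, assuming GNU find, xargs, and coreutils; the directory, file names, and the variables demo_dir, dup_sizes, and dup_report are illustrative and not part of the original one-liner. The redundant -n1 is omitted here because -I{} already runs one command per input line.

```shell
# Build a throwaway directory: two identical files and one file of the
# same size but different content (all names are illustrative).
demo_dir=$(mktemp -d)
printf 'same content\n' > "$demo_dir/a.txt"
printf 'same content\n' > "$demo_dir/b.txt"
printf 'different!!!\n' > "$demo_dir/c.txt"

# Stage 1: collect the file sizes that occur more than once.
dup_sizes=$(find "$demo_dir" -not -empty -type f -printf '%s\n' | sort -rn | uniq -d)

# Stage 2: hash only files of those sizes, then keep groups of equal hashes.
dup_report=$(printf '%s\n' "$dup_sizes" |
  xargs -I{} find "$demo_dir" -type f -size {}c -print0 |
  xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate)

printf '%s\n' "$dup_report"
rm -rf "$demo_dir"
```

All three files pass the size filter, but only a.txt and b.txt share a hash, so c.txt should not appear in the report. This shows why the size check is only a cheap pre-filter and the hash comparison makes the final decision.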

Method 2: Using the dupeGuru tool

dupeGuru is a cross‑platform application (Linux, Windows, macOS) that can locate duplicate files based on size, MD5, or filename. On Ubuntu you can install it via a PPA:

sudo add-apt-repository ppa:hsoft/ppa
sudo apt-get update
sudo apt-get install 'dupeguru*'

Method 3: Detailed find‑pipeline explanation

When you need to script duplicate‑file detection, the following expanded pipeline shows each stage and its purpose.

find -not -empty -type f -printf "%s\n" |
  sort -rn |
  uniq -d |
  xargs -I{} -n1 find -type f -size {}c -print0 |
  xargs -0 md5sum |
  sort |
  uniq -w32 --all-repeated=separate |
  cut -b 36- > result.txt

Explanation of each command: find -not -empty -type f -printf "%s\n" outputs the size (in bytes) of every non‑empty regular file, one per line. sort -rn sorts those sizes numerically in reverse order. uniq -d filters to sizes that occur more than once. xargs -I{} -n1 find -type f -size {}c -print0 converts each repeated size into a separate find call that lists files of that exact size, using a null terminator to safely handle spaces. xargs -0 md5sum computes the MD5 hash for each listed file. sort places lines with equal hashes next to each other, which uniq requires. uniq -w32 --all-repeated=separate groups lines with identical first 32 characters (the full MD5 hash) and separates each group with a blank line. cut -b 36- removes the 32‑character hash, the two separator characters that follow it, and the leading "." of each path, so result.txt contains only file paths beginning with "/", still grouped by blank lines; use cut -b 35- to keep the "./" prefix.
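The byte offsets passed to cut are easier to see on a single line of md5sum output. The sketch below assumes md5sum's usual text-mode layout (32 hex digits, two spaces, then the path); the sample digest and the path ./docs/report.txt are placeholders, not output from a real run.

```shell
# One sample line in md5sum's output layout: bytes 1-32 hold the digest,
# bytes 33-34 are separators, and the path starts at byte 35.
sample_line='d41d8cd98f00b204e9800998ecf8427e  ./docs/report.txt'

hash_only=$(printf '%s\n' "$sample_line" | cut -b 1-32)   # digest only
path_full=$(printf '%s\n' "$sample_line" | cut -b 35-)    # path with ./ prefix
path_cut36=$(printf '%s\n' "$sample_line" | cut -b 36-)   # path without the dot

printf '%s\n' "$hash_only" "$path_full" "$path_cut36"
```

Starting at byte 36 therefore drops the leading dot that find prints when no start directory is given, while byte 35 keeps the path exactly as find reported it.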

To make the result file Windows‑compatible (convert LF to CRLF), run: sed 's/$/\r/' result.txt > result-crlf.txt (this relies on GNU sed; tr is not suitable here because it maps characters one‑to‑one and cannot insert an extra carriage return). This pipeline provides a concise, reproducible method for locating duplicate files across a directory tree without installing extra software.
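A quick way to check an LF to CRLF conversion is to count the lines that end in a carriage return afterwards. This is a minimal sketch, assuming GNU sed; the sample content and the names work_dir and result-crlf.txt are illustrative. It uses sed rather than tr, because tr translates single characters one-to-one and cannot add a second character.

```shell
# Create a small LF-terminated sample that stands in for result.txt.
work_dir=$(mktemp -d)
printf '/a.txt\n/b.txt\n' > "$work_dir/result.txt"

# Append a carriage return to the end of every line (GNU sed expands \r).
sed 's/$/\r/' "$work_dir/result.txt" > "$work_dir/result-crlf.txt"

# Count the lines that now end in CR to confirm the conversion.
crlf_lines=$(grep -c "$(printf '\r')\$" "$work_dir/result-crlf.txt")
echo "lines ending in CRLF: $crlf_lines"
rm -rf "$work_dir"
```

The original file is left untouched, so the conversion can be repeated or discarded without rerunning the duplicate search.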
