
Using Find Command and Tools to Detect Duplicate Files on Linux

This article covers three approaches to locating duplicate files on Linux: a find-based pipeline, the cross-platform dupeGuru GUI tool, and a step-by-step breakdown of each command in that pipeline, complete with code snippets and usage tips.


Method 1 demonstrates a single pipeline that combines find, sort, uniq, xargs, and md5sum to list duplicate files by comparing their MD5 hashes.

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

The pipeline works as follows:

find -not -empty -type f -printf "%s\n" lists the sizes of all non‑empty regular files.

sort -rn sorts the sizes in descending order.

uniq -d keeps only the duplicated size values.

xargs -I{} -n1 find -type f -size {}c -print0 retrieves the file names that match each duplicated size, using a null delimiter so that names containing spaces or newlines are handled safely.

xargs -0 md5sum computes the MD5 checksum for each of those files.

sort orders the checksums, and uniq -w32 --all-repeated=separate groups lines whose first 32 characters (the full MD5 hex digest) match, printing each group of duplicates separated by a blank line.
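The steps above can be exercised end to end in a scratch directory. The sketch below (file names invented for illustration) creates two identical files and one unique file, then runs the same pipeline; note that md5sum and find's -printf are GNU-specific:

```shell
# Create a scratch directory with two identical files and one unique file.
tmp=$(mktemp -d)
cd "$tmp"
echo "same content" > a.txt
echo "same content" > b.txt
echo "something else" > c.txt

# Only a.txt and b.txt share a size AND a hash, so only they are printed,
# grouped together with a blank line after the group.
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d \
  | xargs -I{} -n1 find . -type f -size {}c -print0 \
  | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
```

Because the size filter runs first, c.txt is never even hashed; only files that share a byte count reach md5sum.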

Method 2 introduces dupeGuru, a cross-platform GUI tool that can find duplicate files by size, content hash, or name. On older Ubuntu releases it could be installed from the project's PPA (quote the package pattern so the shell does not expand it against local file names):

sudo add-apt-repository ppa:hsoft/ppa
sudo apt-get update
sudo apt-get install 'dupeguru*'

Method 3 revisits the find pipeline, explaining each command segment and appending a final cut -b 36- to strip the 32-character MD5 hash and its two-byte separator for readability. Note that with the ./-prefixed paths produced by find, byte 36 falls one character into the path, so the leading dot is dropped; cut -b 35- would keep the full ./path.

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -b 36-
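To see exactly which bytes cut keeps, consider a representative md5sum output line (the hash and file name below are illustrative): bytes 1-32 are the hash, bytes 33-34 are the separator, and the path begins at byte 35.

```shell
# A representative md5sum output line: 32-byte hash, 2-byte separator, path.
line='d41d8cd98f00b204e9800998ecf8427e  ./notes.txt'

echo "$line" | cut -b 36-   # prints "/notes.txt"  (leading "." dropped)
echo "$line" | cut -b 35-   # prints "./notes.txt" (full path kept)
```

Either variant is fine for eyeballing results; cut -b 35- is preferable if the output will be fed back into other commands that expect the original paths.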

The complete pipeline can be redirected to result.txt for later inspection:

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -b 36- > result.txt
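Since uniq's --all-repeated=separate puts a blank line between groups, each blank-line-separated paragraph in result.txt is one set of duplicates, and awk's paragraph mode can count the sets directly. The file written below is a made-up stand-in for a real result.txt:

```shell
# Sample stand-in for result.txt: two duplicate groups separated by a blank line.
printf './a.txt\n./b.txt\n\n./x.log\n./y.log\n' > result.txt

# RS="" switches awk to paragraph mode, so NR counts groups rather than lines.
awk 'BEGIN { RS = "" } END { print NR " duplicate group(s)" }' result.txt
```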

Because Linux line endings are LF (\n) while Windows expects CRLF (\r\n), a result.txt produced this way can display as one long line in older Windows editors. The conversion originally suggested, cat result.txt | cut -c 36- | tr -s '\n' '\r\n', does not work: the hashes were already stripped in the previous step, and tr substitutes characters one for one, so it cannot expand a single \n into the two-byte \r\n sequence. With GNU sed, a working conversion appends a carriage return to each line:

sed 's/$/\r/' result.txt > result-windows.txt

Alternatively, the unix2dos utility (from the dos2unix package) converts the file in place with unix2dos result.txt.
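An LF-to-CRLF conversion can be sketched with GNU sed, which understands \r in the replacement text; od -c then shows the \r \n pair at each line end to confirm it worked (file names here are illustrative):

```shell
# Append \r before each newline (GNU sed interprets \r in the replacement).
printf 'one\ntwo\n' > sample.txt
sed 's/$/\r/' sample.txt > sample-crlf.txt

# Dump the bytes: each line should now end in \r \n.
od -c sample-crlf.txt
```

BSD sed does not interpret \r the same way, so on macOS prefer unix2dos or perl -pe 's/\n/\r\n/'.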

Overall, the guide provides practical command‑line techniques and a GUI alternative for efficiently locating duplicate files in Linux environments.

Tags: Linux, Shell, Command-line, find, dupeGuru, Duplicate Files, md5sum
Written by

Laravel Tech Community

Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.
