How to Build a Python Script that Detects and Removes Duplicate Files
This tutorial walks through creating a Python automation script that scans a given directory, uses the os, glob, and filecmp modules to identify duplicate files, and safely deletes the redundant copies while handling edge cases such as missing files.
Introduction
Hello everyone, it's time for the Python office automation series. This article presents a system‑level automation case: given a folder, use Python to check for duplicate files and delete them.
Key Modules
os: file-system operations (checking paths, deleting files)
glob: finding files by path pattern
filecmp: comparing two files
Step Analysis
The program traverses every file in the target folder, compares the files pairwise, and, whenever two files turn out to be identical, deletes the second file of the pair.
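The crucial question is how to determine whether two files are identical. The filecmp module provides filecmp.cmp(f1, f2, shallow=True), which returns True if the files are considered equal and False otherwise. With shallow=True (the default), two files whose os.stat() signatures (file type, size, and modification time) match are treated as equal without their contents being read; if the signatures differ, the contents are compared. Passing shallow=False always compares the contents, which is the safer choice when the result decides whether a file gets deleted.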
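To make the effect of the shallow parameter concrete, here is a minimal sketch that builds three throwaway files in a temporary directory. The helper make_file and the file names are hypothetical and exist only for this illustration; the fixed modification time is an assumption used to force identical os.stat() signatures.

```python
import filecmp
import os
import tempfile


def make_file(path, data, mtime):
    """Write data to path and pin its modification time (illustrative helper)."""
    with open(path, "wb") as fh:
        fh.write(data)
    os.utime(path, (mtime, mtime))


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    c = os.path.join(tmp, "c.txt")

    make_file(a, b"hello", 1_000_000_000)
    make_file(b, b"hello", 1_000_000_000)  # same bytes as a
    make_file(c, b"world", 1_000_000_000)  # same size and mtime, different bytes

    # True duplicates compare equal even when contents are read.
    same_content = filecmp.cmp(a, b, shallow=False)
    # Matching stat signatures are enough for a shallow comparison to say "equal".
    shallow_result = filecmp.cmp(a, c, shallow=True)
    # Reading the contents reveals that a and c actually differ.
    deep_result = filecmp.cmp(a, c, shallow=False)

print(same_content, shallow_result, deep_result)
```

On a typical file system this should print True True False: the shallow comparison reports a.txt and c.txt as equal even though their bytes differ, which is why a content comparison is worth its extra cost before deleting anything.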
# Assume x and y are identical files
print(filecmp.cmp(x, y))
# True
Python Implementation
Import libraries and set the target directory:
import os
import glob
import filecmp
dir_path = r'C:\xxxx'
Collect absolute paths of all files using glob with recursive=True:
for file in glob.glob(dir_path + '/**/*', recursive=True):
    pass
Build a list of file paths:
file_lst = []
for i in glob.glob(dir_path + '/**/*', recursive=True):
    if os.path.isfile(i):
        file_lst.append(i)
Compare each pair with filecmp.cmp and delete duplicates, checking existence to avoid errors:
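for x in file_lst:
    for y in file_lst:
        # Skip self-comparison and any file already deleted in an earlier pass
        if x != y and os.path.exists(x) and os.path.exists(y):
            # shallow=False compares file contents, not just os.stat() metadata
            if filecmp.cmp(x, y, shallow=False):
                os.remove(y)
The complete script combines the above steps.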
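One way the pieces could fit together is sketched below. The function name remove_duplicates and the dry_run flag are illustrative assumptions rather than part of the original script; the flag is there so the list of duplicates can be reviewed before anything is deleted. Sorting the paths is likewise an added choice, made so that the copy that survives is predictable.

```python
import filecmp
import glob
import os


def remove_duplicates(dir_path, dry_run=True):
    """Return duplicate paths found under dir_path; delete them unless dry_run."""
    # Collect regular files only; glob.escape protects special characters in dir_path.
    pattern = os.path.join(glob.escape(dir_path), "**", "*")
    file_lst = sorted(
        p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p)
    )

    removed = []
    for i, x in enumerate(file_lst):
        if x in removed:
            continue  # x was already identified as a duplicate of an earlier file
        # Only look at files after x, so each pair is compared once.
        for y in file_lst[i + 1:]:
            if y in removed:
                continue
            # shallow=False compares file contents, not just os.stat() metadata.
            if filecmp.cmp(x, y, shallow=False):
                removed.append(y)
                if not dry_run:
                    os.remove(y)
    return removed
```

A cautious workflow would be to call remove_duplicates(dir_path) first, inspect the returned list, and only then call remove_duplicates(dir_path, dry_run=False). Note that the '*' pattern in glob does not match names beginning with a dot by default, so hidden files are left out of the scan.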
Conclusion
This simple duplicate‑file remover demonstrates the power of Python for office automation and can be combined with other file‑organizing scripts.
Python Crawling & Data Mining
