
How to Efficiently Remove Duplicate Rows in MySQL Tables

This article explains why a naïve Python script for deleting duplicate MySQL rows is too slow, demonstrates the MySQL error caused by deleting from the same table you query, and provides two pure‑SQL solutions: one that removes all duplicates and another that keeps a single row per duplicate key.

Java Backend Technology

During an on‑call incident we needed to clean duplicate rows from several MySQL tables, some of which contained hundreds of thousands of records. A simple Python script that deleted rows one by one proved too slow, so we switched to pure SQL solutions.

Delete all duplicate rows (no rows kept)

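Deleting directly with a sub-query on the same table, for example DELETE FROM student WHERE name IN (SELECT name FROM student GROUP BY name HAVING COUNT(1) > 1), fails with MySQL error 1093 ("You can't specify target table 'student' for update in FROM clause"), because MySQL does not allow a DELETE to read the table it is modifying in a sub-query. The fix is to wrap the sub-query in a derived table: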

DELETE FROM student
WHERE name IN (
    SELECT name FROM (
        SELECT name FROM student GROUP BY name HAVING COUNT(1) > 1
    ) AS t
);
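To check the logic without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module and a small in-memory student table. The sample rows and the helper names make_db and delete_all_duplicates are hypothetical. SQLite does not raise error 1093, so it accepts both the direct sub-query and the derived-table form; the sketch runs both on fresh copies of the table and confirms they delete the same rows, i.e. the wrapper changes how MySQL evaluates the statement, not what it deletes.

```python
import sqlite3

# Hypothetical sample data: "alice" and "bob" are duplicated, "carol" is unique.
ROWS = [(1, "alice"), (2, "bob"), (3, "alice"), (4, "carol"), (5, "bob"), (6, "alice")]

# Direct form: MySQL rejects this with error 1093, SQLite accepts it.
DIRECT = """
DELETE FROM student
WHERE name IN (
    SELECT name FROM student GROUP BY name HAVING COUNT(1) > 1
)
"""

# Derived-table form from the article: accepted by both MySQL and SQLite.
WRAPPED = """
DELETE FROM student
WHERE name IN (
    SELECT name FROM (
        SELECT name FROM student GROUP BY name HAVING COUNT(1) > 1
    ) AS t
)
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory student table filled with ROWS (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO student (id, name) VALUES (?, ?)", ROWS)
    return conn


def delete_all_duplicates(sql: str) -> list[tuple[int, str]]:
    """Run one DELETE variant on a fresh table and return the surviving rows."""
    conn = make_db()
    conn.execute(sql)
    conn.commit()
    remaining = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    direct = delete_all_duplicates(DIRECT)
    wrapped = delete_all_duplicates(WRAPPED)
    print(wrapped)            # [(4, 'carol')]: only the unique name survives
    print(direct == wrapped)  # True: both forms delete the same rows
```

Every row whose name appears more than once is removed, including the first occurrence, which is why this variant suits cases where none of the duplicated rows should be kept.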

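This works because MySQL first materialises the derived table t (the list of duplicate names) into a temporary result, so the outer DELETE no longer reads student directly in its sub-query.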

Delete duplicates while keeping one row per name

First identify the rows to keep – the smallest id for each name – then delete everything whose id is not in that set. As before, the sub-query is wrapped in a derived table to avoid error 1093.

DELETE FROM student
WHERE id NOT IN (
    SELECT t.id FROM (
        SELECT MIN(id) AS id FROM student GROUP BY name
    ) AS t
);
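To check the keep-one query without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows and the helper name dedupe_keep_one are hypothetical. After the DELETE, each name should survive exactly once under its smallest id, and a GROUP BY ... HAVING COUNT(1) > 1 check should return no rows.

```python
import sqlite3

# Hypothetical sample data: several names appear more than once.
ROWS = [(1, "alice"), (2, "bob"), (3, "alice"), (4, "carol"), (5, "bob"), (6, "alice")]

# Keep-one query from the article: delete every row whose id is not the
# smallest id for its name.
KEEP_ONE = """
DELETE FROM student
WHERE id NOT IN (
    SELECT t.id FROM (
        SELECT MIN(id) AS id FROM student GROUP BY name
    ) AS t
)
"""

# Verification query: lists any name that still has more than one row.
FIND_DUPLICATES = "SELECT name FROM student GROUP BY name HAVING COUNT(1) > 1"


def dedupe_keep_one() -> tuple[list[tuple[int, str]], list[tuple[str]]]:
    """Return (surviving rows, names still duplicated) after running KEEP_ONE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO student (id, name) VALUES (?, ?)", ROWS)
    conn.execute(KEEP_ONE)
    conn.commit()
    remaining = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
    still_duplicated = conn.execute(FIND_DUPLICATES).fetchall()
    conn.close()
    return remaining, still_duplicated


if __name__ == "__main__":
    remaining, still_duplicated = dedupe_keep_one()
    print(remaining)         # [(1, 'alice'), (2, 'bob'), (4, 'carol')]
    print(still_duplicated)  # []
```

NOT IN is safe here because id is a primary key and can never be NULL; if the sub-query could return a NULL, id NOT IN (...) would match no rows and nothing would be deleted.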

The query runs quickly even on tables with more than 900 000 rows.

All done.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

SQL, mysql, data deduplication, database cleanup, duplicate removal
Written by Java Backend Technology

Focuses on Java-related technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, and multithreading. Occasionally covers DevOps tools such as Jenkins, Nexus, Docker, and ELK, and shares technical insights from time to time, with a commitment to Java full-stack development.
