
How to Remove Duplicate MySQL Records with a Single Fast SQL Query

This article walks through the problem of duplicate rows in a MySQL table, shows an initial complex SQL attempt, compares a slow PHP‑based cleanup, and finally presents a concise, three‑step DELETE statement that eliminates duplicates in under a second.

ITPUB

Requirements

The author needed to clean up duplicate records in a MySQL table, where two rows count as duplicates when they share the same values in a combination of several fields. The goal was to keep exactly one row per duplicate group and delete the rest.
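Before deleting anything, it can help to see how many duplicate groups exist. The query below is a minimal sketch that borrows the consum_record table and its columns (id, user_id, monetary, consume_time) from the final statement later in this article. The copies and keep_id aliases are illustrative names, not part of the original.

```sql
-- List each duplicate group, how many copies it has,
-- and the id of the row that would be kept (the smallest one).
SELECT user_id,
       monetary,
       consume_time,
       COUNT(*) AS copies,   -- illustrative alias
       MIN(id)  AS keep_id   -- illustrative alias
FROM consum_record
GROUP BY user_id, monetary, consume_time
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

The sum of (copies - 1) over all returned groups is the number of rows a cleanup should remove.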

Initial SQL Attempt

After consulting others, the author found a multi‑step SQL statement. It first selects the duplicate keys, then finds the minimum rowid for each group, and finally deletes the rows whose rowid is not the minimum. The full statement is:

DELETE FROM vitae a
WHERE (a.peopleId, a.seq) IN (
    SELECT peopleId, seq
    FROM vitae
    GROUP BY peopleId, seq
    HAVING count(*) > 1
)
AND rowid NOT IN (
    SELECT min(rowid)
    FROM vitae
    GROUP BY peopleId, seq
    HAVING count(*) > 1
);

The logic consists of three steps:

Identify duplicate peopleId, seq pairs.

Find the smallest rowid for each duplicate group.

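Delete every row in a duplicate group whose rowid is not the group's smallest.

Running this query produced an error because MySQL does not allow a DELETE to select from its own target table in a subquery (error 1093, "You can't specify target table 'vitae' for update in FROM clause"). The statement is also written in Oracle style: rowid is an Oracle pseudo‑column that MySQL does not provide, so a real key column would be needed in its place.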
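One commonly used way around this restriction, which is not the route the author took, is to wrap the subquery in a derived table so that MySQL materializes the result before the delete runs. The sketch below assumes vitae has an integer primary key named id standing in for rowid. That column, and the keep_id and keepers names, are hypothetical.

```sql
-- Assumption: vitae has an integer primary key `id` (hypothetical column).
-- The inner SELECT is wrapped in a derived table (`keepers`), so the DELETE
-- no longer reads vitae directly in its subquery.
DELETE FROM vitae
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM vitae
        GROUP BY peopleId, seq
    ) AS keepers
);
```

Because the inner query has no HAVING clause, it returns one id for every group, including groups with a single row, so non-duplicated rows are kept.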

Code‑Based Workaround

The author then wrote PHP code that first fetched the duplicate rows and subsequently looped to delete the extra records. This approach took about 116 seconds on the test data, prompting a search for a faster pure‑SQL solution.

Perfect SQL Solution

A refined DELETE statement was shared in a technical group, which solves the problem in a single pass and runs in roughly 0.3 seconds:

DELETE consum_record
FROM consum_record,
     (SELECT min(id) id, user_id, monetary, consume_time
      FROM consum_record
      GROUP BY user_id, monetary, consume_time
      HAVING count(*) > 1) t2
WHERE consum_record.user_id = t2.user_id
  AND consum_record.monetary = t2.monetary
  AND consum_record.consume_time = t2.consume_time
  AND consum_record.id > t2.id;

This query also follows three logical steps:

Build a derived table t2 containing the smallest id for each duplicate group.

Join the original table with t2 on the fields that define duplication.

Delete rows whose id is greater than the minimum id in the group.
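To check these steps before deleting anything, the same join can be run as a SELECT. This is a sketch that reuses the tables and columns from the DELETE above. The cr alias and the explicit JOIN syntax are the only additions.

```sql
-- Preview: returns exactly the rows the DELETE above would remove.
SELECT cr.*
FROM consum_record AS cr
JOIN (SELECT MIN(id) AS id, user_id, monetary, consume_time
      FROM consum_record
      GROUP BY user_id, monetary, consume_time
      HAVING COUNT(*) > 1) AS t2
  ON cr.user_id      = t2.user_id
 AND cr.monetary     = t2.monetary
 AND cr.consume_time = t2.consume_time
WHERE cr.id > t2.id;
```

Each group appears once in t2, so every extra row matches a single t2 row, and the number of rows returned here should equal the affected-row count reported by the DELETE.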

The performance comparison highlighted a dramatic improvement: the PHP loop needed ~116 seconds, while the single‑SQL version completed in ~0.3 seconds.

Conclusion

The author reflects that, as a PHP developer, relying on SQL for data‑cleanup can be far more efficient, and plans to deepen SQL expertise to avoid such performance pitfalls in the future.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
