How to Remove Duplicate MySQL Records with a Single Fast SQL Query
This article walks through the problem of duplicate rows in a MySQL table, shows an initial complex SQL attempt, compares a slow PHP‑based cleanup, and finally presents a concise, three‑step DELETE statement that eliminates duplicates in under a second.
Requirements Analysis
The author needed to clean duplicate records in a MySQL table, where duplicates are defined by a combination of several fields. The goal was to keep only one row per duplicate group.
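As a rough illustration of what "duplicates defined by a combination of several fields" means in practice, the query below lists each duplicate group and how many copies it has. It borrows the consum_record table and its user_id, monetary, and consume_time columns from the final solution later in this article; the alias copies is made up for this sketch.

```sql
-- List every combination of the key fields that occurs more than once.
-- "copies" is a hypothetical alias used only in this sketch.
SELECT user_id, monetary, consume_time, COUNT(*) AS copies
FROM consum_record
GROUP BY user_id, monetary, consume_time
HAVING COUNT(*) > 1;
```

Running the same query after a cleanup is a simple sanity check: if it returns no rows, no duplicate groups remain.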
Initial SQL Attempt
After consulting others, the author found a multi‑step SQL statement. It first selects the duplicate key pairs, then finds the minimum rowid for each group, and finally deletes the rows whose rowid is not the minimum. The full statement is:
    DELETE FROM vitae a
    WHERE (a.peopleId, a.seq) IN (
            SELECT peopleId, seq
            FROM vitae
            GROUP BY peopleId, seq
            HAVING count(*) > 1
        )
      AND rowid NOT IN (
            SELECT min(rowid)
            FROM vitae
            GROUP BY peopleId, seq
            HAVING count(*) > 1
        );
The logic consists of three steps:
Identify the (peopleId, seq) pairs that occur more than once.
Find the smallest rowid for each duplicate group.
Delete every row in a duplicate group whose rowid is not that smallest value.
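Running this query produced an error: MySQL does not allow a DELETE (or UPDATE) statement to select, in a subquery, from the same table it is modifying. MySQL typically reports this restriction as error 1093, "You can't specify target table ... for update in FROM clause".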
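One commonly used way around this restriction is to wrap the subquery in a derived table, so that MySQL materializes the list of rows to keep before the DELETE starts. The sketch below is not the author's statement. It assumes that vitae has an integer primary key named id, because rowid is an Oracle-style pseudo-column that MySQL tables do not generally expose. The names keep_id and keepers are hypothetical.

```sql
-- Assumption: vitae has an integer primary key column named id.
-- Keep the row with the smallest id in each (peopleId, seq) group
-- and delete every other row.
DELETE FROM vitae
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        -- The derived table is evaluated first, so the DELETE no longer
        -- reads directly from the table it is modifying.
        SELECT MIN(id) AS keep_id
        FROM vitae
        GROUP BY peopleId, seq
    ) AS keepers
);
```

Groups with only one row are unaffected, because that row's id is itself the minimum of its group. This is why the sketch does not need the HAVING count(*) > 1 filter.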
Code‑Based Workaround
The author then wrote PHP code that first fetched the duplicate rows and subsequently looped to delete the extra records. This approach took about 116 seconds on the test data, prompting a search for a faster pure‑SQL solution.
Perfect SQL Solution
A refined DELETE statement was shared in a technical group. It removes the duplicates with a single statement and ran in roughly 0.3 seconds:
    DELETE consum_record
    FROM consum_record,
         (SELECT min(id) id, user_id, monetary, consume_time
          FROM consum_record
          GROUP BY user_id, monetary, consume_time
          HAVING count(*) > 1) t2
    WHERE consum_record.user_id = t2.user_id
      AND consum_record.monetary = t2.monetary
      AND consum_record.consume_time = t2.consume_time
      AND consum_record.id > t2.id;
This query also follows three logical steps:
Build a derived table t2 containing the smallest id for each duplicate group.
Join the original table with t2 on the fields that define duplication.
Delete rows whose id is greater than the minimum id in the group.
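Before running a destructive statement like this, it can help to preview which rows it would remove. The sketch below uses the same join as the DELETE above, rewritten as a SELECT, against a small sample table. The column types and the sample rows are assumptions made for illustration and do not come from the original article.

```sql
-- Hypothetical sample table; the column types are assumptions.
CREATE TABLE consum_record (
    id           INT PRIMARY KEY,
    user_id      INT NOT NULL,
    monetary     DECIMAL(10,2) NOT NULL,
    consume_time DATETIME NOT NULL
);

INSERT INTO consum_record (id, user_id, monetary, consume_time) VALUES
    (1, 1, 10.00, '2020-01-01 08:00:00'),  -- smallest id in its group, kept
    (2, 1, 10.00, '2020-01-01 08:00:00'),  -- duplicate of id 1
    (3, 1, 10.00, '2020-01-01 08:00:00'),  -- duplicate of id 1
    (4, 2, 25.50, '2020-01-02 09:30:00'),  -- unique row
    (5, 2, 25.50, '2020-01-03 09:30:00');  -- unique row (different time)

-- Preview: same join conditions as the DELETE, but it only lists the rows
-- that the DELETE would remove.
SELECT r.id
FROM consum_record r
JOIN (SELECT MIN(id) AS id, user_id, monetary, consume_time
      FROM consum_record
      GROUP BY user_id, monetary, consume_time
      HAVING COUNT(*) > 1) t2
  ON r.user_id = t2.user_id
 AND r.monetary = t2.monetary
 AND r.consume_time = t2.consume_time
 AND r.id > t2.id
ORDER BY r.id;
-- returns ids 2 and 3
```

One caveat applies to both the preview and the DELETE. The join uses plain equality, so rows where one of the key columns is NULL never match and are left in place. If the key columns are nullable, MySQL's NULL-safe comparison operator <=> may be a better fit for those conditions.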
The performance comparison highlighted a dramatic improvement: the PHP loop needed ~116 seconds, while the single‑SQL version completed in ~0.3 seconds.
Conclusion
The author reflects that, as a PHP developer, relying on SQL for data‑cleanup can be far more efficient, and plans to deepen SQL expertise to avoid such performance pitfalls in the future.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.