How to Efficiently Insert Massive Data Without Duplicates in MySQL
This article explains several MySQL techniques—INSERT IGNORE, ON DUPLICATE KEY UPDATE, INSERT…SELECT WHERE NOT EXISTS, and REPLACE—plus a practical MyBatis batch‑insert example, to handle large‑scale data imports while preventing duplicate records.
The task is simple: batch insert data that may come from other database tables or an external Excel file.
When inserting millions of rows, checking each row for duplicates beforehand is impractical, so efficient SQL strategies are needed.
1. INSERT IGNORE
When a row would cause a duplicate‑key error, MySQL skips that row and returns a warning instead of failing the statement. Use this only if you are sure the statement itself is correct.
INSERT IGNORE INTO user (name) VALUES ('telami');
This method is simple, but IGNORE also downgrades other ignorable errors (not just duplicate‑key conflicts) to warnings, so genuine data problems can pass unnoticed.
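A minimal sketch of this behaviour, assuming a hypothetical user_demo table (not part of the original article) with a UNIQUE index on name:

```sql
-- Hypothetical demo table; the UNIQUE index is what makes a row a "duplicate".
CREATE TABLE IF NOT EXISTS user_demo (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(64) NOT NULL,
  UNIQUE KEY uk_name (name)
);

-- One multi-row statement: rows that collide on uk_name are skipped,
-- the remaining rows are inserted.
INSERT IGNORE INTO user_demo (name) VALUES ('telami'), ('alice'), ('telami');

-- Each skipped row is reported as a warning rather than an error.
SHOW WARNINGS;
```

Checking SHOW WARNINGS after a large import is one way to see what was silently skipped.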
2. ON DUPLICATE KEY UPDATE
If a primary or unique key conflict occurs, the UPDATE clause runs; using a no‑op update like id=id mimics INSERT IGNORE while still reporting other errors.
INSERT INTO user (name) VALUES ('telami') ON DUPLICATE KEY UPDATE id = id;
The prerequisite is that the column used for duplicate detection must have a PRIMARY or UNIQUE constraint.
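The same idea applied to a multi-row insert might look like the sketch below. The user_demo table and its uk_name index are assumptions made for illustration only.

```sql
-- Hypothetical demo table with the UNIQUE constraint this technique relies on.
CREATE TABLE IF NOT EXISTS user_demo (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(64) NOT NULL,
  UNIQUE KEY uk_name (name)
);

-- New names are inserted; a name that already exists triggers the UPDATE clause,
-- which assigns id to itself and therefore leaves the existing row unchanged.
INSERT INTO user_demo (name) VALUES ('telami'), ('bob')
ON DUPLICATE KEY UPDATE id = id;

-- Rows affected by the previous statement.
SELECT ROW_COUNT();
```

According to the MySQL documentation, each row counts as 1 affected row when inserted, 2 when an existing row is updated, and 0 when an existing row is set to its current values, so with default client settings the no-op update leaves duplicates out of the count.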
3. INSERT … SELECT … WHERE NOT EXISTS
Insert based on a SELECT that only proceeds when a matching row does not already exist, allowing more complex conditions.
INSERT INTO user (name) SELECT 'telami' FROM dual WHERE NOT EXISTS (SELECT id FROM user WHERE name = 'telami');
This approach uses a sub‑query and may be slightly slower than the previous methods.
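For a bulk import, the source rows would typically come from another table rather than from dual. The sketch below assumes a hypothetical staging table named user_staging and a hypothetical target table user_demo; neither appears in the original article.

```sql
-- Hypothetical target and staging tables.
CREATE TABLE IF NOT EXISTS user_demo (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(64) NOT NULL,
  UNIQUE KEY uk_name (name)
);
CREATE TABLE IF NOT EXISTS user_staging (
  name VARCHAR(64) NOT NULL
);
INSERT INTO user_staging (name) VALUES ('telami'), ('alice'), ('alice');

-- Copy only the names that are not already present in the target table.
-- DISTINCT removes duplicates inside the staging data itself, which the
-- NOT EXISTS check cannot see because it only looks at rows already in user_demo.
INSERT INTO user_demo (name)
SELECT DISTINCT s.name
FROM user_staging AS s
WHERE NOT EXISTS (
  SELECT 1 FROM user_demo AS u WHERE u.name = s.name
);
```

The condition inside NOT EXISTS can be any predicate, which is what makes this form more flexible than the key-based methods.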
4. REPLACE INTO
If a row with the same PRIMARY or UNIQUE key exists, it is deleted first, then the new row is inserted.
REPLACE INTO user SELECT 1, 'telami' FROM books;
Because the existing row is deleted rather than updated, any column not supplied by the statement is reset to its default value.
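The delete-then-insert behaviour is easiest to see on an auto-increment key. This is a small sketch against a hypothetical user_demo table, not the article's schema:

```sql
-- Hypothetical demo table.
CREATE TABLE IF NOT EXISTS user_demo (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(64) NOT NULL,
  UNIQUE KEY uk_name (name)
);

-- Make sure a row for 'telami' exists, then note its id.
INSERT IGNORE INTO user_demo (name) VALUES ('telami');
SELECT id, name FROM user_demo WHERE name = 'telami';

-- The conflict on uk_name deletes the old row and inserts a fresh one.
REPLACE INTO user_demo (name) VALUES ('telami');

-- The row now carries a newly generated id, different from the first SELECT.
SELECT id, name FROM user_demo WHERE name = 'telami';
```

If other tables reference the old id, this change of key is a reason to prefer one of the earlier methods.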
Practical Example
The author chose the second method (ON DUPLICATE KEY UPDATE) and implemented it with MyBatis for batch insertion, where mobile_number has a UNIQUE constraint. The MyBatis XML snippet:
<insert id="batchSaveUser" parameterType="list">
insert into user (id,username,mobile_number)
values
<foreach collection="list" item="item" index="index" separator=",">
(#{item.id}, #{item.username}, #{item.mobileNumber})
</foreach>
ON DUPLICATE KEY UPDATE id = id
</insert>
With this configuration, rows with duplicate mobile numbers are automatically ignored during batch insertion.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
