How I Rescued a MySQL Table After a Massive Bad‑Group Insertion
A Java developer recounts a production MySQL outage caused by a faulty batch insert, walks through failed delete and lock‑free attempts, then details a series of table‑recreation, truncate, and index‑rebuilding steps that finally restored data integrity and performance.
Background
A Java developer assumed MySQL CRUD work was trivial until a production incident exposed a hidden danger: a new bad_group inserted millions of rows via an offline DataWorks job, overwhelming the table and causing query timeouts.
Table Structure
The affected metadata table has columns (id, group, code, name, property1, property2, ...) with id as the primary key and a unique key on group + code. DataWorks uses INSERT IGNORE to avoid duplicate group+code rows.
Initial Problem
On the day of the incident, the new bad_group generated tens of millions of rows, most of which were junk. The immediate goal was to delete all bad_group rows while preserving other data.
Attempt 1 – Direct Delete (v1)
DELETE FROM MY_TABLE WHERE `group` = 'bad_group';This simple DELETE failed because the binlog size limit was exceeded when trying to remove tens of millions of rows.
Attempt 2 – Lock‑Free Change (v2)
The team tried the platform’s “lock‑free” data change feature, but batch execution on such a large dataset took over two hours, which was unacceptable, so the attempt was abandoned.
Attempt 3 – Drop & Recreate (v3)
The next idea was to copy the good rows to a temporary table, drop the original table, and rename the temporary table back. This works because DROP removes the whole table, avoiding row‑by‑row deletion.
-- Copy valid rows to a temporary table
CREATE TABLE TEMP_TABLE AS SELECT * FROM MY_TABLE WHERE `group` <> 'bad_group';
-- Drop the original table
DROP TABLE MY_TABLE;
-- Rename temporary table
RENAME TABLE TEMP_TABLE TO MY_TABLE;The operation succeeded, but it also removed the primary key, unique key, and other indexes.
Issue 4 – Lost Indexes
After the drop‑recreate, queries failed because the primary key and unique key disappeared. The table now behaved like a house that had been burned down: data existed, but no structural constraints remained.
Attempt 4 – Truncate (v4)
To keep the original table definition, the team used TRUNCATE instead of DROP:
-- Copy valid rows to a temporary table
CREATE TABLE TEMP_TABLE AS SELECT * FROM MY_TABLE WHERE `group` <> 'bad_group';
-- Empty the original table but keep its definition
TRUNCATE TABLE MY_TABLE;
-- Insert the good rows back
INSERT INTO MY_TABLE SELECT * FROM TEMP_TABLE;Attempt 5 – CREATE TABLE LIKE (v5)
Another safe method was to create the temporary table with the exact same schema using CREATE TABLE LIKE, then move data:
CREATE TABLE TEMP_TABLE LIKE MY_TABLE;
INSERT INTO TEMP_TABLE SELECT * FROM MY_TABLE WHERE `group` <> 'bad_group';
DROP TABLE MY_TABLE;
RENAME TABLE TEMP_TABLE TO MY_TABLE;All these approaches restored the data but still left the table without its original primary key and unique constraints.
Final Recovery Steps
The remaining tasks were to rebuild the missing indexes and re‑import the rows that had id = 0 after the previous operations.
Copy rows with id = 0 to a temporary table.
Delete those rows from the original table.
Re‑add the primary key, unique key, and any needed secondary indexes.
Insert the saved rows back using INSERT IGNORE.
-- 1. Save id=0 rows
CREATE TABLE TEMP_TABLE AS SELECT * FROM MY_TABLE WHERE id = 0;
-- 2. Remove them from the source
DELETE FROM MY_TABLE WHERE id = 0;
-- 3. Rebuild indexes (example)
ALTER TABLE MY_TABLE ADD PRIMARY KEY (id);
ALTER TABLE MY_TABLE ADD UNIQUE KEY uq_group_code (`group`, code);
-- 4. Re‑insert saved rows
INSERT IGNORE INTO MY_TABLE SELECT * FROM TEMP_TABLE;During verification, the team discovered replication lag between the primary and the DMS‑provided read‑only replica, which initially masked the row‑count changes.
After rebuilding the indexes, the final INSERT IGNORE still failed with a duplicate‑primary‑key error because the AUTO_INCREMENT counter had been reset to zero by the earlier CREATE TABLE AS. Setting the counter manually did not help until the column definition was altered to re‑enable AUTO_INCREMENT:
ALTER TABLE MY_TABLE MODIFY COLUMN `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '自增ID';With the auto‑increment restored, the data import succeeded and the production service returned to normal.
Takeaways
The root cause was misuse of CREATE TABLE AS, which copies data but drops primary keys, unique keys, and auto‑increment settings. Dropping or truncating a table should be used with caution, and index reconstruction must be part of any recovery plan. The author also added preventive checks on upstream data imports to avoid similar incidents.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
