Optimizing and Migrating a 20 Million‑Row MySQL Table
This guide explains how to clean redundant fields, restructure a massive MySQL table, batch‑fetch data with LIMIT, rewrite slow queries, and compare three insertion methods—including prepared statements and bulk inserts—to cut migration time from hours to under 20 minutes.
1. Clean Redundant Data and Optimize Schema
The original table holds about 20 million rows with many redundant and erroneous columns, which hampers statistical analysis. Columns that are frequently used in query conditions are extracted into dedicated fields (e.g., storing an ID card number as VARCHAR(18)). Less important information is merged into a single TEXT column. Derived attributes such as gender, birthdate, and age are calculated from the ID number and stored as separate columns to simplify downstream queries.
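A minimal sketch of what the restructured table might look like (column names are illustrative; the article does not show the actual schema):

CREATE TABLE member_new (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name      VARCHAR(50)  NOT NULL,
    identity  VARCHAR(18)  NOT NULL,      -- ID card number, kept as a string
    gender    TINYINT      NOT NULL,      -- derived from the ID number
    birthdate DATE         NOT NULL,      -- digits 7-14 of an 18-digit ID number
    age       TINYINT UNSIGNED NOT NULL,  -- derived from birthdate at migration time
    extra     TEXT,                       -- less important fields merged into one column
    KEY idx_identity (identity)
) ENGINE=InnoDB;

Storing the derived columns denormalizes the data slightly, but it turns every downstream demographic query into a plain column filter instead of string parsing.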
2. Data Migration Strategy
Data is read from the old table, transformed, and written to a new table. Because the dataset cannot be loaded into memory at once, the migration fetches rows in batches using MySQL LIMIT. A naive query for a deep batch looks like:

SELECT * FROM table_name LIMIT 15000000,50000;

As the offset grows, the query becomes slow (≈35 seconds per batch), because MySQL must scan past and discard all 15 million preceding rows. Since the table holds about 20 million rows, the same batch lies only about 5 million rows from the end, so ordering by the auto‑increment id column in descending order reaches it with a much smaller offset:

SELECT * FROM table_name ORDER BY id DESC LIMIT 5000000,50000;

Combined with a binary‑search‑style split of the id range, this reduces the execution time of each batch from 35 seconds to about 9 seconds, making the overall migration feasible.
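Because id is an auto‑increment primary key, each batch can also be located with a pure range condition instead of an offset. A minimal PHP sketch of such a loop (variable names and connection handling are illustrative, not the article's original script):

// $mysqli: an open mysqli connection (setup omitted)
$lastId = 0;
$batchSize = 50000;
do {
    // Range on the primary key: MySQL seeks directly past $lastId
    // instead of scanning and discarding millions of offset rows.
    // $lastId is a numeric id read back from the table itself.
    $result = $mysqli->query(
        "SELECT * FROM table_name WHERE id > $lastId ORDER BY id LIMIT $batchSize"
    );
    $rows = $result->fetch_all(MYSQLI_ASSOC);
    foreach ($rows as $row) {
        $lastId = $row['id'];
        // transform $row and queue it for insertion into the new table
    }
} while (count($rows) === $batchSize);

Each iteration remembers the last id seen, so every batch is an index range scan regardless of how deep into the table it sits.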
3. Insertion Methods
Three insertion strategies were evaluated:
Row‑by‑row insertion with prepared statements. This method surfaces problematic rows as soon as they fail and prevents SQL injection. Example PHP code:
public function actionTest(array $data)
{
    $mysqli = new mysqli("192.168.1.106", "username", "password", "test");
    $sql = "INSERT INTO table_name(name, identity) VALUES (?, ?)";
    $stmt = $mysqli->prepare($sql);
    // Bind once before the loop: bind_param binds by reference, so
    // reassigning $name and $identity is enough on each iteration.
    // Both placeholders are strings ("ss"); identity is VARCHAR(18)
    // and may end in a non-numeric check character, so "i" would corrupt it.
    $stmt->bind_param("ss", $name, $identity);
    foreach ($data as $val) {
        $name = $val['name'];
        $identity = $val['card_id'];
        $stmt->execute();
    }
    $stmt->close();
}

While secure, this approach does not significantly speed up the migration: every row still costs a separate network round trip to the server.
Bulk insertion of 50,000 rows per batch. The script accumulates 50,000 rows, builds a single multi‑value INSERT statement, and executes it. This balances error detection (a failed batch can be rolled back) with high throughput, and it proved to be the most practical solution.
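A minimal PHP sketch of this approach (function and column names are illustrative; the article does not include the original script):

public function bulkInsert(mysqli $mysqli, array $rows)
{
    $values = [];
    foreach ($rows as $row) {
        // Values are concatenated into the statement, so each one
        // must be escaped explicitly.
        $name = $mysqli->real_escape_string($row['name']);
        $identity = $mysqli->real_escape_string($row['card_id']);
        $values[] = "('$name','$identity')";
    }
    // One statement per batch: a single round trip inserts all 50,000 rows.
    $sql = "INSERT INTO table_name(name, identity) VALUES " . implode(',', $values);
    if (!$mysqli->query($sql)) {
        // The whole batch fails together, so the error can be logged
        // and the batch inspected or retried as a unit.
        error_log("batch failed: " . $mysqli->error);
    }
}

Note that a 50,000‑row statement can exceed MySQL's max_allowed_packet limit, which may need to be raised depending on row size.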
Generate a large SQL dump and import it with MySQL tools. The script writes all INSERT statements to a file (often several gigabytes) and loads it using mysql or mysqlimport. This method is fastest for clean data, but locating and fixing a single erroneous row in a multi‑gigabyte file is cumbersome.
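For illustration, a dump produced this way would typically be loaded from the shell (host, user, and file names are hypothetical):

mysql -h 192.168.1.106 -u username -p test < migration_dump.sql

mysqlimport, by contrast, wraps LOAD DATA INFILE and expects delimited data files rather than SQL statements, so the dump format has to match the chosen tool.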
4. Results
By cleaning the schema, using the optimized descending‑id range query, and adopting bulk 50,000‑row inserts, the migration of 20 million rows completed in under 20 minutes with improved data quality. The same workflow is planned to be tested on a 200 million‑row dataset.