How to Efficiently Clean and Partition a 200M‑Row MySQL Table Using Online DDL
This article explains how to handle a 200‑million‑row MySQL 5.6 table by adding indexes, cleaning 99% of old data, converting it to a partitioned table, and using online DDL and auxiliary tables to maintain continuous business operations.
Business Requirement Overview
The original system used Oracle, but the new requirement is to manage a MySQL 5.6 table containing over 200 million rows. For an upcoming activity, business users need only the most recent data, so roughly 99% of the old rows can be deleted. Throughout the cleanup, statistics must keep running every ten minutes and data ingestion must not stop.
Key Tasks Identified
Optimize queries by adding an index on the time‑range column.
Clean up the majority of old (cold) data.
Keep the business running throughout, including the statistics job that runs every ten minutes.
Convert the table to a partitioned layout, separating old and new data so that old partitions can be dropped quickly.
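As a rough illustration of the first and third tasks, the ten-minute statistics job might look like the query below. The table name serverlog and the columns host and create_time are assumptions made for this sketch; the article does not give the real schema or the real statistics.

```sql
-- Hypothetical names: serverlog, host, create_time are not from the article.
-- EXPLAIN shows whether the optimizer uses an index on create_time
-- (a "range" access type) or falls back to a full table scan.
EXPLAIN
SELECT host, COUNT(*) AS events
FROM serverlog
WHERE create_time >= NOW() - INTERVAL 10 MINUTE
GROUP BY host;
```

Without an index on the time column, a query like this scans all 200 million rows on every run, which is why the index comes first in the task list.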
Using MySQL Online DDL
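MySQL 5.6 supports online DDL for InnoDB: a secondary index can be built in place while the table stays open for reads and writes, apart from brief metadata locks at the start and end of the operation. On MySQL 5.5, a tool such as pt-online-schema-change (pt-osc) achieves a similar result by copying the table in the background.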
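A minimal sketch of such an online index build, assuming the source table is called serverlog and the time-range column is create_time (both names are placeholders, not taken from the article):

```sql
-- Hypothetical names: serverlog, create_time, idx_create_time.
-- ALGORITHM=INPLACE asks InnoDB to build the index without copying the table.
-- LOCK=NONE asks for concurrent reads and writes during the build.
ALTER TABLE serverlog
  ADD INDEX idx_create_time (create_time),
  ALGORITHM=INPLACE,
  LOCK=NONE;
```

Stating ALGORITHM and LOCK explicitly is a safety choice: if the server cannot honor the requested level of concurrency, the statement fails with an error instead of quietly taking a stronger lock.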
Solution Architecture
1. **Shadow Table**: Create a shadow table serverlog_read that mirrors changes from the source table.
2. **Materialized-View Emulation**: MySQL has no native materialized views, so one has to be emulated. pt-osc takes the trigger approach, creating three triggers (INSERT, UPDATE, DELETE) on the source table to keep a copy up to date. Flexviews is an alternative that maintains materialized views from changes captured in the binary log rather than through triggers.
3. **Auxiliary Tables**: serverlog_par_old is a partitioned table that stores data refreshed from the emulated materialized view; serverlog_host holds the incremental and real-time data streams.
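The trigger approach from step 2 can be sketched by hand as follows. The column list is a hypothetical minimal schema, since the article does not show the real one, and the sketch assumes the primary key id is never updated.

```sql
-- Hypothetical minimal schema; the real columns are not given in the article.
CREATE TABLE serverlog (
  id          BIGINT NOT NULL AUTO_INCREMENT,
  host        VARCHAR(64) NOT NULL,
  create_time DATETIME NOT NULL,
  message     VARCHAR(255),
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Shadow table with the same definition as the source.
CREATE TABLE serverlog_read LIKE serverlog;

-- Each trigger body is a single statement, so no DELIMITER change is needed.
CREATE TRIGGER serverlog_ai AFTER INSERT ON serverlog
FOR EACH ROW
  REPLACE INTO serverlog_read (id, host, create_time, message)
  VALUES (NEW.id, NEW.host, NEW.create_time, NEW.message);

-- Assumes id does not change; REPLACE overwrites the shadow row by primary key.
CREATE TRIGGER serverlog_au AFTER UPDATE ON serverlog
FOR EACH ROW
  REPLACE INTO serverlog_read (id, host, create_time, message)
  VALUES (NEW.id, NEW.host, NEW.create_time, NEW.message);

CREATE TRIGGER serverlog_ad AFTER DELETE ON serverlog
FOR EACH ROW
  DELETE FROM serverlog_read WHERE id = OLD.id;
```

The triggers only capture changes made after they are created, so existing rows still have to be copied separately. MySQL 5.6 also allows just one trigger per table for each timing and event, so this pattern does not fit a table that already has its own triggers.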
Data is categorized into:
Cold data (old, to be archived).
Incremental data (e.g., the last month’s records).
Real‑time data (continuously ingested).
With partitioning, old data sits in its own partition, which can be dropped far faster than deleting the same rows one by one, while new data stays in the active partition.
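A sketch of what the partitioned layout could look like. The columns, partition names, and boundary dates are placeholders chosen for illustration; only the table name serverlog_par_old comes from the article.

```sql
-- Hypothetical columns and boundary dates.
-- MySQL requires every unique key, including the primary key,
-- to contain the partitioning column, hence PRIMARY KEY (id, create_time).
CREATE TABLE serverlog_par_old (
  id          BIGINT NOT NULL,
  host        VARCHAR(64) NOT NULL,
  create_time DATETIME NOT NULL,
  message     VARCHAR(255),
  PRIMARY KEY (id, create_time),
  KEY idx_create_time (create_time)
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS (create_time) (
  PARTITION p_cold   VALUES LESS THAN ('2016-01-01'),
  PARTITION p_recent VALUES LESS THAN ('2016-02-01'),
  PARTITION p_max    VALUES LESS THAN (MAXVALUE)
);

-- Optional: keep the cold rows by swapping them into a standalone table first.
-- serverlog_archive is a hypothetical name; it must be non-partitioned
-- and otherwise identical in structure.
CREATE TABLE serverlog_archive LIKE serverlog_par_old;
ALTER TABLE serverlog_archive REMOVE PARTITIONING;
ALTER TABLE serverlog_par_old
  EXCHANGE PARTITION p_cold WITH TABLE serverlog_archive;

-- Discard the cold partition and whatever rows it still contains.
ALTER TABLE serverlog_par_old DROP PARTITION p_cold;
```

If the old data is not needed at all, the EXCHANGE PARTITION step can be skipped and the partition dropped directly.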
Additional MySQL Advantages
1. **Table Structure Copy**: MySQL can duplicate a table's definition, including its indexes, in one statement: create table test1 like test; Alternatively, use SHOW CREATE TABLE or mysqldump --no-data to export the DDL.
2. **Data Copy**: Insert data from the original table into the new one with a single statement: insert into test1 select * from test;
3. **Backup & Archiving**: Instead of Oracle's more involved user-rename procedures, MySQL can move old tables into an archive schema with RENAME TABLE, which is a quick metadata operation when both schemas are on the same server. Renaming the database directory on disk is not a safe option for InnoDB tables, and RENAME USER renames accounts, not data.
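Put together, these three features suggest one way to shed 99% of the rows without a massive DELETE: copy the 1% worth keeping into a fresh table and swap the names. This is a sketch under assumptions, not necessarily the procedure the original author used; serverlog, serverlog_new, serverlog_old, archive_db, create_time, and the cutoff date are all placeholders.

```sql
-- Hypothetical names and cutoff date throughout.
-- 1. Empty copy of the source table, indexes included.
CREATE TABLE serverlog_new LIKE serverlog;

-- 2. Copy only the recent rows that must be kept.
INSERT INTO serverlog_new
SELECT * FROM serverlog
WHERE create_time >= '2016-01-01';

-- 3. Swap the tables. A multi-table RENAME TABLE is atomic,
--    so clients never see a moment without a serverlog table.
RENAME TABLE serverlog     TO serverlog_old,
             serverlog_new TO serverlog;

-- 4. Move the old table into an archive schema, which must already exist.
--    This step fails if serverlog_old still has triggers attached.
RENAME TABLE serverlog_old TO archive_db.serverlog_old;
```

Rows written to the source table between steps 2 and 3 are not copied by this sketch on its own, which is the gap that a trigger-maintained shadow table is meant to close. Depending on isolation level and binary log format, the INSERT ... SELECT may also lock the rows it reads, so on a busy table it is usually run in smaller ranges.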
Conclusion
By combining MySQL's online DDL, partition exchange, and trigger-based materialized-view emulation, the 200-million-row table can be indexed, cleaned, and partitioned with minimal downtime while still supporting real-time analytics and continuous data ingestion.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
