
Real-World MySQL Time-Based Partitioning Case Study: Table Optimization Process

This article presents a detailed case study of optimizing a massive MySQL table using time‑based partitioning, describing the problem background, partition design, migration scripts, performance benchmarks, and ongoing maintenance procedures for a ten‑year data archive.

Aikesheng Open Source Community

Based on previous articles about time‑based partitioning, this post shares a real‑world case where an internet company optimized a large MySQL table containing ten years of data (approximately 3.6 billion rows) by applying time‑based partitions.

Problem and Background

The table gained roughly one million new rows per day (1,000,000 × 3,650 days ≈ 3.65 billion rows over ten years).

95% of queries target a single day or a range of up to one month.

Data must be retained for ten years, making the single massive table hard to manage and slow for both queries and deletions.

Analysis and Solution

Because queries are concentrated in narrow date ranges, partitioning by time can improve performance through partition pruning and simplify data expiration.

Split the original table into ten tables: one current table for the latest two years and nine archive tables for older years.

Use MySQL RANGE partitioning on the log_date column for the current table; the application can keep querying the table transparently, with no code changes.
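The article omits the actual partitioned DDL, but the design above can be sketched. The script below generates a plausible pt_current definition with daily RANGE COLUMNS partitions; the column list, partition naming, and daily granularity are assumptions, not the article's real schema, and GNU date is required for the "+ 1 day" arithmetic.

```shell
#!/bin/sh
# Sketch only: generate a hypothetical pt_current DDL with a few daily
# RANGE COLUMNS partitions plus a catch-all maxvalue partition.
{
  echo "create table pt_current ("
  echo "  id bigint unsigned not null,"
  echo "  log_date date not null,"
  echo "  primary key (id, log_date)"
  echo ") partition by range columns (log_date) ("
  for d in 2020-01-01 2020-01-02 2020-01-03; do
    p=$(echo "$d" | tr -d '-')          # partition name: p20200101, ...
    next=$(date -d "$d + 1 day" +%F)    # exclusive upper bound (GNU date)
    echo "  partition p$p values less than ('$next'),"
  done
  echo "  partition p_max values less than (maxvalue)"
  echo ");"
} > pt_current.sql
cat pt_current.sql
```

Each daily partition holds rows strictly below the next day's date, so a single-day query touches exactly one partition.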

Optimization Steps

1. Create a simplified example table pt_old (SHOW CREATE TABLE output omitted for brevity).

2. Export data year‑by‑year using a shell script:

root@ytt-unbuntu:/home/ytt/scripts# cat pt_export
#!/bin/sh
# Export one year per background job: each SELECT ... INTO OUTFILE writes
# that year's rows as CSV under MySQL's secure_file_priv directory.
for i in `seq 2011 2020`
do {
  mysql -D ytt -e "select * from pt_old where log_date between '$i-01-01' and '$i-12-31' into outfile '/var/lib/mysql-files/pt_$i.csv' fields terminated by ','"
} &
done
wait

3. Create ten tables pt_2011 … pt_2020 (the latter will become the partitioned table):

for i in `seq 2011 2020`; do mysql -e "use ytt; create table pt_$i like pt_old;"; done;

4. Modify the primary key of pt_2020 to include log_date, because MySQL requires every unique key (including the primary key) on a partitioned table to contain all columns used in the partitioning expression:

alter table pt_2020 drop primary key, add primary key (id,log_date);

5. Add daily partitions for 2020‑2021 using a stored procedure sp_add_partition_pt_current (full procedure code omitted for brevity).
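The procedure body is omitted in the original; purely as an illustration, the same daily ADD PARTITION statements it is described as producing can be generated with a shell loop. Partition names and statement wording are assumptions, and GNU date is assumed for the date arithmetic.

```shell
#!/bin/sh
# Sketch only: emit one ALTER TABLE ... ADD PARTITION statement per day,
# mirroring what sp_add_partition_pt_current is described as doing.
start=2020-01-01
days=3   # the real procedure would cover all of 2020-2021 (731 days)
i=0
while [ "$i" -lt "$days" ]; do
  d=$(date -d "$start + $i day" +%F)               # this partition's day
  next=$(date -d "$start + $((i + 1)) day" +%F)    # exclusive upper bound
  p=$(echo "$d" | tr -d '-')
  echo "alter table pt_2020 add partition (partition p$p values less than ('$next'));"
  i=$((i + 1))
done > add_partitions.sql
cat add_partitions.sql
```

Feeding the generated file to mysql (or doing the same loop inside a stored procedure, as the article does) creates the daily partitions in order.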

6. Load the exported CSV files into the corresponding tables with another shell script:

#!/bin/sh
# Import each year's CSV into its matching table, one background job per year.
for i in `seq 2011 2020`; do {
  mysql -D ytt -e "load data infile '/var/lib/mysql-files/pt_$i.csv' into table pt_$i fields terminated by ','"
} &
done
wait

7. Rename pt_2020 to pt_current to become the active table.

Performance Verification

Two representative queries were run twice on both the original table (pt_old) and the partitioned table (pt_current).

Query for a single day (2020‑03‑01): original first run 1 min 1.7 s, second run 0.03 s; partitioned first run 0.02 s, second run 0.01 s.

Query for the last five days of 2020: original first run 2 min 42.21 s, second run 0.13 s; partitioned first run 0.07 s, second run 0.01 s.

These results show a dramatic reduction in query time, especially for the first execution when data is not cached.

Management and Maintenance

Because MySQL lacks automatic yearly partition creation, a procedure sp_autoextend_partition_pt_current was written to generate a full year's partitions for a given year; it can be scheduled via OS cron jobs or MySQL events.

CREATE PROCEDURE sp_autoextend_partition_pt_current(IN f_year YEAR) ... END$$ (body omitted for brevity)

After extending partitions into 2022, the expired year's data is moved to its archive table and the old partitions are dropped using generated SQL scripts.
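The generated archival SQL might look like the sketch below: copy the expired year's rows into its archive table, then drop that year's daily partitions. Table names follow the article's pt_YYYY convention, but the exact statements are assumptions, and GNU date is assumed.

```shell
#!/bin/sh
# Sketch only: generate the yearly archival SQL for one expired year.
year=2020
{
  echo "insert into pt_$year select * from pt_current"
  echo "  where log_date between '$year-01-01' and '$year-12-31';"
  # one DROP PARTITION per day of the expired year
  d=$year-01-01
  while [ "$(date -d "$d" +%Y)" = "$year" ]; do
    echo "alter table pt_current drop partition p$(echo "$d" | tr -d '-');"
    d=$(date -d "$d + 1 day" +%F)
  done
} > archive_$year.sql
wc -l < archive_$year.sql   # 2 lines of INSERT + 366 drops (2020 is a leap year)
```

Dropping a partition is a metadata operation, which is why expiring a whole year this way is far faster than DELETE against a monolithic table.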

Key Takeaways

Time‑based partitioning greatly improves query performance for range‑limited workloads.

Partitioning simplifies data lifecycle management, allowing fast deletion of expired data.

Proper partition naming and automation scripts are essential for ongoing maintenance.

Tags: Performance, SQL, MySQL, Database Optimization, partitioning, data archiving
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise-grade MySQL open-source tools and services, releases a premium open-source component every year on October 24 (Programmer's Day), and continuously operates and maintains them.
