
How to Efficiently Delete Massive MySQL Logs While Preserving Critical Types

This article explains how to clean up a rapidly growing MySQL log table by batching deletions, handling unindexed type columns, and using a start‑id based pagination strategy that scales from tens of millions to billions of rows without locking the database.

ITPUB

Oct 11, 2024


Background

A MySQL table named log stores user operation logs. Each user can have many rows, so the table grows continuously. The goal is to delete old data while keeping everything from the most recent three months and every row whose type equals c, regardless of its age.

Relevant columns:

id (primary key)

type (no index, values a‑e, cannot delete when type = c)

datachange_lasttime (indexed timestamp)

Deletion conditions:

datachange_lasttime <= current_time - 3 months

type != c

Early Approach

The initial solution performed batch deletions directly in SQL, selecting IDs that satisfy both conditions and deleting them in chunks.

select id from log
where
    datachange_lasttime <= '2023-06-17 00:00:00'
    and type != 'c'
limit #{limit}

Each iteration fetched a limited number of IDs, deleted them, and repeated until no rows matched.
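As a rough illustration of this loop, the sketch below runs the same select-then-delete cycle against an in-memory SQLite database standing in for MySQL. The function name delete_in_batches, the sample rows, and the batch size are assumptions made for the example, not details from the original setup.

```python
import sqlite3

def delete_in_batches(conn, cutoff, limit):
    """Repeat select-then-delete until no row matches both conditions."""
    total = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM log "
            "WHERE datachange_lasttime <= ? AND type != 'c' LIMIT ?",
            (cutoff, limit))]
        if not ids:
            return total
        placeholders = ",".join("?" * len(ids))
        conn.execute(f"DELETE FROM log WHERE id IN ({placeholders})", ids)
        conn.commit()
        total += len(ids)

# Hypothetical sample data: ids 1-6 are old, 7-10 are recent; ids 3 and 8 have type c.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (id INTEGER PRIMARY KEY, type TEXT, datachange_lasttime TEXT)")
for i in range(1, 11):
    changed = "2023-01-01 00:00:00" if i <= 6 else "2023-09-01 00:00:00"
    conn.execute("INSERT INTO log VALUES (?, ?, ?)",
                 (i, "abcde"[(i - 1) % 5], changed))

deleted = delete_in_batches(conn, "2023-06-17 00:00:00", limit=2)
remaining = [row[0] for row in conn.execute("SELECT id FROM log ORDER BY id")]
print(deleted, remaining)
```

Every iteration re-runs the filtered select, so the unindexed type check is paid again on each batch, which is where the cost shows up as the table grows.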

Failed Optimization

When the table reached hundreds of millions of rows, the query above became a bottleneck because type has no index. The next attempt moved the type filter into Java code and kept only the timestamp condition in SQL, but it still paged through the table with MyBatis PageHelper. Offset-based paging caused rows to be missed, because every deletion shifts the rows behind it toward the front of the result set.

select id from t_user_pop_log
order by id
limit #{offset}, #{limit}

Example of the problem:

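Suppose the table holds 300 rows. The first query (offset 0) returns 100 rows; after filtering, only 50 are deleted, leaving 250 rows.

The second query (offset 100) returns another 100 rows, of which 60 are deleted, leaving 190 rows. Because the first round's deletions shifted the remaining rows forward, this page has already jumped over 50 rows that were never examined.

The third query uses offset 200, but only 190 rows exist, so it returns nothing and the skipped rows are never deleted.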
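The skipping can be reproduced without a database. The following is a minimal simulation, assuming 300 rows that are all old enough to delete, with every even id given type c so that exactly half of each page survives. The function name and the data layout are invented for the illustration.

```python
def offset_paginated_cleanup(rows, page_size):
    """rows maps id -> type. Pages by a growing offset and deletes
    every row on the page whose type is not 'c'."""
    offset = 0
    while True:
        ordered_ids = sorted(rows)  # stands in for "order by id"
        page = ordered_ids[offset:offset + page_size]
        if not page:
            break
        for row_id in page:
            if rows[row_id] != "c":
                del rows[row_id]
        offset += page_size  # what an offset-based pager does next
    return rows

# Hypothetical data: odd ids are deletable (type a), even ids are type c.
rows = {i: ("c" if i % 2 == 0 else "a") for i in range(1, 301)}
offset_paginated_cleanup(rows, 100)

leftover = sorted(i for i, t in rows.items() if t != "c")
print(len(leftover), leftover[0], leftover[-1])  # 50 101 299
```

In this run the pager never looks at ids 101 to 150 or 251 to 300, so 50 deletable rows are left behind even though the loop ended normally.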

Continuously querying the first page (offset = 0) until it returns no rows seemed a fix, but if an entire page consists of type = c rows, the loop would never terminate.

Successful Optimization

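The root cause was the offset itself: PageHelper computes each offset as if the result set were stable, but that count changes whenever earlier rows are deleted. Replacing it with start‑id based pagination removes the offset entirely and keeps the query fast, because each batch seeks directly into the primary key index.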

select id, type, datachange_lasttime
from t_user_pop_log
where id >= #{startId}
order by id
limit #{limit}

Processing steps:

Initialize startId to 1 for the first run.

Fetch a batch of rows ordered by id.

In application code, filter rows where type != c and datachange_lasttime <= current_time - 3 months, collecting the IDs to delete.

Delete the collected IDs in a batch.

Set startId to the maximum id from the current batch plus one.

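Stop when a batch comes back empty, or when the first row of the batch has datachange_lasttime newer than the 3‑month threshold. The second check assumes that datachange_lasttime grows roughly in step with id, so every row beyond that point is too recent to delete.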

For subsequent monthly runs, only about one month of newly expired data needs to be deleted, because earlier data have already been removed. The initial run has to work through the whole backlog; adding a short sleep between batches helps keep it from overwhelming the database.
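The processing steps can be sketched as follows, again using an in-memory SQLite database as a stand-in for MySQL. The function name keyset_cleanup, the pause_seconds parameter, and the sample rows are assumptions for the example; the sample data places ids 4 to 6 in a page made up entirely of type c rows to show that the loop still advances.

```python
import sqlite3
import time

def keyset_cleanup(conn, cutoff, limit, pause_seconds=0.0):
    """Walk the table in id order, filter in application code, delete by id."""
    start_id, deleted = 1, 0
    while True:
        batch = conn.execute(
            "SELECT id, type, datachange_lasttime FROM t_user_pop_log "
            "WHERE id >= ? ORDER BY id LIMIT ?", (start_id, limit)).fetchall()
        # Stop on an empty batch or once the batch starts past the threshold.
        if not batch or batch[0][2] > cutoff:
            return deleted
        ids = [row_id for row_id, row_type, changed in batch
               if row_type != "c" and changed <= cutoff]
        if ids:
            placeholders = ",".join("?" * len(ids))
            conn.execute(
                f"DELETE FROM t_user_pop_log WHERE id IN ({placeholders})", ids)
            conn.commit()
            deleted += len(ids)
        start_id = batch[-1][0] + 1  # maximum id in this batch plus one
        time.sleep(pause_seconds)    # optional pause to spare the database

# Hypothetical sample data: ids 1-8 are old, 9-12 are recent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_user_pop_log "
             "(id INTEGER PRIMARY KEY, type TEXT, datachange_lasttime TEXT)")
for i, row_type in enumerate("abccccdeabcd", start=1):
    changed = "2023-01-01 00:00:00" if i <= 8 else "2023-09-01 00:00:00"
    conn.execute("INSERT INTO t_user_pop_log VALUES (?, ?, ?)",
                 (i, row_type, changed))

deleted = keyset_cleanup(conn, "2023-06-17 00:00:00", limit=3)
remaining = [r[0] for r in
             conn.execute("SELECT id FROM t_user_pop_log ORDER BY id")]
print(deleted, remaining)
```

Because startId always moves past the last id it has seen, a batch in which nothing is deletable costs one query and is never fetched again, which is what the first-page retry could not guarantee.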
