How to Safely Delete Hundreds of Millions of Rows Without Locking Your Database

This guide explains why a single massive DELETE on a 500‑million‑row table creates long‑running transactions and locks, and shows step‑by‑step techniques—date‑range batching, primary‑key range slicing, and insert‑instead‑of‑delete—to break the operation into manageable chunks and improve performance.

dbaplus Community

When you need to delete a large portion of data, such as all rows from 2021 in a 500 million‑row table named yes, executing a single statement like:

delete from yes where create_date >= '2021-01-01' and create_date < '2022-01-01';

will likely create a long‑running transaction because it touches hundreds of millions of rows. Long transactions hold locks until they finish, blocking other DML, causing connection stalls, possible service outages, replication lag, and even rollback after hours of work.

Why Long Transactions Are Problematic

Locks are held for the entire transaction, blocking concurrent writes.

Blocked business threads can cascade into service‑wide failures.

Replicas must replay the same huge transaction, so master-slave replication falls behind and replicas serve stale data.

If the transaction aborts, hours of work are lost.

Therefore, the operation must be split into smaller batches.

Simple Date‑Range Batching

One intuitive approach is to split the date range into monthly (or smaller) intervals:

delete from yes where create_date >= '2021-01-01' and create_date < '2021-02-01';
delete from yes where create_date >= '2021-02-01' and create_date < '2021-03-01';

...and so on.

This is efficient only if create_date is indexed. Without an index, every batch scans the full table, so twelve monthly batches mean twelve full scans.
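The monthly statements do not have to be written by hand. The following is a minimal sketch that generates them; the helper names month_ranges and monthly_deletes are hypothetical, and it assumes half-open ranges (>= start, < end) so that no row is skipped or matched twice at a month boundary.

```python
from datetime import date


def month_ranges(year):
    """Yield (start, end) dates for each month of `year` as half-open ranges."""
    for month in range(1, 13):
        start = date(year, month, 1)
        # December rolls over into January of the following year.
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        yield start, end


def monthly_deletes(table, column, year):
    """Build one DELETE statement per month (hypothetical helper)."""
    return [
        f"delete from {table} where {column} >= '{s.isoformat()}' "
        f"and {column} < '{e.isoformat()}';"
        for s, e in month_ranges(year)
    ]


statements = monthly_deletes("yes", "create_date", 2021)
for stmt in statements:
    print(stmt)
```

Each generated statement can then be run and committed on its own, so a failure affects only one month's batch.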

Using Primary‑Key Ranges When No Index Exists

If create_date lacks an index, create a surrogate range using the primary key (assumed to be id). First obtain the min and max IDs:

select min(id) from yes;
select max(id) from yes;

Assume the range is 233,333,333 to 666,666,666. Then delete in windows of 100,000 IDs while keeping the original date filter:

delete from yes where (id >= 233333333 and id < 233433333) and create_date >= '2021-01-01' and create_date < '2022-01-01';
delete from yes where (id >= 233433333 and id < 233533333) and create_date >= '2021-01-01' and create_date < '2022-01-01';

Continue until the upper bound is reached. The last window is shorter and includes the maximum ID:

delete from yes where (id >= 666633333 and id <= 666666666) and create_date >= '2021-01-01' and create_date < '2022-01-01';

Each chunk uses the primary‑key index, runs quickly, and only a small portion rolls back on error.
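The loop over ID windows can be sketched as follows. This is an illustration under stated assumptions, not a production script: an in-memory SQLite database stands in for the real server, the function name delete_in_id_batches is hypothetical, and the demo table holds only 1,000 rows.

```python
import sqlite3


def delete_in_id_batches(conn, lo, hi, batch_size, start_date, end_date):
    """Delete rows with id in [lo, hi] and create_date in [start_date, end_date),
    one ID window per transaction. Returns the total number of rows deleted."""
    deleted = 0
    batch_lo = lo
    while batch_lo <= hi:
        # Exclusive upper bound, capped so the last window ends at hi.
        batch_hi = min(batch_lo + batch_size, hi + 1)
        cur = conn.execute(
            "delete from yes where id >= ? and id < ? "
            "and create_date >= ? and create_date < ?",
            (batch_lo, batch_hi, start_date, end_date),
        )
        conn.commit()  # commit per window so each transaction stays short
        deleted += cur.rowcount
        batch_lo = batch_hi
    return deleted


# Demo data (assumption): 1,000 rows spread across 2020, 2021 and 2022.
conn = sqlite3.connect(":memory:")
conn.execute("create table yes (id integer primary key, create_date text)")
dates = ["2020-06-15", "2021-06-15", "2022-06-15"]
conn.executemany(
    "insert into yes values (?, ?)",
    [(i, dates[i % 3]) for i in range(1, 1001)],
)
conn.commit()

lo, hi = conn.execute("select min(id), max(id) from yes").fetchone()
deleted = delete_in_id_batches(conn, lo, hi, 100, "2021-01-01", "2022-01-01")
remaining = conn.execute("select count(*) from yes").fetchone()[0]
print(deleted, remaining)
```

Because IDs can have gaps, a window may delete fewer rows than its width, or none at all. That is harmless; the loop simply moves on to the next window.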

Parallel Execution

After splitting, batches can be run in parallel to further reduce total time, provided the system can handle the increased lock contention. In practice, occasional lock waits may occur, but they usually clear before the lock wait timeout is reached.
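One way to keep parallel workers from touching the same rows is to give each worker its own contiguous slice of the ID range and let it batch within that slice. A minimal sketch, assuming four workers and the example ID range used earlier; split_range is a hypothetical helper:

```python
def split_range(lo, hi, workers):
    """Split the inclusive range [lo, hi] into `workers` disjoint, contiguous
    inclusive slices whose sizes differ by at most one."""
    total = hi - lo + 1
    base, extra = divmod(total, workers)
    slices = []
    start = lo
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        if size == 0:
            continue  # more workers than IDs: skip empty slices
        slices.append((start, start + size - 1))
        start += size
    return slices


# Assumption: four workers over the example range from the article.
slices = split_range(233333333, 666666666, 4)
for worker, (first_id, last_id) in enumerate(slices):
    print(f"worker {worker}: id between {first_id} and {last_id}")
```

Each worker then runs the batched delete over its own slice on its own connection.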

Alternative Strategy: Turn Deletion into Insertion

When you need to keep only a tiny fraction of a massive table (e.g., retain 20 million rows out of 500 million), consider creating a new table and inserting the desired rows:

Create a new table yes_temp.

Insert the needed rows:

insert into yes_temp select * from yes where create_date between ...

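Rename the original table: rename table yes to yes_233.

Rename the temp table: rename table yes_temp to yes.

In MySQL both renames can be issued as one atomic statement, so there is no moment at which yes does not exist: rename table yes to yes_233, yes_temp to yes.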

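This “swap” avoids the massive delete entirely: its cost scales with the rows you keep rather than the rows you discard, so it often completes in minutes instead of hours. Rows written to yes while the copy runs are not carried over, so pause writes or reconcile them before the swap. Tools like pt-online-schema-change automate a similar copy-and-swap pattern.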
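The copy-and-swap steps can be sketched end to end as follows. This is an illustration only: an in-memory SQLite database stands in for the real server (so the rename uses SQLite's alter table ... rename to syntax rather than rename table), and the 100-row table and the "keep 2022 and later" filter are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table yes (id integer primary key, create_date text)")
# Demo data (assumption): 95 rows to discard, 5 rows to keep.
conn.executemany(
    "insert into yes values (?, ?)",
    [(i, "2021-03-01" if i <= 95 else "2022-03-01") for i in range(1, 101)],
)
conn.commit()

# Step 1: create an empty table with the same structure.
conn.execute("create table yes_temp (id integer primary key, create_date text)")

# Step 2: copy only the rows that should survive.
conn.execute(
    "insert into yes_temp select * from yes where create_date >= '2022-01-01'"
)

# Steps 3 and 4: move the old table aside, then put the new one in its place.
conn.execute("alter table yes rename to yes_233")
conn.execute("alter table yes_temp rename to yes")
conn.commit()

kept = conn.execute("select count(*) from yes").fetchone()[0]
archived = conn.execute("select count(*) from yes_233").fetchone()[0]
print(kept, archived)
```

The old table stays available as yes_233 until the result has been verified, after which it can be dropped.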

Takeaways

Avoid single‑statement deletes on huge tables; they cause long transactions and lock contention.

Batch by date if an index exists; otherwise batch by primary‑key ranges.

Parallelize safe batches to improve throughput.

When deleting the majority of rows, consider inserting the small retained set into a new table and swapping names.
