
How to Safely Update Billions of MySQL Rows Without Overloading Binlog

This article explains why a naïve full‑table UPDATE on massive MySQL tables can cripple replication, explores deep‑pagination and IN‑clause inefficiencies, and presents a batch‑processing strategy using SQL_NO_CACHE, FORCE INDEX, and rate‑controlled scripts to perform safe, high‑performance updates.


When a business needs to modify data across a very large MySQL table (hundreds of millions to billions of rows), a simple UPDATE statement quickly becomes impractical: master‑replica replication relies on binlog entries, which can explode in size and stall the replica.

Binlog Formats

statement : records the original SQL text on the master; low log volume, but non‑deterministic functions such as RAND() can produce different results on the replica.

row : records the exact row‑level changes; deterministic, but bulk DELETE / UPDATE operations can generate huge logs.

mixed : combines both; MySQL logs simple statements in statement format and switches to row format for non‑deterministic ones.
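
The active format is easy to inspect, and can be switched per session before a one‑off job; a minimal sketch (changing binlog_format requires SUPER or SYSTEM_VARIABLES_ADMIN privileges and cannot happen mid‑transaction):

SHOW VARIABLES LIKE 'binlog_format';     -- STATEMENT, ROW (the default since 5.7.7), or MIXED
SET SESSION binlog_format = 'STATEMENT'; -- session-only change; needs admin privileges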

Problem with Direct UPDATE

In a real case the team needed to replace http:// with https:// in the user_img column of a table containing tens of millions of rows. Executing a single

UPDATE tb_user_info SET user_img=replace(user_img,'http://','https://')

generated massive binlog traffic: with row‑based logging every modified row is written to the log (a before and an after image per row), and the whole statement commits as one huge transaction that the replica must replay serially, overwhelming it and risking severe replication lag.
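
The growth is easy to watch while such a statement runs; these standard commands, run on the master, list the binlog files and their sizes:

SHOW BINARY LOGS;    -- one row per binlog file, with its size in bytes
SHOW MASTER STATUS;  -- current binlog file and write position (SHOW BINARY LOG STATUS on MySQL 8.2+)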

Deep Pagination Issue

Attempting to batch the work by paging with LIMIT offsets, e.g.

select id from tb_user_info order by id limit 1000000, 1000;

fails for large offsets because MySQL must traverse the B+‑tree leaf nodes from the beginning, discarding the first million rows before returning the batch, so each successive batch degenerates into a near full‑table scan. (UPDATE itself accepts only LIMIT row_count, with no offset, so offset paging has to go through a SELECT in the first place.)
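
A quick EXPLAIN makes the cost visible; the offset below is illustrative, and the rows estimate reflects everything MySQL must walk, not just the 1,000 rows returned:

EXPLAIN SELECT id FROM tb_user_info ORDER BY id LIMIT 10000000, 1000;
-- the "rows" estimate approaches offset + count (or the whole index), not 1000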

Inefficiency of IN Clause

Fetching a page of IDs and then updating with an IN condition also proves slow, even though the IN list can use the primary‑key index as a range access.

select id from tb_user_info where id > {index} order by id limit 100;
update tb_user_info set user_img=replace(user_img,'http://','https://') where id in (id1,id2,...);
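
EXPLAIN confirms the IN list does hit the primary key as a range access, so the slowness comes mainly from the extra round trip and from shipping large ID lists back and forth (the ids below are illustrative):

EXPLAIN update tb_user_info set user_img=replace(user_img,'http://','https://') where id in (101, 102, 103);
-- access type "range": one point lookup per listed id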

Final Batch‑Update Strategy

The solution combines two MySQL hints with careful ordering: /*!40001 SQL_NO_CACHE */ tells the server not to store the result in the query cache, so the one‑off scan does not evict hot entries (the hint targets the query cache, which MySQL 8.0 removed entirely; it does not bypass the InnoDB buffer pool). FORCE INDEX(PRIMARY) guarantees the primary‑key index is used.

Sorting by id ensures the subsequent range scan works efficiently.

Example script:

select /*!40001 SQL_NO_CACHE */ id from tb_user_info FORCE INDEX(`PRIMARY`) where id > {lower} order by id limit 1000, 1;
update tb_user_info set user_img=replace(user_img,'http://','https://') where id > {lower} and id <= {upper};

The first query fetches the upper boundary of the next batch, the id 1,000 rows past the current position, without polluting the query cache; the second updates only that bounded range (the inclusive upper bound ensures no row falls between batches). Each batch commits as a small transaction, so the binlog is written in chunks the replica can apply without falling behind.
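
Wired together as a loop, the whole job might look like the stored‑procedure sketch below; the procedure name and fixed batch size of 1,000 are illustrative, it assumes an auto‑increment BIGINT primary key, and a production version would add the pacing described in the next section between iterations.

DELIMITER $$
CREATE PROCEDURE batch_update_img()
BEGIN
  DECLARE lower_id BIGINT DEFAULT 0;
  DECLARE upper_id BIGINT;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET upper_id = NULL;
  batch_loop: LOOP
    SET upper_id = NULL;
    -- Upper boundary of the next batch: the id 1,000 rows past lower_id.
    SELECT /*!40001 SQL_NO_CACHE */ id INTO upper_id
      FROM tb_user_info FORCE INDEX(`PRIMARY`)
      WHERE id > lower_id ORDER BY id LIMIT 1000, 1;
    IF upper_id IS NULL THEN
      -- Fewer than 1,000 rows remain: update the tail and stop.
      UPDATE tb_user_info
        SET user_img = replace(user_img, 'http://', 'https://')
        WHERE id > lower_id;
      LEAVE batch_loop;
    END IF;
    -- One small, bounded transaction per batch.
    UPDATE tb_user_info
      SET user_img = replace(user_img, 'http://', 'https://')
      WHERE id > lower_id AND id <= upper_id;
    SET lower_id = upper_id;
  END LOOP;
END$$
DELIMITER ;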

Rate Control and Parallelism

By exposing the batch process through an API, the team can monitor replication lag, IOPS, and memory usage, and throttle the update speed. Although the basic implementation is single‑threaded, the API can spawn a configurable thread pool to run multiple batches in parallel, raising the overall throughput while still keeping the replication impact under control.
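
For example, the driver can poll the replica between batches and pause while lag stays above a threshold; a minimal sketch (Seconds_Behind_Source is the MySQL 8.0.22+ field name; older servers expose Seconds_Behind_Master via SHOW SLAVE STATUS):

SHOW REPLICA STATUS;   -- run on the replica; pause the driver while Seconds_Behind_Source stays high
SELECT SLEEP(1);       -- crude pacing on the master side between batches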

Other Considerations

If the primary key is generated by Snowflake or auto‑increment, rows are naturally ordered, making the range‑based approach straightforward. For UUID primary keys, the team pre‑processes the data before deployment and then runs a full‑table update after the new code goes live.

Tags: performance optimization, SQL, Batch Processing, MySQL, binlog, Full Table Update
Written by Architect's Guide

Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
