
How to Safely Perform Billion-Row MySQL Updates Without Overloading Replication

This article explains why a direct full‑table UPDATE on a massive MySQL table can overwhelm binlog replication, explores the pitfalls of deep pagination and IN‑clauses, and presents a batch‑processing strategy using SQL_NO_CACHE and FORCE INDEX to achieve efficient, low‑impact updates.


When a business needs to run a full‑table UPDATE on a large MySQL table (hundreds of millions of rows), the generated binlog can flood a row‑format replication setup, causing severe performance and stability issues.

Binlog Formats

statement : records the SQL text of each executed statement; log volume is low, but statements with nondeterministic functions may replicate incorrectly.

row : records the before/after image of every changed row, producing huge logs for bulk updates or deletes.

mixed : combines both, using statement format for safe statements and row format for the rest.

Our production MySQL uses row‑format binlog, so a full‑table UPDATE on a billion‑row table would create massive binlogs and block the replica.

Direct UPDATE Problem

Attempting a simple UPDATE such as

<code>update tb_user_info set user_img=replace(user_img,'http://','https://');</code>

quickly overwhelms the master and replica.

Deep Pagination Issue

Using <code>LIMIT</code> with an increasing offset forces MySQL to walk deep into the B‑tree, reading and discarding every row before the offset. At large offsets this amounts to a full table scan per page, which is highly inefficient.
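To make the contrast concrete, here is a minimal Python sketch (not from the article) that builds both query shapes. The helper names <code>offset_page</code> and <code>keyset_page</code> are hypothetical; the table and column names follow the article's <code>tb_user_info</code> and <code>id</code>:

```python
def offset_page(page: int, page_size: int = 1000) -> str:
    """Offset pagination: MySQL must walk past page * page_size rows
    in the B-tree before returning anything, so cost grows with depth."""
    return (f"SELECT id FROM tb_user_info "
            f"ORDER BY id LIMIT {page * page_size}, {page_size};")

def keyset_page(last_id: int, page_size: int = 1000) -> str:
    """Keyset (seek) pagination: jump straight to last_id via the primary
    key, then scan forward page_size rows, at constant cost per page."""
    return (f"SELECT id FROM tb_user_info WHERE id > {last_id} "
            f"ORDER BY id LIMIT {page_size};")
```

Page 1000 of the offset form skips a million rows before reading; the keyset form seeks directly to the last id seen.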

IN‑Clause Inefficiency

Fetching a list of IDs first and then updating with <code>WHERE id IN (...)</code> still performs poorly, even when the optimizer is expected to use the primary key index: each batch must evaluate a long value list.
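For reference, this is the shape of the statement the article measured as slow. The Python helper <code>in_clause_update</code> is a hypothetical illustration of the anti-pattern, not a recommended approach:

```python
def in_clause_update(ids) -> str:
    """Build the IN-clause UPDATE the article found inefficient: even
    with ids pre-fetched, MySQL evaluates the long value list per batch."""
    id_list = ", ".join(str(i) for i in ids)
    return (f"UPDATE tb_user_info "
            f"SET user_img = REPLACE(user_img, 'http://', 'https://') "
            f"WHERE id IN ({id_list});")
```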

Final Batch‑Update Solution

The effective approach combines two queries: a SELECT that avoids the buffer pool and forces the primary index, followed by an UPDATE that operates on a sorted range of primary keys.

<code>select /*!40001 SQL_NO_CACHE */ id from tb_user_info FORCE INDEX(`PRIMARY`) where id>"1" ORDER BY id limit 1000,1;</code>
<code>update tb_user_info set user_img=replace(user_img,'http://','https://') where id > "{1}" and id <= "{2}";</code>

The <code>/*!40001 SQL_NO_CACHE */</code> hint tells MySQL not to serve or store this result through the query cache, so these one‑off scans over cold data do not evict hot entries.

<code>FORCE INDEX(`PRIMARY`)</code> ensures the primary key index is used, and ordering by <code>id</code> allows efficient range scans.
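The two statements can be driven by a simple loop: fetch the id that closes the next batch, update that half‑open range, then advance the cursor. A minimal sketch, assuming <code>fetch_boundary</code> and <code>execute_update</code> are thin wrappers around a real MySQL connection (both names are hypothetical):

```python
def run_batched_update(fetch_boundary, execute_update, start_id=0, batch=1000):
    """Drive the article's two-statement pattern over the whole table.
    fetch_boundary(sql) -> int id or None; execute_update(sql) -> None."""
    last_id = start_id
    while True:
        # Offset batch-1 returns the batch-th id past last_id,
        # i.e. the upper bound of a batch of exactly `batch` rows.
        boundary_sql = (
            f"SELECT /*!40001 SQL_NO_CACHE */ id FROM tb_user_info "
            f"FORCE INDEX(`PRIMARY`) WHERE id > {last_id} "
            f"ORDER BY id LIMIT {batch - 1}, 1;")
        upper = fetch_boundary(boundary_sql)
        if upper is None:
            # Fewer than `batch` rows remain: update the tail and stop.
            execute_update(
                f"UPDATE tb_user_info "
                f"SET user_img = REPLACE(user_img, 'http://', 'https://') "
                f"WHERE id > {last_id};")
            return
        # Half-open range (last_id, upper]: no gaps, no overlaps.
        execute_update(
            f"UPDATE tb_user_info "
            f"SET user_img = REPLACE(user_img, 'http://', 'https://') "
            f"WHERE id > {last_id} AND id <= {upper};")
        last_id = upper
```

Because each iteration seeks by primary key rather than by offset, per‑batch cost stays flat no matter how far into the table the loop has progressed.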

By controlling the batch size and execution rate through an API (e.g., limiting threads in a thread pool), you can monitor and adjust IOPS, memory usage, and replication lag, achieving safe, incremental updates.
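One way to sketch that throttling in Python, using a bounded thread pool plus a pause between submissions; <code>run_batch</code> and the pause value are illustrative assumptions, not the article's implementation:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def throttled_run(ranges, run_batch, max_workers=2, pause_s=0.0):
    """Apply run_batch((lo, hi)) to each id range with at most
    max_workers concurrent batches, sleeping pause_s between
    submissions to cap the overall update rate."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for r in ranges:
            futures.append(pool.submit(run_batch, r))
            time.sleep(pause_s)  # raise this if replica lag grows
        for f in futures:
            f.result()  # re-raise any batch failure
```

Dropping <code>max_workers</code> or raising <code>pause_s</code> trades throughput for lower IOPS and replication lag, which can be tuned live while watching the replica.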

Other Considerations

If primary keys come from Snowflake IDs or auto‑increment, their natural ordering suits this method; with UUID keys, you must pre‑process the data before the bulk update.

Tags: performance, MySQL, binlog, Database Optimization, Batch Update
Written by macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
