
How to Safely Perform Full-Table Updates on Billion-Row MySQL Tables

Updating billions of rows in a MySQL table with a single statement can overwhelm binlog replication, and naive batching with LIMIT offsets runs into the deep‑pagination problem. This article explains the pitfalls of a direct UPDATE, examines LIMIT‑based and IN‑based approaches, and presents a batch update strategy that walks the table in primary‑key ranges, using SQL_NO_CACHE and a forced primary‑key index to find each range boundary.

dbaplus Community

Why Direct UPDATE Fails on Large Tables

When a MySQL table reaches hundreds of millions or billions of rows, a single full‑table UPDATE generates an enormous volume of binlog. In a master‑slave setup using row‑format binlog, the master logs an event for every changed row, and the slave must replay every one of those row changes, causing severe I/O pressure and replication lag.

The binlog formats are:

statement : records the original SQL text; low log volume, but nondeterministic functions can produce different results on the replica.

row : records every changed row; safe to replay, but high log volume for bulk operations.

mixed : logs statements by default and switches to row format for statements that are unsafe to replicate as SQL.
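
To get a feel for the log volume in row format, the arithmetic can be sketched as below. The function name and the figures (1 billion rows, 200 bytes per row) are assumptions for illustration, not measurements from the article. The factor of two models the before image and after image that an UPDATE logs per changed row when binlog_row_image is FULL; event headers are ignored, so treat the result as a rough lower bound.

```python
def estimate_row_binlog_bytes(rows_changed: int, avg_row_bytes: int,
                              images_per_row: int = 2) -> int:
    """Rough lower bound on row-format binlog volume for one UPDATE.

    images_per_row=2 models a before image plus an after image for each
    changed row (binlog_row_image=FULL). Event headers are not counted.
    """
    return rows_changed * avg_row_bytes * images_per_row


# Assumed figures, for illustration only.
total = estimate_row_binlog_bytes(1_000_000_000, 200)
print(f"{total / 1e9:.0f} GB")  # 400 GB
```

Under these assumptions a single statement would hand the replica hundreds of gigabytes to replay in one transaction.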

Initial Attempt and Its Drawbacks

The first naive approach was to run a single statement:

update tb_user_info set user_img=replace(user_img,'http://','https://');

On a table this size, that single statement would flood the binlog, and the replica would fall far behind while replaying it.

Deep Pagination Problem

Trying to mitigate the load, a script used LIMIT with an offset:

update tb_user_info set user_img=replace(user_img,'http://','https://') limit 1,1000;

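As written, this does not even run: MySQL's UPDATE accepts only LIMIT row_count, not an offset, so the offset has to come from a SELECT or a subquery. The underlying problem remains either way. To serve LIMIT offset, count, MySQL descends the B+tree to the leaf level and walks forward, reading and discarding every skipped row before it returns anything. For large offsets each batch approaches a full‑table scan, which is the “deep pagination” problem.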
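
The cost difference can be sketched with a simple row-count model. Both function names are hypothetical, and the model counts only rows the server has to walk past, not actual I/O: offset pagination re-reads all skipped rows for every batch, while seeking by the last seen key reads each row once.

```python
def rows_read_with_offset(total_rows: int, batch: int) -> int:
    """Rows read when batch k is fetched with LIMIT k*batch, batch."""
    read = 0
    for offset in range(0, total_rows, batch):
        # The server reads and discards `offset` rows, then returns up to `batch`.
        read += offset + min(batch, total_rows - offset)
    return read


def rows_read_with_keyset(total_rows: int, batch: int) -> int:
    """Rows read when each batch starts from the last seen key (id > last_id)."""
    return total_rows


print(rows_read_with_offset(10_000, 1_000))  # 55000
print(rows_read_with_keyset(10_000, 1_000))  # 10000
```

The offset total grows roughly with the square of the table size, which is why the approach collapses on billion-row tables.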

IN‑Clause Inefficiency

Another idea was to collect a batch of IDs and update with an IN list:

select id from tb_user_info where id > {index} order by id limit 100;
update tb_user_info set user_img=replace(user_img,'http://','https://') where id in (id1,id2,id3);

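Even with index hints, this still performed poorly: every batch needs an extra round trip to fetch the IDs, and MySQL resolves a long IN list as many separate point lookups rather than one contiguous range scan.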

Final Batch Update Strategy

After discussions with DBAs, the team settled on a two‑step process:

Use a SELECT that bypasses the query cache and forces the primary‑key index to find, in primary‑key order, the ID that marks the end of the next chunk.

Update the rows by range on the primary key, avoiding IN and large LIMIT offsets.

The query pattern looks like this:

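select /*!40001 SQL_NO_CACHE */ id
from tb_user_info FORCE INDEX(`PRIMARY`)
where id >= {start}
order by id
limit 1000,1;
-- {start}: first id of this batch; {end}: the id returned above, which starts the next batch
update tb_user_info set user_img=replace(user_img,'http://','https://')
where id >= {start} and id < {end};
-- when the select returns no row, run the final update with only id >= {start}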

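SQL_NO_CACHE tells the server not to store the result in the query cache, so a one‑off scan does not evict cached results that other queries rely on; it does not keep pages out of the InnoDB buffer pool. The query cache was removed in MySQL 8.0, where the hint is accepted but has no effect. FORCE INDEX(`PRIMARY`) makes the optimizer use the primary‑key index, and ordering by id means each boundary is found with a short range scan and the following UPDATE touches one contiguous range of the clustered index.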
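
The loop that drives this pattern can be sketched as follows. This is a minimal illustration, not the team's implementation: it uses Python's built-in sqlite3 with an in-memory table purely so that it runs without a MySQL server, so the MySQL-specific SQL_NO_CACHE and FORCE INDEX hints are omitted. The function names, the BATCH value, and the sample data are assumptions. The sketch uses half-open ranges (id >= start and id < end) so that the boundary row of each batch is never skipped.

```python
import sqlite3

BATCH = 1000  # rows per batch; an assumed value to tune against replica lag

UPDATE_SQL = ("update tb_user_info "
              "set user_img = replace(user_img, 'http://', 'https://') ")


def next_boundary(conn, start_id, batch=BATCH):
    """Return the id that begins the following batch, or None on the last batch."""
    row = conn.execute(
        "select id from tb_user_info where id >= ? order by id limit 1 offset ?",
        (start_id, batch),
    ).fetchone()
    return row[0] if row else None


def update_in_batches(conn, batch=BATCH):
    """Rewrite http:// to https:// one primary-key range at a time."""
    start = conn.execute("select min(id) from tb_user_info").fetchone()[0]
    batches = 0
    while start is not None:
        end = next_boundary(conn, start, batch)
        if end is None:
            # Fewer than `batch` + 1 rows remain: update through the end of the table.
            conn.execute(UPDATE_SQL + "where id >= ?", (start,))
        else:
            conn.execute(UPDATE_SQL + "where id >= ? and id < ?", (start, end))
        conn.commit()  # one short transaction per batch
        batches += 1
        start = end
    return batches


# Demo data: 2,500 rows with gaps between ids, as in a real table.
conn = sqlite3.connect(":memory:")
conn.execute("create table tb_user_info (id integer primary key, user_img text)")
conn.executemany(
    "insert into tb_user_info values (?, ?)",
    [(i * 3, f"http://img.example/{i}.png") for i in range(1, 2501)],
)
batches = update_in_batches(conn)
remaining = conn.execute(
    "select count(*) from tb_user_info where user_img like 'http://%'"
).fetchone()[0]
print(batches, remaining)  # 3 0
```

Because the replacement targets 'http://' rather than 'http', re-running a batch after a failure leaves already converted rows unchanged.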

Operational Benefits

Because each batch is a short, independent statement, the update can be driven through an API that throttles the rate while monitoring replication lag, IOPS, and memory usage. The basic implementation is single‑threaded, but the API can be extended with a thread pool to run batches in parallel while still capping the overall write rate.
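
One way such throttling could work is to pick the pause between batches from the current replication lag. The function below is a hypothetical sketch: its name, thresholds, and policy are assumptions, and the lag value is expected to come from whatever monitoring is in place (on MySQL it is commonly read from the replica status output).

```python
def next_pause_seconds(replica_lag_s: float, base_pause_s: float = 0.1,
                       max_lag_s: float = 5.0, max_pause_s: float = 30.0) -> float:
    """Choose how long to sleep before the next batch.

    While lag stays at or below max_lag_s, use the base pause. Above it,
    wait about as long as the replica is behind, capped at max_pause_s.
    """
    if replica_lag_s <= max_lag_s:
        return base_pause_s
    return min(max_pause_s, replica_lag_s)


print(next_pause_seconds(2.0))    # 0.1
print(next_pause_seconds(12.0))   # 12.0
print(next_pause_seconds(120.0))  # 30.0
```

A driver loop would call this after each batch and sleep for the returned duration; the same check can gate how many worker threads are allowed to run.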

Additional Considerations

If primary keys are generated with Snowflake or auto‑increment, the IDs are sequential, which suits this range‑based approach. With UUID primary keys, the data must instead be pre‑processed before insertion, or cleaned up in a separate migration step after deployment.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
