Optimizing Pagination Queries for Billion‑Row MySQL Tables
This article analyzes why LIMIT‑based pagination degrades on massive MySQL tables and presents three progressively more efficient solutions: a simple LIMIT approach, a tag‑record method keyed on the last returned primary key, and a range‑limit method backed by a cached minimum ID. It closes with indexing best practices that keep query latency in the tens of milliseconds even for tables containing billions of rows.
In a system that stores follower‑fan relationships in MySQL across 16 shards and 256 tables, each table can hold over 200 million rows, making traditional LIMIT pagination increasingly slow as the offset grows.
Using LIMIT offset, n forces MySQL to read offset + n rows and discard the first offset of them, so a query ending in limit 100000,10 touches 100,010 rows to return ten, while limit 0,10 touches only ten.
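A toy model makes the cost visible. The sketch below simulates how the server walks rows from the start of the result set and throws away the first offset of them; the row counts are illustrative, not from the article.

```python
# Toy model of "LIMIT offset, n": the engine walks rows sequentially,
# fetching and then discarding the first `offset` of them.
def offset_paginate(rows, offset, n):
    """Return one page plus the number of rows the scan touched."""
    scanned = 0
    page = []
    for row in rows:            # sequential walk, as the storage engine does
        scanned += 1
        if scanned > offset:    # the first `offset` rows are read, then dropped
            page.append(row)
            if len(page) == n:
                break
    return page, scanned

rows = list(range(1, 200_001))              # stand-in for one sharded table
page, scanned = offset_paginate(rows, 100_000, 10)
print(scanned)                              # 100010 rows touched for 10 returned
page0, scanned0 = offset_paginate(rows, 0, 10)
print(scanned0)                             # 10
```

The deep page does roughly ten thousand times the work of the first page for the same ten-row result, which is exactly the asymmetry the next two solutions remove.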
Solution 1 – Simple LIMIT : Works for small data sets but performance degrades sharply for deep pages; it is easy to implement and supports jump‑page queries.
Solution 2 – Tag‑Record Method : Record the primary‑key ID of the last row returned (lastId) and use it as the upper bound of the next query, eliminating the large offset. Example SQL: select id, biz_content, pin FROM follow_fans_1 where biz_content = #{bizContent} and id < #{lastId} order by id desc limit 10; This keeps query time in the tens of milliseconds even at tens of millions of rows, but it cannot jump to an arbitrary page and may still time out on the final, partially filled page.
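The paging loop can be sketched end to end. The snippet below runs the article's tag‑record query against an in‑memory SQLite table standing in for follow_fans_1; column names follow the article, while the data, the `next_page` helper, and the sentinel first `lastId` are illustrative assumptions.

```python
import sqlite3

# In-memory stand-in for one shard table; real deployment is MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE follow_fans_1 (id INTEGER PRIMARY KEY, biz_content TEXT, pin TEXT)"
)
conn.executemany(
    "INSERT INTO follow_fans_1 (id, biz_content, pin) VALUES (?, ?, ?)",
    [(i, "user_42", f"pin_{i}") for i in range(1, 101)],
)

def next_page(conn, biz_content, last_id, size=10):
    # last_id is the smallest id returned by the previous page;
    # the first call passes a sentinel larger than any real id.
    cur = conn.execute(
        "SELECT id, biz_content, pin FROM follow_fans_1 "
        "WHERE biz_content = ? AND id < ? ORDER BY id DESC LIMIT ?",
        (biz_content, last_id, size),
    )
    return cur.fetchall()

page1 = next_page(conn, "user_42", last_id=2**62)
page2 = next_page(conn, "user_42", last_id=page1[-1][0])
print([r[0] for r in page1])   # ids 100 down to 91
print([r[0] for r in page2])   # ids 90 down to 81
```

Because each page is located by an index seek on id rather than by counting past discarded rows, page depth no longer affects cost; the trade-off is that the client must page strictly forward (or backward) from the last seen ID.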
Solution 3 – Range‑Limit Method : Add a lower bound (minId) to the query so the engine knows when to stop scanning. Example SQL: select id, biz_content, pin FROM follow_fans_1 where biz_content = #{bizContent} and id < #{lastId} and id >= #{minId} order by id desc limit 10; The minId is obtained with: select min(id) from follow_fans_1 where biz_content = #{bizContent}; Because the min‑ID query can be expensive on billion‑row tables, the article suggests caching the result (populated asynchronously) or computing it offline on a big‑data platform and refreshing the cache periodically.
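The range‑limit variant only adds the cached lower bound to the previous query. The sketch below again uses an in‑memory SQLite table as a stand‑in; the dict‑based `min_id_cache` is a placeholder for the asynchronously populated cache (e.g. Redis or an offline job) that the article recommends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follow_fans_1 (id INTEGER PRIMARY KEY, biz_content TEXT)")
conn.executemany(
    "INSERT INTO follow_fans_1 VALUES (?, ?)",
    [(i, "user_42") for i in range(500, 526)],   # ids 500..525, made-up data
)

# Stand-in for the article's cached min(id); real systems would populate
# this asynchronously or from an offline big-data job.
min_id_cache = {}

def cached_min_id(conn, biz_content):
    if biz_content not in min_id_cache:
        row = conn.execute(
            "SELECT MIN(id) FROM follow_fans_1 WHERE biz_content = ?",
            (biz_content,),
        ).fetchone()
        min_id_cache[biz_content] = row[0]
    return min_id_cache[biz_content]

def next_page(conn, biz_content, last_id, size=10):
    min_id = cached_min_id(conn, biz_content)
    cur = conn.execute(
        "SELECT id FROM follow_fans_1 "
        "WHERE biz_content = ? AND id < ? AND id >= ? "
        "ORDER BY id DESC LIMIT ?",
        (biz_content, last_id, min_id, size),
    )
    return [r[0] for r in cur.fetchall()]

p1 = next_page(conn, "user_42", last_id=2**62)   # ids 525..516
p2 = next_page(conn, "user_42", last_id=p1[-1])  # ids 515..506
p3 = next_page(conn, "user_42", last_id=p2[-1])  # ids 505..500: short last page
```

The `id >= minId` predicate is what fixes the last-page timeout: once the scan reaches the cached minimum, the engine stops instead of sweeping the rest of the index looking for rows that do not exist.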
Additional Index Optimizations : Ensure query predicates are covered by indexes, prefer the clustered primary‑key index, follow the left‑most prefix rule for composite indexes, avoid functions or type conversions on indexed columns (which defeat index use), and use covering indexes to eliminate lookups back to the clustered index.
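A covering index can be demonstrated with a query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN (MySQL's EXPLAIN output differs, but the principle of answering a query from the index alone is the same); the index name and column choice are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE follow_fans_1 (id INTEGER PRIMARY KEY, biz_content TEXT, pin TEXT)"
)
# Composite index matching the pagination predicate and sort order.
conn.execute("CREATE INDEX idx_biz_id ON follow_fans_1 (biz_content, id)")

# Selecting only columns present in the index lets the engine skip the
# lookup back to the table entirely.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM follow_fans_1 "
    "WHERE biz_content = ? ORDER BY id DESC",
    ("user_42",),
).fetchall()
print(plan[0][-1])   # plan detail should mention a covering index
```

Selecting a column outside the index (pin, say) would force a per-row lookup back to the table, which is exactly the cost a covering index avoids.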
By combining these techniques, pagination over tables with billions of rows can be performed with stable, low‑latency performance, making the system scalable and responsive.
JD Tech