Why LIMIT with Large OFFSET Slows MySQL and How to Speed It Up
When a MySQL query uses LIMIT with a large offset on a table with millions of rows, the database must scan hundreds of thousands of index entries and fetch the corresponding clustered-index rows, causing massive random I/O. Rewriting the query as a join against a sub-query that selects only primary keys cuts that I/O dramatically, reducing execution time from about 16 seconds to well under a second while also preventing buffer-pool pollution.
Background
The article examines a table named test on MySQL 5.7.17 that holds about 5.2 million rows and has three columns: id (auto-increment primary key), val (non-unique indexed column), and source (int). The goal is to fetch five rows where val = 4 after skipping the first 300,000 matches, using a LIMIT clause with a large offset.
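For readers who want to reproduce the setup, a table matching this description could be declared roughly as follows. This is only a sketch: the column types, the NOT NULL constraints, and the engine clause are assumptions. Only the column names and the index name val (which appears in the buffer-pool query later in the article) come from the source.

```sql
-- Hypothetical DDL; the article does not show the real table definition.
CREATE TABLE test (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- clustered (primary) key
  val    INT NOT NULL,                          -- filtered column, non-unique
  source INT NOT NULL,                          -- payload column, not indexed
  PRIMARY KEY (id),
  KEY val (val)                                 -- secondary index named "val"
) ENGINE = InnoDB;
```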
Problem with Large OFFSET
Original SQL:
SELECT * FROM test WHERE val = 4 LIMIT 300000, 5;
Running this query takes roughly 16 seconds (execution 15.98 s, fetching 10 ms) because MySQL must read 300,005 entries from the val index, fetch the matching row from the clustered index for each of them, and finally discard the first 300,000 rows. This results in a huge amount of random I/O.
Optimized Approach
The query is rewritten to first retrieve only the primary keys of the desired rows and then join back to the table:
SELECT a.*
FROM test a
INNER JOIN (
SELECT id FROM test WHERE val = 4 LIMIT 300000, 5
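) b ON a.id = b.id;
After applying this change, execution time drops to about 0.35 s (execution 163 ms, fetching 184 ms). The sub-query still walks 300,005 entries of the val index, but because every InnoDB secondary index entry already contains the primary key, it never touches the clustered index. Only the five rows selected by the join are fetched from the clustered index.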
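One caveat applies to both forms of the query: without ORDER BY, MySQL does not guarantee which rows a LIMIT clause skips and returns. A deterministic variant, offered here as a sketch rather than as the author's query, orders by id in both places. Entries in an InnoDB secondary index that share the same val are stored in primary-key order, so this ordering should usually come straight from the val index without an extra sort, though that is worth confirming with EXPLAIN on your own data.

```sql
SELECT a.*
FROM test a
INNER JOIN (
  SELECT id
  FROM test
  WHERE val = 4
  ORDER BY id          -- fixes which 300,000 rows are skipped
  LIMIT 300000, 5
) b ON a.id = b.id
ORDER BY a.id;         -- the join itself does not guarantee output order
```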
Why the Optimized Query Is Faster
InnoDB stores table data in a clustered index keyed by the primary key, while a secondary index such as val stores the indexed value together with the primary key. With a large offset, the engine still walks the secondary index (val) entry by entry and, because the query selects every column, performs a random lookup in the clustered index for each matching primary key. The original query forces 300,005 such lookups, of which 300,000 are discarded.
In the optimized version, the sub‑query isolates the five primary keys first; the subsequent join retrieves only those five rows, reducing random I/O from hundreds of thousands to just five.
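The difference can also be checked in the execution plans. The statements below are a sketch, and the exact plan depends on the MySQL version and index statistics. For the sub-query's row, one would expect the Extra column to include "Using index", meaning the val index alone satisfies it, whereas the original query has to read full rows.

```sql
-- Plan for the original query: SELECT * needs columns outside the val index.
EXPLAIN SELECT * FROM test WHERE val = 4 LIMIT 300000, 5;

-- Plan for the rewrite: the derived table selects only id.
EXPLAIN
SELECT a.*
FROM test a
INNER JOIN (
  SELECT id FROM test WHERE val = 4 LIMIT 300000, 5
) b ON a.id = b.id;
```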
Verification Using Buffer Pool Statistics
To prove the I/O difference, the author examines InnoDB buffer‑pool pages before and after each query. After the original query, the buffer pool contains 4,098 data pages and 208 index pages. After the optimized query, only five data pages are loaded. The buffer‑pool statistics are obtained with:
SELECT index_name, COUNT(*)
FROM information_schema.INNODB_BUFFER_PAGE
WHERE INDEX_NAME IN ('val','PRIMARY')
AND TABLE_NAME LIKE '%test%'
GROUP BY index_name;
These numbers confirm that the original query pollutes the buffer pool with many rarely‑used pages, while the optimized query touches only the necessary pages.
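A complementary check, not used in the article, is to compare InnoDB's server-wide read counters around each query. Innodb_buffer_pool_reads counts logical reads that could not be satisfied from the buffer pool and had to go to disk, so on an otherwise idle server the difference before and after a query approximates the physical page reads it caused.

```sql
-- Note the counters, run the query under test, then read them again.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads that went to disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- all logical reads

SELECT * FROM test WHERE val = 4 LIMIT 300000, 5;

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
```

Because these counters are global, other sessions' activity is included in the difference, so the comparison is only meaningful on a quiet test server.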
Additional Considerations
Frequent loading of irrelevant pages can degrade overall performance by evicting hot pages from the buffer pool. When reproducing the measurements, disable innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup so that each MySQL restart begins with an empty buffer pool instead of reloading previously cached pages.
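As a sketch of how these settings might be inspected from a client session: innodb_buffer_pool_dump_at_shutdown can be changed at runtime, whereas innodb_buffer_pool_load_at_startup is read only when the server starts and has to be set in the option file or on the command line.

```sql
-- Show the current values of both options.
SHOW VARIABLES LIKE 'innodb_buffer_pool_dump_at_shutdown';
SHOW VARIABLES LIKE 'innodb_buffer_pool_load_at_startup';

-- Stop the server from saving the buffer-pool contents at shutdown.
-- Requires a privileged account (SUPER in MySQL 5.7).
SET GLOBAL innodb_buffer_pool_dump_at_shutdown = OFF;

-- innodb_buffer_pool_load_at_startup is not dynamic; disable it at startup,
-- for example with innodb_buffer_pool_load_at_startup=OFF under [mysqld].
```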
Conclusion
Using LIMIT with a large offset on a non‑unique indexed column leads to massive random I/O and buffer‑pool pollution. Rewriting the query to first fetch primary keys via a sub‑query and then joining back dramatically reduces I/O, execution time, and buffer‑pool impact.
References
https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/
https://dev.mysql.com/doc/refman/5.7/en/innodb-information-schema-buffer-pool-tables.html
ITPUB