Performance Testing and Optimization of MySQL Pagination for Large Datasets
This article builds a ten-million-row MySQL table, measures the latency of ordinary LIMIT pagination, analyzes how offset size and result-set size each affect query time, and presents practical optimizations such as sub-query pagination and ID-range filtering that dramatically improve query speed.
The author, a senior architect, shares a hands‑on performance test of MySQL 5.7.26 focusing on pagination queries over massive tables (up to ten million rows).
Test data preparation
CREATE TABLE `user_operation_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` varchar(64) DEFAULT NULL,
`ip` varchar(20) DEFAULT NULL,
`op_data` varchar(255) DEFAULT NULL,
`attr1` varchar(255) DEFAULT NULL,
`attr2` varchar(255) DEFAULT NULL,
`attr3` varchar(255) DEFAULT NULL,
`attr4` varchar(255) DEFAULT NULL,
`attr5` varchar(255) DEFAULT NULL,
`attr6` varchar(255) DEFAULT NULL,
`attr7` varchar(255) DEFAULT NULL,
`attr8` varchar(255) DEFAULT NULL,
`attr9` varchar(255) DEFAULT NULL,
`attr10` varchar(255) DEFAULT NULL,
`attr11` varchar(255) DEFAULT NULL,
`attr12` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

A stored procedure inserts ten million rows in batches of 1,000, committing after each batch to keep the total insert time reasonable (about 38 minutes on a low-end Windows 10 i5 machine).
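The article does not reproduce the procedure itself. A minimal sketch of such a batch-insert routine might look like the following; the procedure name, column values, and loop structure are illustrative assumptions, not the author's original code:

```sql
-- Hypothetical sketch of the batch-insert routine described above.
DELIMITER ;;
CREATE PROCEDURE batch_insert_user_operation_log()
BEGIN
  DECLARE i INT DEFAULT 0;
  SET autocommit = 0;            -- commit manually, once per batch
  WHILE i < 10000000 DO
    INSERT INTO `user_operation_log` (`user_id`, `ip`, `op_data`)
    VALUES (CONCAT('user_', i), '127.0.0.1', 'test op');
    SET i = i + 1;
    IF i % 1000 = 0 THEN
      COMMIT;                    -- flush every 1,000 rows
    END IF;
  END WHILE;
  COMMIT;                        -- flush any trailing partial batch
END;;
DELIMITER ;
```

Committing per batch rather than per row avoids ten million individual transaction flushes, which is what keeps the load time in the tens of minutes rather than hours.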
Basic pagination test
SELECT * FROM `user_operation_log` LIMIT 10000, 10;

Three runs of the above query took roughly 59 ms, 49 ms, and 50 ms, showing that simple LIMIT pagination is fast when the offset is modest and the result set is tiny.
Impact of result‑set size
SELECT * FROM `user_operation_log` LIMIT 10000, 10;
SELECT * FROM `user_operation_log` LIMIT 10000, 100;
SELECT * FROM `user_operation_log` LIMIT 10000, 1000;
SELECT * FROM `user_operation_log` LIMIT 10000, 10000;
SELECT * FROM `user_operation_log` LIMIT 10000, 100000;
SELECT * FROM `user_operation_log` LIMIT 10000, 1000000;

The execution time grew from a few dozen milliseconds to several seconds as the number of rows returned increased, confirming that larger result sets cost more time.
Impact of offset size
SELECT * FROM `user_operation_log` LIMIT 100, 100;
SELECT * FROM `user_operation_log` LIMIT 1000, 100;
SELECT * FROM `user_operation_log` LIMIT 10000, 100;
SELECT * FROM `user_operation_log` LIMIT 100000, 100;
SELECT * FROM `user_operation_log` LIMIT 1000000, 100;

As the offset grew, query latency increased noticeably, demonstrating the classic "large-offset penalty" of MySQL LIMIT.
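The penalty exists because MySQL cannot jump straight to the offset: it reads offset + limit rows (here 1,000,100) and discards the first million. Running EXPLAIN on the deep-offset query makes the scanned row count visible:

```sql
-- The "rows" column of the EXPLAIN output reflects that MySQL
-- expects to examine roughly offset + limit rows, not just 100.
EXPLAIN SELECT * FROM `user_operation_log` LIMIT 1000000, 100;
```

This is why latency grows roughly linearly with the offset even though the returned page size stays constant.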
Optimization for large offsets
Use a sub‑query to locate the starting id and then fetch the next rows:
SELECT * FROM `user_operation_log` WHERE id >= (
SELECT id FROM `user_operation_log` LIMIT 1000000, 1
) LIMIT 10;

This approach leverages the primary-key index to locate the starting row, dramatically reducing the cost of deep pagination.
When id is not strictly sequential, an IN sub-query with a derived table can be used (MySQL does not allow LIMIT directly inside an IN sub-query, hence the extra level of nesting):
SELECT * FROM `user_operation_log`
WHERE id IN (
SELECT t.id FROM (
SELECT id FROM `user_operation_log` LIMIT 1000000, 10
) AS t
);

Another method is to filter by an id range:
SELECT * FROM `user_operation_log` WHERE id BETWEEN 1000000 AND 1000100 LIMIT 100;

This works only when id values are continuous and monotonic.
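A closely related technique, not part of the original test but worth noting, is keyset (or "seek") pagination: instead of an offset or a fixed range, each request carries the last id returned by the previous page. It tolerates gaps in the id sequence, provided pages are traversed in order:

```sql
-- Keyset pagination sketch: 1000000 stands in for the largest id
-- seen on the previous page. The query seeks past it via the
-- primary-key index, so no rows are read and discarded.
SELECT * FROM `user_operation_log`
WHERE id > 1000000
ORDER BY id
LIMIT 100;
```

The trade-off is that pages can only be fetched sequentially; jumping directly to page N is not possible without an offset.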
Optimization for large result sets
Retrieve only the needed columns instead of * to reduce parsing, network transfer, and memory usage:
SELECT id, user_id, ip, op_data FROM `user_operation_log` LIMIT 1, 1000000;

In the tests, the column-pruned query ran noticeably faster than the full-column version, since less data has to be parsed, transferred, and buffered.
Why avoid SELECT *
The server must resolve metadata for every column, adding CPU overhead.
Large text columns (e.g., logs) increase network payload dramatically, especially when the DB and application are on different machines.
Conclusion
For pagination over massive tables, prefer index‑based navigation (sub‑query or id range) instead of large offsets, and always select only the columns you really need. These techniques keep query latency low even when the table contains tens of millions of rows.