How to Efficiently Paginate 10 Million MySQL Records: Real‑World Tests & Optimizations
This article walks through creating a 10‑million‑row MySQL table, measuring pagination query times under different offsets and result sizes, and presents practical optimizations such as sub‑queries, ID‑range filtering, and column selection to dramatically improve performance while explaining why using SELECT * is discouraged.
Introduction
An interview question asks how to query ten million rows; the author demonstrates a hands‑on test using MySQL 5.7.26.
Data Preparation
A table user_operation_log is created and a stored procedure inserts ten million rows in batches, committing every 1,000 rows for speed.
CREATE TABLE `user_operation_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`ip` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`op_data` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr1` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
...
PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 CHARACTER SET=utf8mb4 COLLATE=utf8mb4_general_ci ROW_FORMAT=Dynamic; DELIMITER ;;
CREATE PROCEDURE batch_insert_log()
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE userId INT DEFAULT 10000000;
SET @execSql = 'INSERT INTO `test`.`user_operation_log`(`user_id`,`ip`,`op_data`,`attr1`,`attr2`,`attr3`,`attr4`,`attr5`,`attr6`,`attr7`,`attr8`,`attr9`,`attr10`,`attr11`,`attr12`) VALUES';
SET @execData = '';
WHILE i<=10000000 DO
SET @attr = '"测试很长很长的属性"';
SET @execData = CONCAT(@execData, "(", userId+i, ",'10.0.69.175','用户登录操作'", ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ")");
IF i % 1000 = 0 THEN
SET @stmtSql = CONCAT(@execSql, @execData, ';');
PREPARE stmt FROM @stmtSql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
COMMIT;
SET @execData = '';
ELSE
SET @execData = CONCAT(@execData, ',');
END IF;
SET i = i + 1;
END WHILE;
END;;
DELIMITER ;Testing
The test machine is a low‑end Windows 10 i5 with a ~500 MB/s SSD. About 3.148 million rows (5 GB) are inserted, taking 38 minutes. SELECT count(1) FROM `user_operation_log`; Result: 3,148,000 rows. Three consecutive count queries take 14,060 ms, 13,755 ms, and 13,447 ms.
Simple Pagination
MySQL uses LIMIT to fetch a specific range; Oracle uses ROWNUM.
SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset;Test query:
SELECT * FROM `user_operation_log` LIMIT 10000, 10;Three runs take 59 ms, 49 ms, and 50 ms respectively.
Same offset, different result size
SELECT * FROM `user_operation_log` LIMIT 10000, 10;
SELECT * FROM `user_operation_log` LIMIT 10000, 100;
SELECT * FROM `user_operation_log` LIMIT 10000, 1000;
SELECT * FROM `user_operation_log` LIMIT 10000, 10000;
SELECT * FROM `user_operation_log` LIMIT 10000, 100000;
SELECT * FROM `user_operation_log` LIMIT 10000, 1000000;Execution time grows with the number of rows returned, as shown in the chart.
Same result size, different offset
SELECT * FROM `user_operation_log` LIMIT 100, 100;
SELECT * FROM `user_operation_log` LIMIT 1000, 100;
SELECT * FROM `user_operation_log` LIMIT 10000, 100;
SELECT * FROM `user_operation_log` LIMIT 100000, 100;
SELECT * FROM `user_operation_log` LIMIT 1000000, 100;Execution time increases with larger offsets, as illustrated in the second chart.
Optimization
Large Offset Problem
Sub‑query Method
SELECT * FROM `user_operation_log` LIMIT 1000000, 10;
SELECT id FROM `user_operation_log` LIMIT 1000000, 1;
SELECT * FROM `user_operation_log` WHERE id >= (SELECT id FROM `user_operation_log` LIMIT 1000000, 1) LIMIT 10;The sub‑query uses the primary‑key index and is noticeably faster, though it only works when id is monotonically increasing.
ID Range Method
SELECT * FROM `user_operation_log` WHERE id BETWEEN 1000000 AND 1000100 LIMIT 100;
SELECT * FROM `user_operation_log` WHERE id >= 1000000 LIMIT 100;This approach is extremely fast because it avoids the offset entirely.
Large Data‑Volume Problem
SELECT * FROM `user_operation_log` LIMIT 1, 1000000;
SELECT id FROM `user_operation_log` LIMIT 1, 1000000;
SELECT id, user_id, ip, op_data, attr1, attr2, attr3, attr4, attr5, attr6, attr7, attr8, attr9, attr10, attr11, attr12 FROM `user_operation_log` LIMIT 1, 1000000;Fetching only the needed columns dramatically reduces query time, as shown in the chart.
Why Not SELECT * ?
Using SELECT * forces the database to parse all column metadata and can pull unnecessary large text fields, increasing both CPU load and network traffic, especially when the client is remote.
Conclusion
Readers are encouraged to reproduce the experiments and explore further optimizations on their own.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
