How to Efficiently Query Millions of Rows in MySQL: Pagination & Optimization
This article demonstrates how to generate, insert, and query a ten‑million‑row MySQL table, measures the performance of ordinary LIMIT pagination versus optimized techniques, and provides practical SQL snippets and tips for handling large offsets and massive data sets.
Preparation
To test pagination on a large dataset, a MySQL 5.7.26 database is used. Since inserting ten million rows directly would be extremely slow, a script is employed to generate the data efficiently.
Create Table
CREATE TABLE `user_operation_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`ip` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`op_data` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr1` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr2` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr3` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr4` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr5` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr6` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr7` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr8` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr9` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr10` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr11` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr12` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 CHARACTER SET=utf8mb4 COLLATE=utf8mb4_general_ci ROW_FORMAT=Dynamic;Create Data Script
DELIMITER ;;
CREATE PROCEDURE batch_insert_log()
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE userId INT DEFAULT 10000000;
SET @execSql = 'INSERT INTO `test`.`user_operation_log`(`user_id`, `ip`, `op_data`, `attr1`, `attr2`, `attr3`, `attr4`, `attr5`, `attr6`, `attr7`, `attr8`, `attr9`, `attr10`, `attr11`, `attr12`) VALUES';
SET @execData = '';
WHILE i <= 10000000 DO
SET @attr = "'测试很长很长很长...的属性'";
SET @execData = CONCAT(@execData, "(", userId + i, ", '10.0.69.175', '用户登录操作'", ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ")");
IF i % 1000 = 0 THEN
SET @stmtSql = CONCAT(@execSql, @execData, ';');
PREPARE stmt FROM @stmtSql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
COMMIT;
SET @execData = '';
ELSE
SET @execData = CONCAT(@execData, ",");
END IF;
SET i = i + 1;
END WHILE;
END;;
DELIMITER ;Test Execution
On a low‑end Windows 10 PC (≈500 MB SSD, i5 CPU), 3,148,000 rows were inserted, consuming ~5 GB disk. Counting rows took 38 minutes. The following query confirmed the row count: SELECT count(1) FROM `user_operation_log`; Result: 3,148,000 rows.
Ordinary Pagination Query
MySQL LIMIT syntax:
SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset;Example query:
SELECT * FROM `user_operation_log` LIMIT 10000, 10;Three runs yielded 59 ms, 49 ms, and 50 ms respectively, showing acceptable speed for small offsets on a local database.
Impact of Data Volume
Running the same offset (10,000) with increasing row counts (10, 100, 1 000, 10 000, 100 000, 1 000 000) demonstrated that larger result sets take longer. The chart below visualizes the trend:
Impact of Offset Size
Keeping the result size constant (100 rows) while increasing the offset (100, 1 000, 10 000, 100 000, 1 000 000) showed a clear increase in query time as the offset grew. The following chart illustrates this:
Optimization Strategies
Large Offset
Use a sub‑query to locate the starting id and then fetch rows based on that id, which leverages the primary‑key index:
SELECT * FROM `user_operation_log` WHERE id >= (
SELECT id FROM `user_operation_log` LIMIT 1000000, 1
) LIMIT 10;This approach reduces the cost of scanning large offsets. It works best when id is monotonically increasing.
For non‑monotonic ids, a nested IN query can be used (note: some MySQL versions may not support LIMIT inside IN).
SELECT * FROM `user_operation_log` WHERE id IN (
SELECT t.id FROM (
SELECT id FROM `user_operation_log` LIMIT 1000000, 10
) AS t
);Large Result Set
Limit the columns returned; selecting only needed fields dramatically improves performance compared to SELECT *. Example:
SELECT id, user_id, ip FROM `user_operation_log` LIMIT 1, 1000000;Benchmark results show that fetching fewer columns reduces both parsing overhead and network transfer size.
Why Avoid SELECT *
Database must resolve metadata for every column, increasing parsing cost.
Unnecessary columns increase network payload, especially when the client is remote.
Conclusion
When dealing with tens of millions of rows, avoid large offsets and SELECT *. Prefer index‑based range queries or sub‑queries, and always select only the columns you need. These techniques keep query latency low even as data volume grows.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
