Optimizing MySQL Pagination for Large Datasets: Analysis and Solutions
An engineer describes a production MySQL pagination issue where large offset queries caused severe slowdown, analyzes the root cause, demonstrates data simulation scripts, and presents three optimization strategies—including index covering with subqueries, redefining start positions, and applying offset limits—to improve query performance on massive tables.
Background
The author was alerted to a production incident where a REST API endpoint was being called with extremely large offset and limit parameters (e.g., offset=1800000&limit=500), causing the MySQL cluster to slow down dramatically.
Analysis
The slowdown was traced to MySQL’s handling of large offsets: a query such as
SELECT * FROM t_name WHERE c_name1='xxx' ORDER BY c_name2 LIMIT 2000000,25forces MySQL to scan and discard millions of rows before returning the requested page, leading to high latency on later pages.
Data Simulation
To reproduce the problem, the author created two tables ( dep and emp) and populated them with millions of rows using stored procedures and helper functions:
/* Department table */
DROP TABLE IF EXISTS dep;
CREATE TABLE dep(
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
depno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
depname VARCHAR(20) NOT NULL DEFAULT "",
memo VARCHAR(200) NOT NULL DEFAULT ""
);
/* Employee table */
DROP TABLE IF EXISTS emp;
CREATE TABLE emp(
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
empno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
empname VARCHAR(20) NOT NULL DEFAULT "",
job VARCHAR(9) NOT NULL DEFAULT "",
mgr MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
hiredate DATETIME NOT NULL,
sal DECIMAL(7,2) NOT NULL,
comn DECIMAL(7,2) NOT NULL,
depno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0
);
/* Random string function */
DELIMITER $
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
BEGIN
DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
DECLARE return_str VARCHAR(255) DEFAULT '';
DECLARE i INT DEFAULT 0;
WHILE i < n DO
SET return_str = CONCAT(return_str, SUBSTRING(chars_str, FLOOR(1+RAND()*52), 1));
SET i = i + 1;
END WHILE;
RETURN return_str;
END $
DELIMITER ;
/* Insert 5,000,000 employee rows */
CALL insert_emp(0, 5000000);
/* Insert 120 department rows */
CALL insert_dept(1, 120);Testing
Two representative queries were executed:
/* Small offset */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
ORDER BY a.id DESC
LIMIT 100, 25;
/* Large offset */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
ORDER BY a.id DESC
LIMIT 4800000, 25;The small‑offset query returned in 0.001 s, while the large‑offset query took 12.275 s, confirming the performance issue.
Solutions
Index covering with subquery : Use the primary key index to locate the start row and then fetch the page, e.g.
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id >= (SELECT id FROM emp ORDER BY id LIMIT 100,1)
ORDER BY a.id
LIMIT 25;Redefine start position : Remember the last primary‑key value of the previous page and query with WHERE id > last_id instead of using OFFSET.
SELECT ... FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id > 100
ORDER BY a.id
LIMIT 25;Degrade strategy : Impose a maximum offset threshold; if the requested offset exceeds it, return an empty result or a 4xx error to prevent abusive scans.
Conclusion
Applying the first two techniques dramatically reduced query time (e.g., the large‑offset query dropped from 12 s to 1.5 s, and the start‑position method completed in 0.001 s). The author also recommends rate‑limiting and monitoring for high‑frequency calls to protect the database.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
