Why Large OFFSETs Slow MySQL Queries and How to Fix Them
A developer discovers that massive OFFSET values in MySQL pagination cause full‑table scans and severe latency, then reproduces the issue with simulated data, benchmarks the slowdown, and presents three practical solutions—including index‑covering subqueries, cursor‑based pagination, and offset limits—to restore query performance.
Background
On a late‑night subway ride the author receives a call about a production incident: a query endpoint is being called millions of times with a huge offset and limit, dragging the MySQL cluster down. The request pattern looks like
POST domain/v1.0/module/method?order=condition&orderType=desc&offset=1800000&limit=500, meaning the caller is paging to the 3,601‑st page (500 rows per page) and has done this over 8,000 times.
Because the UI normally shows 25 rows per page, this traffic is clearly an automated data‑scraping operation, likely from a QA‑owned client called ApiAutotest. The author decides to investigate the root cause.
Analysis
The query itself is well‑indexed, but using a large OFFSET forces MySQL to scan and discard millions of rows before returning the requested page. For example,
SELECT * FROM t_name WHERE c_name1='xxx' ORDER BY c_name2 LIMIT 2000000,25makes MySQL read 2,000,025 rows and then drop the first 2,000,000, which is extremely inefficient.
The High Performance MySQL book explains that a huge offset leads to massive row scans, which is the core of the problem.
Data Simulation
To reproduce the issue, the author creates two tables ( dep and emp) and populates them with millions of rows.
/* Department table */
DROP TABLE IF EXISTS dep;
CREATE TABLE dep(
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
depno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
depname VARCHAR(20) NOT NULL DEFAULT "",
memo VARCHAR(200) NOT NULL DEFAULT ""
);
/* Employee table */
DROP TABLE IF EXISTS emp;
CREATE TABLE emp(
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
empno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
empname VARCHAR(20) NOT NULL DEFAULT "",
job VARCHAR(9) NOT NULL DEFAULT "",
mgr MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
hiredate DATETIME NOT NULL,
sal DECIMAL(7,2) NOT NULL,
comn DECIMAL(7,2) NOT NULL,
depno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0
);Two helper functions generate random strings and random department numbers:
/* Random string */
DELIMITER $
DROP FUNCTION IF EXISTS rand_string;
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
BEGIN
DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
DECLARE return_str VARCHAR(255) DEFAULT '';
DECLARE i INT DEFAULT 0;
WHILE i < n DO
SET return_str = CONCAT(return_str, SUBSTRING(chars_str, FLOOR(1+RAND()*52), 1));
SET i = i + 1;
END WHILE;
RETURN return_str;
END $
DELIMITER ;
/* Random department number */
DELIMITER $
DROP FUNCTION IF EXISTS rand_num;
CREATE FUNCTION rand_num() RETURNS INT(5)
BEGIN
RETURN FLOOR(100 + RAND()*10);
END $
DELIMITER ;Stored procedures insert 5,000,000 employee rows and 120 department rows:
/* Insert employees */
DELIMITER $
DROP PROCEDURE IF EXISTS insert_emp;
CREATE PROCEDURE insert_emp(IN START INT, IN max_num INT)
BEGIN
DECLARE i INT DEFAULT 0;
SET autocommit = 0;
REPEAT
SET i = i + 1;
INSERT INTO emp(empno, empname, job, mgr, hiredate, sal, comn, depno)
VALUES ((START+i), rand_string(6), 'SALEMAN', 1, NOW(), 2000, 400, rand_num());
UNTIL i = max_num END REPEAT;
COMMIT;
END $
DELIMITER ;
CALL insert_emp(0, 5000000);
/* Insert departments */
DELIMITER $
DROP PROCEDURE IF EXISTS insert_dept;
CREATE PROCEDURE insert_dept(IN START INT, IN max_num INT)
BEGIN
DECLARE i INT DEFAULT 0;
SET autocommit = 0;
REPEAT
SET i = i + 1;
INSERT INTO dep(depno, depname, memo)
VALUES ((START+i), rand_string(10), rand_string(8));
UNTIL i = max_num END REPEAT;
COMMIT;
END $
DELIMITER ;
CALL insert_dept(1, 120);After data generation, indexes on the primary key and foreign key columns are created:
CREATE INDEX idx_emp_id ON emp(id);
CREATE INDEX idx_emp_depno ON emp(depno);
CREATE INDEX idx_dep_depno ON dep(depno);Testing
Two queries illustrate the performance gap:
/* Offset 100, limit 25 */
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
ORDER BY a.id DESC
LIMIT 100,25;
/* Offset 4,800,000, limit 25 */
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
ORDER BY a.id DESC
LIMIT 4800000,25;The first query returns instantly (<0.001 s); the second takes over 12 s because MySQL must scan 4.8 million rows.
Solutions
1. Index‑covering subquery
Use the primary‑key index to locate the start row, then fetch the next 25 rows:
/* Offset 100 */
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id >= (SELECT id FROM emp ORDER BY id LIMIT 100,1)
ORDER BY a.id LIMIT 25;
/* Offset 4,800,000 */
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id >= (SELECT id FROM emp ORDER BY id LIMIT 4800000,1)
ORDER BY a.id LIMIT 25;Execution time drops to 0.1 s and 1.5 s respectively.
2. Redefine the start position (keyset pagination)
Remember the last primary‑key value of the previous page and query with a simple WHERE id > last_id:
/* After page ending at id 100 */
SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id > 100
ORDER BY a.id LIMIT 25;
/* After page ending at id 4,800,000 */
SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id > 4800000
ORDER BY a.id LIMIT 25;This approach consistently returns results in <0.001 s because only the required 25 rows are scanned.
3. Degrade when offset is too large
Set a maximum allowed offset; if a request exceeds it, return an empty result or a 4xx error, forcing the client to narrow the query criteria.
Conclusion
Applying the subquery‑based or keyset pagination methods dramatically improves query latency for large offsets. In production the team limited the maximum offset and combined the two fast methods with additional rate‑limiting to prevent abusive scraping. The experience highlights the importance of considering extreme pagination scenarios when designing APIs and capacity‑testing database workloads.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
