Why Large OFFSETs Slow MySQL Queries and How to Fix Them
A production MySQL query with very large OFFSET values caused a severe slowdown. This article explains the root cause, provides data-generation scripts, benchmarks the problem, and presents three practical fixes (an index-covering subquery, remembering the last primary key, and capping the offset) that dramatically improve pagination performance.
Background
On January 22, after work, a colleague called about a production issue: an API was being invoked millions of times with a large offset and limit, which slowed down the MySQL cluster.
POST domain/v1.0/module/method?order=condition&orderType=desc&offset=1800000&limit=500
The request asked for page 3601 (offset 1,800,000, limit 500) and appeared more than 8,000 times. Its limit of 500 is far above the normal page size of 25 rows, which points to data scraping on a table with more than 100 million rows.
Analysis
The query itself is well optimized, with proper joins and indexes on the filter and sort columns. However, a large OFFSET forces MySQL to read and discard every row that precedes the requested page, so each later page is slower than the one before it.
Example of a fast query: LIMIT 200,25
Example of a slow query: LIMIT 2000000,25
The slow query forces MySQL to read 2,000,025 rows and discard the first 2,000,000, which is highly inefficient.
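The cost can be put in rough numbers. The sketch below only applies the offset + limit arithmetic from the text; `rows_read` and `wasted_fraction` are hypothetical helper names, not part of MySQL or of the original article.

```python
def rows_read(offset: int, limit: int) -> int:
    """Rows the server must produce for LIMIT offset, limit.

    The first `offset` rows are read and thrown away, then `limit` rows are kept.
    """
    return offset + limit


def wasted_fraction(offset: int, limit: int) -> float:
    """Share of the rows read that are discarded before the page is returned."""
    return offset / rows_read(offset, limit)


if __name__ == "__main__":
    # Compare the fast and slow examples from the text (page size 25).
    for offset in (200, 2_000_000):
        print(offset, rows_read(offset, 25), wasted_fraction(offset, 25))
```

For the slow example, nearly all of the work goes into rows that are never returned, which is why the latency grows with the page number rather than with the page size.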
High‑offset pagination is a known problem described in "High Performance MySQL" (Chapter 6).
Data Simulation
To reproduce the issue, the article creates two tables (employee and department), two helper functions (random string and random number), and stored procedures to insert 5 million employee rows and 120 department rows.
/* Department table */
DROP TABLE IF EXISTS dep;
CREATE TABLE dep (
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    depno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    depname VARCHAR(20) NOT NULL DEFAULT '',
    memo VARCHAR(200) NOT NULL DEFAULT ''
);
/* Employee table */
DROP TABLE IF EXISTS emp;
CREATE TABLE emp (
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    empno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    empname VARCHAR(20) NOT NULL DEFAULT '',
    job VARCHAR(9) NOT NULL DEFAULT '',
    mgr MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    hiredate DATETIME NOT NULL,
    sal DECIMAL(7,2) NOT NULL,
    comn DECIMAL(7,2) NOT NULL,
    depno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0
);
/* Random string function */
DROP FUNCTION IF EXISTS rand_string;
DELIMITER $
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
NO SQL /* satisfies the characteristic MySQL requires when binary logging is enabled */
BEGIN
    /* Pool of 52 letters to draw from */
    DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    DECLARE return_str VARCHAR(255) DEFAULT '';
    DECLARE i INT DEFAULT 0;
    /* Append one random letter per iteration until n letters are collected */
    WHILE i < n DO
        SET return_str = CONCAT(return_str, SUBSTRING(chars_str, FLOOR(1 + RAND() * 52), 1));
        SET i = i + 1;
    END WHILE;
    RETURN return_str;
END $
DELIMITER ;
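/* Random number function: returns an integer from 100 to 109 */
DROP FUNCTION IF EXISTS rand_num;
DELIMITER $
CREATE FUNCTION rand_num() RETURNS INT
NO SQL /* satisfies the characteristic MySQL requires when binary logging is enabled */
BEGIN
    RETURN FLOOR(100 + RAND() * 10);
END $
DELIMITER ;
/* Insert 5,000,000 employees */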
DROP PROCEDURE IF EXISTS insert_emp;
DELIMITER $
CREATE PROCEDURE insert_emp(IN START INT, IN max_num INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    /* Turn off autocommit so all rows are committed in one transaction */
    SET autocommit = 0;
    REPEAT
        SET i = i + 1;
        INSERT INTO emp(empno, empname, job, mgr, hiredate, sal, comn, depno)
        VALUES ((START + i), rand_string(6), 'SALESMAN', 1, NOW(), 2000, 400, rand_num());
    /* >= rather than = so the loop also ends when max_num is 0 or negative */
    UNTIL i >= max_num END REPEAT;
    COMMIT;
END $
DELIMITER ;
CALL insert_emp(0, 5000000);
/* Insert 120 departments */
DROP PROCEDURE IF EXISTS insert_dept;
DELIMITER $
CREATE PROCEDURE insert_dept(IN START INT, IN max_num INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    /* Turn off autocommit so all rows are committed in one transaction */
    SET autocommit = 0;
    REPEAT
        SET i = i + 1;
        INSERT INTO dep(depno, depname, memo)
        VALUES ((START + i), rand_string(10), rand_string(8));
    /* >= rather than = so the loop also ends when max_num is 0 or negative */
    UNTIL i >= max_num END REPEAT;
    COMMIT;
END $
DELIMITER ;
CALL insert_dept(1, 120);
After data generation, indexes are added:
CREATE INDEX idx_emp_id ON emp(id);
CREATE INDEX idx_emp_depno ON emp(depno);
CREATE INDEX idx_dep_depno ON dep(depno);
Testing
Two queries illustrate the performance gap:
/* offset 100, limit 25 */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
ORDER BY a.id DESC LIMIT 100, 25;
/* offset 4,800,000, limit 25 */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
ORDER BY a.id DESC LIMIT 4800000, 25;
Execution times: ~0.001 s for the small offset, ~12.3 s for the large offset.
Solutions
1. Index‑covering subquery
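First locate the starting id with a subquery that reads only the index on id, then fetch the page from that id onward. MySQL still skips the offset rows, but it skips narrow index entries instead of full joined rows. Note that these rewrites sort ascending, whereas the test queries above sort descending; for a descending page, use <= and ORDER BY id DESC in both the subquery and the outer query.
/* offset 100 */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id >= (SELECT id FROM emp ORDER BY id LIMIT 100, 1)
ORDER BY a.id LIMIT 25;
/* offset 4,800,000 */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id >= (SELECT id FROM emp ORDER BY id LIMIT 4800000, 1)
ORDER BY a.id LIMIT 25;
Execution times: ~0.106 s for the small offset and ~1.54 s for the large offset. The large offset improves from ~12.3 s, while the small offset is slower than the ~0.001 s measured for the plain query, so the rewrite pays off for deep pages rather than shallow ones.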
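As a small-scale check that the rewrite returns the same page, the sketch below runs both forms against an in-memory SQLite database and compares the rows. SQLite stands in for MySQL here only to keep the example self-contained, so it says nothing about MySQL timings. `make_demo_db`, `page_plain`, and `page_seek` are hypothetical helper names, and the tables are cut down to a few columns.

```python
import sqlite3


def make_demo_db(n_rows: int = 1000) -> sqlite3.Connection:
    """Build a tiny stand-in for the emp/dep tables (SQLite, not MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dep (id INTEGER PRIMARY KEY, depno INTEGER, depname TEXT)")
    conn.execute(
        "CREATE TABLE emp (id INTEGER PRIMARY KEY, empno INTEGER, empname TEXT, depno INTEGER)"
    )
    # Ten departments numbered 100..109, matching the range rand_num() produces.
    conn.executemany(
        "INSERT INTO dep (depno, depname) VALUES (?, ?)",
        [(d, f"dep{d}") for d in range(100, 110)],
    )
    conn.executemany(
        "INSERT INTO emp (empno, empname, depno) VALUES (?, ?, ?)",
        [(i, f"emp{i}", 100 + i % 10) for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn


# Plain form: the join result is produced for every skipped row.
PLAIN = """
SELECT a.id, a.empname, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
ORDER BY a.id LIMIT ? OFFSET ?
"""

# Rewritten form: the subquery finds the first id of the page, the outer query starts there.
SEEK = """
SELECT a.id, a.empname, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id >= (SELECT id FROM emp ORDER BY id LIMIT 1 OFFSET ?)
ORDER BY a.id LIMIT ?
"""


def page_plain(conn: sqlite3.Connection, offset: int, limit: int) -> list:
    return conn.execute(PLAIN, (limit, offset)).fetchall()


def page_seek(conn: sqlite3.Connection, offset: int, limit: int) -> list:
    return conn.execute(SEEK, (offset, limit)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(page_plain(conn, 900, 25) == page_seek(conn, 900, 25))
```

When the offset is past the end of the table, the subquery yields NULL and the comparison matches no rows, so the rewritten form returns an empty page just as the plain form does.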
2. Remember last primary‑key
Store the last id of the previous page and use it as the lower bound for the next page, eliminating the need for large offsets:
/* after page ending at id 100 */
SELECT a.id, a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id > 100
ORDER BY a.id LIMIT 25;
/* after page ending at id 4,800,000 */
SELECT a.id, a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id > 4800000
ORDER BY a.id LIMIT 25;
This yields sub-millisecond response times but only works for sequential scrolling, not for arbitrary page jumps.
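On the client side, this approach amounts to carrying the last id forward from page to page. The following is a minimal sketch under stated assumptions: it uses an in-memory SQLite table as a stand-in for MySQL, a reduced emp table, and the hypothetical helpers `fetch_after`, `iter_pages`, and `make_demo_db`.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


def fetch_after(conn: sqlite3.Connection, last_id: int, limit: int) -> List[Row]:
    """Return the next page: rows whose id is greater than the last id already seen."""
    return conn.execute(
        "SELECT id, empname FROM emp WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    ).fetchall()


def iter_pages(conn: sqlite3.Connection, limit: int = 25) -> Iterator[List[Row]]:
    """Yield pages in id order without ever using OFFSET."""
    last_id = 0  # assumption: ids are positive, as with AUTO_INCREMENT
    while True:
        page = fetch_after(conn, last_id, limit)
        if not page:
            return
        yield page
        last_id = page[-1][0]  # remember the last primary key of this page


def make_demo_db(n_rows: int = 100) -> sqlite3.Connection:
    """Build a small emp table with gaps in the id sequence (SQLite, not MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, empname TEXT)")
    conn.executemany(
        "INSERT INTO emp (empname) VALUES (?)",
        [(f"emp{i}",) for i in range(1, n_rows + 1)],
    )
    # Delete some rows so the example shows that gaps in id do not matter.
    conn.execute("DELETE FROM emp WHERE id % 7 = 0")
    conn.commit()
    return conn


if __name__ == "__main__":
    for page in iter_pages(make_demo_db(), limit=25):
        print(page[0][0], page[-1][0], len(page))
```

Because each page is defined by the last id seen rather than by a row count, deleted rows leave no holes or duplicates between pages, but the client must walk the pages in order.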
3. Degrade when offset is too large
Set a maximum allowed offset; if the client requests a larger offset, return an empty result or a 4xx error, forcing the user to narrow the query criteria.
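A guard of this kind usually sits in the API layer, before any SQL is built. The sketch below is one possible shape, not the original service's code: `MAX_OFFSET`, `MAX_LIMIT`, `OffsetTooLarge`, `PageRequest`, and `validate_page` are all hypothetical names and values, and clamping the limit is an added assumption motivated by the limit=500 requests described earlier.

```python
from dataclasses import dataclass

MAX_OFFSET = 10_000  # hypothetical ceiling; tune to what the database can serve quickly
MAX_LIMIT = 100      # hypothetical cap on page size


class OffsetTooLarge(ValueError):
    """Raised when a client asks for a page deeper than the service allows."""


@dataclass(frozen=True)
class PageRequest:
    offset: int
    limit: int


def validate_page(offset: int, limit: int) -> PageRequest:
    """Reject abusive offsets and clamp oversized limits before querying."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")
    if offset > MAX_OFFSET:
        # The caller can translate this into a 4xx response or an empty result.
        raise OffsetTooLarge(f"offset {offset} exceeds maximum {MAX_OFFSET}")
    return PageRequest(offset=offset, limit=min(limit, MAX_LIMIT))


if __name__ == "__main__":
    print(validate_page(100, 25))
    try:
        validate_page(1_800_000, 500)
    except OffsetTooLarge as exc:
        print("rejected:", exc)
```

Raising a dedicated exception keeps the decision (4xx error or empty page) with the request handler, while the check itself stays cheap enough to run on every call.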
Conclusion
Applying the index‑covering subquery together with a reasonable offset limit dramatically reduces pagination latency. For infinite‑scroll scenarios, remembering the last primary‑key provides the best performance, while a hard offset ceiling protects the system from abusive data‑scraping.
