Databases 16 min read

Why Large OFFSETs Slow MySQL Queries and How to Fix Them

A developer discovers that massive OFFSET values in MySQL pagination cause full‑table scans and severe latency, then reproduces the issue with simulated data, benchmarks the slowdown, and presents three practical solutions—including index‑covering subqueries, cursor‑based pagination, and offset limits—to restore query performance.

Code Ape Tech Column
Code Ape Tech Column
Code Ape Tech Column
Why Large OFFSETs Slow MySQL Queries and How to Fix Them

Background

On a late‑night subway ride the author receives a call about a production incident: a query endpoint is being called millions of times with a huge offset and limit, dragging the MySQL cluster down. The request pattern looks like

POST domain/v1.0/module/method?order=condition&orderType=desc&offset=1800000&limit=500

, meaning the caller is paging to the 3,601‑st page (500 rows per page) and has done this over 8,000 times.

Because the UI normally shows 25 rows per page, this traffic is clearly an automated data‑scraping operation, likely from a QA‑owned client called ApiAutotest. The author decides to investigate the root cause.

Analysis

The query itself is well‑indexed, but using a large OFFSET forces MySQL to scan and discard millions of rows before returning the requested page. For example,

SELECT * FROM t_name WHERE c_name1='xxx' ORDER BY c_name2 LIMIT 2000000,25

makes MySQL read 2,000,025 rows and then drop the first 2,000,000, which is extremely inefficient.

The High Performance MySQL book explains that a huge offset leads to massive row scans, which is the core of the problem.

Data Simulation

To reproduce the issue, the author creates two tables ( dep and emp) and populates them with millions of rows.

/* Department table */
DROP TABLE IF EXISTS dep;
CREATE TABLE dep(
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    depno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    depname VARCHAR(20) NOT NULL DEFAULT "",
    memo VARCHAR(200) NOT NULL DEFAULT ""
);

/* Employee table */
DROP TABLE IF EXISTS emp;
CREATE TABLE emp(
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    empno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    empname VARCHAR(20) NOT NULL DEFAULT "",
    job VARCHAR(9) NOT NULL DEFAULT "",
    mgr MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    hiredate DATETIME NOT NULL,
    sal DECIMAL(7,2) NOT NULL,
    comn DECIMAL(7,2) NOT NULL,
    depno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0
);

Two helper functions generate random strings and random department numbers:

/* Random string */
DELIMITER $
DROP FUNCTION IF EXISTS rand_string;
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
BEGIN
    DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    DECLARE return_str VARCHAR(255) DEFAULT '';
    DECLARE i INT DEFAULT 0;
    WHILE i < n DO
        SET return_str = CONCAT(return_str, SUBSTRING(chars_str, FLOOR(1+RAND()*52), 1));
        SET i = i + 1;
    END WHILE;
    RETURN return_str;
END $
DELIMITER ;

/* Random department number */
DELIMITER $
DROP FUNCTION IF EXISTS rand_num;
CREATE FUNCTION rand_num() RETURNS INT(5)
BEGIN
    RETURN FLOOR(100 + RAND()*10);
END $
DELIMITER ;

Stored procedures insert 5,000,000 employee rows and 120 department rows:

/* Insert employees */
DELIMITER $
DROP PROCEDURE IF EXISTS insert_emp;
CREATE PROCEDURE insert_emp(IN START INT, IN max_num INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    SET autocommit = 0;
    REPEAT
        SET i = i + 1;
        INSERT INTO emp(empno, empname, job, mgr, hiredate, sal, comn, depno)
        VALUES ((START+i), rand_string(6), 'SALEMAN', 1, NOW(), 2000, 400, rand_num());
    UNTIL i = max_num END REPEAT;
    COMMIT;
END $
DELIMITER ;
CALL insert_emp(0, 5000000);

/* Insert departments */
DELIMITER $
DROP PROCEDURE IF EXISTS insert_dept;
CREATE PROCEDURE insert_dept(IN START INT, IN max_num INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    SET autocommit = 0;
    REPEAT
        SET i = i + 1;
        INSERT INTO dep(depno, depname, memo)
        VALUES ((START+i), rand_string(10), rand_string(8));
    UNTIL i = max_num END REPEAT;
    COMMIT;
END $
DELIMITER ;
CALL insert_dept(1, 120);

After data generation, indexes on the primary key and foreign key columns are created:

CREATE INDEX idx_emp_id ON emp(id);
CREATE INDEX idx_emp_depno ON emp(depno);
CREATE INDEX idx_dep_depno ON dep(depno);

Testing

Two queries illustrate the performance gap:

/* Offset 100, limit 25 */
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
ORDER BY a.id DESC
LIMIT 100,25;

/* Offset 4,800,000, limit 25 */
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
ORDER BY a.id DESC
LIMIT 4800000,25;

The first query returns instantly (<0.001 s); the second takes over 12 s because MySQL must scan 4.8 million rows.

Solutions

1. Index‑covering subquery

Use the primary‑key index to locate the start row, then fetch the next 25 rows:

/* Offset 100 */
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id >= (SELECT id FROM emp ORDER BY id LIMIT 100,1)
ORDER BY a.id LIMIT 25;

/* Offset 4,800,000 */
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id >= (SELECT id FROM emp ORDER BY id LIMIT 4800000,1)
ORDER BY a.id LIMIT 25;

Execution time drops to 0.1 s and 1.5 s respectively.

2. Redefine the start position (keyset pagination)

Remember the last primary‑key value of the previous page and query with a simple WHERE id > last_id:

/* After page ending at id 100 */
SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id > 100
ORDER BY a.id LIMIT 25;

/* After page ending at id 4,800,000 */
SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id > 4800000
ORDER BY a.id LIMIT 25;

This approach consistently returns results in <0.001 s because only the required 25 rows are scanned.

3. Degrade when offset is too large

Set a maximum allowed offset; if a request exceeds it, return an empty result or a 4xx error, forcing the client to narrow the query criteria.

Conclusion

Applying the subquery‑based or keyset pagination methods dramatically improves query latency for large offsets. In production the team limited the maximum offset and combined the two fast methods with additional rate‑limiting to prevent abusive scraping. The experience highlights the importance of considering extreme pagination scenarios when designing APIs and capacity‑testing database workloads.

Pagination performance illustration
Pagination performance illustration
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

performanceSQLmysqlpaginationIndex OptimizationKeyset PaginationLarge OFFSET
Code Ape Tech Column
Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.