Databases 13 min read

Optimizing MySQL Pagination for Large Datasets: Analysis and Solutions

An engineer describes a production MySQL pagination issue where large offset queries caused severe slowdown, analyzes the root cause, demonstrates data simulation scripts, and presents three optimization strategies—including index covering with subqueries, redefining start positions, and applying offset limits—to improve query performance on massive tables.

Architect

Jun 6, 2021

Optimizing MySQL Pagination for Large Datasets: Analysis and Solutions

Background

The author was alerted to a production incident where a REST API endpoint was being called with extremely large offset and limit parameters (e.g., offset=1800000&limit=500), causing the MySQL cluster to slow down dramatically.

Analysis

The slowdown was traced to MySQL’s handling of large offsets: a query such as

SELECT * FROM t_name WHERE c_name1='xxx' ORDER BY c_name2 LIMIT 2000000,25

forces MySQL to scan and discard millions of rows before returning the requested page, leading to high latency on later pages.

Data Simulation

To reproduce the problem, the author created two tables ( dep and emp) and populated them with millions of rows using stored procedures and helper functions:

/* Department table */
DROP TABLE IF EXISTS dep;
CREATE TABLE dep(
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    depno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    depname VARCHAR(20) NOT NULL DEFAULT "",
    memo VARCHAR(200) NOT NULL DEFAULT ""
);

/* Employee table */
DROP TABLE IF EXISTS emp;
CREATE TABLE emp(
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    empno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    empname VARCHAR(20) NOT NULL DEFAULT "",
    job VARCHAR(9) NOT NULL DEFAULT "",
    mgr MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    hiredate DATETIME NOT NULL,
    sal DECIMAL(7,2) NOT NULL,
    comn DECIMAL(7,2) NOT NULL,
    depno MEDIUMINT UNSIGNED NOT NULL DEFAULT 0
);

/* Random string function */
DELIMITER $
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
BEGIN
    DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    DECLARE return_str VARCHAR(255) DEFAULT '';
    DECLARE i INT DEFAULT 0;
    WHILE i < n DO
        SET return_str = CONCAT(return_str, SUBSTRING(chars_str, FLOOR(1+RAND()*52), 1));
        SET i = i + 1;
    END WHILE;
    RETURN return_str;
END $
DELIMITER ;

/* Insert 5,000,000 employee rows */
CALL insert_emp(0, 5000000);

/* Insert 120 department rows */
CALL insert_dept(1, 120);

Testing

Two representative queries were executed:

/* Small offset */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
ORDER BY a.id DESC
LIMIT 100, 25;

/* Large offset */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
ORDER BY a.id DESC
LIMIT 4800000, 25;

The small‑offset query returned in 0.001 s, while the large‑offset query took 12.275 s, confirming the performance issue.

Solutions

Index covering with subquery : Use the primary key index to locate the start row and then fetch the page, e.g.

SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id >= (SELECT id FROM emp ORDER BY id LIMIT 100,1)
ORDER BY a.id
LIMIT 25;

Redefine start position : Remember the last primary‑key value of the previous page and query with WHERE id > last_id instead of using OFFSET.

SELECT ... FROM emp a LEFT JOIN dep b ON a.depno = b.depno
WHERE a.id > 100
ORDER BY a.id
LIMIT 25;

Degrade strategy : Impose a maximum offset threshold; if the requested offset exceeds it, return an empty result or a 4xx error to prevent abusive scans.

Conclusion

Applying the first two techniques dramatically reduced query time (e.g., the large‑offset query dropped from 12 s to 1.5 s, and the start‑position method completed in 0.001 s). The author also recommends rate‑limiting and monitoring for high‑frequency calls to protect the database.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance Optimization SQL database mysql pagination

Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.