How to Efficiently Paginate 100M User IDs in MySQL
This article examines three SQL pagination strategies for a 100‑million‑row favorites table, compares their correctness and performance using EXPLAIN analysis, and demonstrates why a GROUP BY approach with proper indexing yields the most reliable and fast results.
Programming skill is reflected in rigorous thinking; even seemingly simple problems can hide many subtle details.
Given a favorites table that stores user and book IDs with a data volume of 100 million rows, the task is to retrieve distinct user IDs in a paginated fashion.
Table definition:
CREATE TABLE favorites (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary key',
uid BIGINT UNSIGNED NOT NULL DEFAULT 0 COMMENT 'uid',
status TINYINT(3) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'status',
book_id BIGINT UNSIGNED NOT NULL DEFAULT 0 COMMENT 'book Id',
create_time INT(11) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'create time',
PRIMARY KEY (id),
UNIQUE KEY uid_book_id (uid, book_id),
KEY uid_status (uid, status)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=gbk COMMENT='User favorite info';
Three pagination designs
Design 1 – Simple LIMIT
SELECT DISTINCT uid FROM favorites ORDER BY uid DESC LIMIT 0,10;
SELECT DISTINCT uid FROM favorites ORDER BY uid DESC LIMIT 10,10; -- next page
This approach can skip user IDs: if rows are deleted between page requests, the remaining rows shift up and the next offset jumps past some of them, leaving gaps in the result set.
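The gap is easy to reproduce on a small scale. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a trimmed-down version of the favorites table, and a hypothetical helper named offset_page. It shows the paging logic, not MySQL's execution plan.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and the table keeps only the
# columns that the pagination query touches.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE favorites ("
    " uid INTEGER NOT NULL,"
    " book_id INTEGER NOT NULL,"
    " UNIQUE (uid, book_id))"
)
# 20 users (uid 1..20), one favorite each.
conn.executemany(
    "INSERT INTO favorites (uid, book_id) VALUES (?, ?)",
    [(uid, 1) for uid in range(1, 21)],
)


def offset_page(offset, size=10):
    """Design 1: page through distinct uids with LIMIT/OFFSET (hypothetical helper)."""
    rows = conn.execute(
        "SELECT DISTINCT uid FROM favorites ORDER BY uid DESC LIMIT ? OFFSET ?",
        (size, offset),
    )
    return [uid for (uid,) in rows]


page1 = offset_page(0)  # uids 20 down to 11

# Between the two requests, user 15 removes their only favorite.
conn.execute("DELETE FROM favorites WHERE uid = 15")

page2 = offset_page(10)  # starts at uid 9, so uid 10 is never returned

skipped = [uid for uid in range(1, 21) if uid != 15 and uid not in page1 + page2]
print(skipped)
```

Deleting one row that belongs to the first page moves every later uid up by one position, so the second offset starts one uid too late.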
Design 2 – Using a last‑seen UID
-- First page
SELECT DISTINCT uid FROM favorites ORDER BY uid DESC LIMIT 10;
-- Subsequent pages
SELECT DISTINCT uid FROM favorites WHERE uid < $last_min_uid ORDER BY uid DESC LIMIT 10;
EXPLAIN shows that this query does not use the unique index. It performs a range scan over about 7 million rows and needs a temporary table and a filesort, which causes serious performance problems.
Design 3 – GROUP BY with HAVING
-- First page
SELECT uid FROM favorites GROUP BY uid ORDER BY uid DESC LIMIT 10;
-- Subsequent pages
SELECT uid FROM favorites GROUP BY uid HAVING uid < $last_min_uid ORDER BY uid DESC LIMIT 10;
This method uses the composite index (uid, book_id), scans only about 1,200 rows, and avoids both the temporary table and the filesort, so it performs best of the three.
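In application code, Design 3 becomes a loop that feeds the smallest uid of each page into the next query. The following is a minimal sketch under stated assumptions: sqlite3 stands in for MySQL, the table is a trimmed-down version of favorites, and iter_uid_pages is a hypothetical helper name. It illustrates the paging logic only and says nothing about how MySQL executes the query.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; only uid and book_id are kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE favorites ("
    " uid INTEGER NOT NULL,"
    " book_id INTEGER NOT NULL,"
    " UNIQUE (uid, book_id))"
)
# 25 users with 3 favorites each, so every uid appears in several rows.
conn.executemany(
    "INSERT INTO favorites (uid, book_id) VALUES (?, ?)",
    [(uid, book_id) for uid in range(1, 26) for book_id in range(1, 4)],
)


def iter_uid_pages(conn, size=10):
    """Yield pages of distinct uids in descending order (hypothetical helper)."""
    last_min_uid = None
    while True:
        if last_min_uid is None:
            # First page: no lower bound yet.
            sql = "SELECT uid FROM favorites GROUP BY uid ORDER BY uid DESC LIMIT ?"
            params = (size,)
        else:
            # Later pages: continue below the smallest uid already seen.
            sql = (
                "SELECT uid FROM favorites GROUP BY uid"
                " HAVING uid < ? ORDER BY uid DESC LIMIT ?"
            )
            params = (last_min_uid, size)
        page = [uid for (uid,) in conn.execute(sql, params)]
        if not page:
            return
        yield page
        last_min_uid = page[-1]  # smallest uid on this page


pages = list(iter_uid_pages(conn))
print([len(page) for page in pages])
```

The loop stops when a query returns no rows, so the caller never needs to know the total number of distinct users in advance.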
Analysis
Design 1 can miss user IDs if rows are deleted while a client is paging. Design 2 returns correct results but uses the index poorly and scans millions of rows. Design 3 returns correct results while touching very few rows, which makes it the preferred solution for paginating a table of this size.
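The correctness claim for the last-seen-uid approach can be checked with a deletion between two page requests. The sketch below rests on the same assumptions as any small reproduction: sqlite3 stands in for MySQL, the table is trimmed down, and keyset_page is a hypothetical helper name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; only uid and book_id are kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE favorites ("
    " uid INTEGER NOT NULL,"
    " book_id INTEGER NOT NULL,"
    " UNIQUE (uid, book_id))"
)
conn.executemany(
    "INSERT INTO favorites (uid, book_id) VALUES (?, ?)",
    [(uid, 1) for uid in range(1, 21)],
)


def keyset_page(last_min_uid=None, size=10):
    """Design 3: fetch one page below the last uid seen (hypothetical helper)."""
    if last_min_uid is None:
        rows = conn.execute(
            "SELECT uid FROM favorites GROUP BY uid ORDER BY uid DESC LIMIT ?",
            (size,),
        )
    else:
        rows = conn.execute(
            "SELECT uid FROM favorites GROUP BY uid"
            " HAVING uid < ? ORDER BY uid DESC LIMIT ?",
            (last_min_uid, size),
        )
    return [uid for (uid,) in rows]


page1 = keyset_page()  # uids 20 down to 11

# User 15, already returned on page 1, is deleted before the next request.
conn.execute("DELETE FROM favorites WHERE uid = 15")

page2 = keyset_page(last_min_uid=page1[-1])  # uids 10 down to 1
print(page2)
```

Because the second request is anchored to a uid value rather than a row position, deleting a row from an earlier page does not shift what the next page returns.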
21CTO
