Root Cause Analysis and Solutions for Pagination Slow Queries in a Backend System
This article details a real‑world incident of severe pagination slow queries, walks through the timeline of detection, diagnosis, and mitigation steps, and presents multiple MySQL optimization techniques—including ID‑based queries, sub‑queries, scroll queries, and join‑based solutions—to resolve deep pagination performance issues.
Incident Background
This incident is a typical example of pagination slow queries encountered by our team; the article reconstructs the incident step by step and explains each adjustment made.
Incident Scene
16:00 Received feedback that the fusion system's pagination query availability dropped.
16:05 Checked the UMP monitoring of the query interface and found an abnormal spike in TP99 latency.
Machine monitoring showed that almost all machines had unusually high TP999 values, while CPU usage was not high.
16:10 Database monitoring revealed a CPU spike, identified as a database issue, and a flood of slow‑SQL emails arrived.
We concluded the problem would not be solved in minutes and took two steps: first, enable rate limiting to stop more slow‑SQL requests; second, analyze and refactor the slow SQL.
16:15 Logs showed most SQL pointed to merchant xxxx, which had 100,000 rows (10% of total). MQ retries surged, and the pagination interface timeout was set to 2 seconds, indicating slow queries caused high‑frequency retries that overwhelmed the database.
16:25 Code inspection identified a deep‑pagination issue and proposed an optimization: first query IDs, then use a sub‑query to fetch the full rows, while pausing upstream MQ consumption.
17:40 After deploying the optimized code, MQ consumption resumed, but accumulated messages still pressured the database, causing TP99 to spike again and CPU to hit 100%.
18:00 Decided to stop optimizing the old interface and develop a new one based on scrolling IDs, involving upstream collaboration.
22:20 The new interface went live, MQ consumption was reopened, and the system stabilized, marking the problem as resolved.
Problem Causes and Solutions
Why Deep Pagination Occurs
Problem SQL example:
select * from table where org_code = xxxx limit 1000,100

With an offset, MySQL's LIMIT scans offset + N rows: here it reads 1,100 rows, discards the first 1,000, and returns the remaining 100. The larger the offset, the worse the performance.
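A minimal sketch of that behavior, using SQLite in place of MySQL (the table name, org_code value, and row counts are illustrative):

```python
import sqlite3

# Toy stand-in for the production table; names and sizes are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, org_code TEXT)")
conn.executemany("INSERT INTO t (org_code) VALUES (?)", [("xxxx",)] * 2000)

# LIMIT with an offset: the engine walks past the first 1000 matching rows,
# discards them, and only then returns the 100 we asked for.
rows = conn.execute(
    "SELECT * FROM t WHERE org_code = ? LIMIT 100 OFFSET 1000", ("xxxx",)
).fetchall()
print(len(rows), rows[0][0])  # 100 1001
```

The work done for the discarded 1,000 rows is pure waste, and it grows linearly as the client pages deeper.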
Several Solutions for Deep Pagination
Query ID + ID‑Based Query
First retrieve IDs, then query by those IDs:
select id from table where org_code = xxxx limit 1000,5

Then use the IDs in an IN clause:
select * from table where id in (1,2,3,4,5);

Range Query Optimization
After obtaining IDs, use a range query instead of IN for higher efficiency:
select * from table where org_code = xxxx and id >= 1 and id <= 5;

Or use a sub‑query to reduce I/O:
select a.id, a.dj_sku_id, a.jd_sku_id from table a join (select id from jd_spu_sku where org_code = xxxx limit 1000,5) b on a.id = b.id;

Scroll Query
Each request returns the maximum ID (cursor); the next request supplies this cursor, and the server fetches the next N rows where ID > cursor.
select * from table where org_code = xxxx and id > 0 limit 10;

This method is simple to implement and performs well, though it requires client changes and assumes ordered auto‑increment IDs.
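The scroll pattern can be sketched like this (again with SQLite standing in for MySQL; identifiers and sizes are illustrative). Each page's last ID becomes the cursor for the next request:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, org_code TEXT)")
conn.executemany("INSERT INTO t (org_code) VALUES (?)", [("xxxx",)] * 35)

def scroll_page(cursor_id, page_size=10):
    """Return the next page after cursor_id, plus the new cursor."""
    rows = conn.execute(
        "SELECT id FROM t WHERE org_code = ? AND id > ? ORDER BY id LIMIT ?",
        ("xxxx", cursor_id, page_size),
    ).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor

# Walk all pages: there is no offset, so every page is a short index range
# scan regardless of how deep we are in the result set.
cursor, pages = 0, 0
while cursor is not None:
    rows, cursor = scroll_page(cursor)
    if rows:
        pages += 1
print(pages)  # 4 pages: 10 + 10 + 10 + 5
```

Because the cursor condition `id > ?` seeks directly into the index, page 1,000 costs the same as page 1, which is exactly what offset-based pagination cannot guarantee.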
Final Solution
Join‑Based Query
Because the optimizer chose the primary‑key index for sorting (to avoid the back‑to‑table row lookup), the query degenerated into a near‑full table scan. The final approach first queries only the IDs, so the secondary index covers the query without touching leaf‑node row data, and then joins back to fetch the full rows:
select a.id AS id, a.dj_org_code AS djOrgCode, a.dj_sku_id AS djSkuId, a.jd_sku_id AS jdSkuId, a.yn AS yn
from table a join (
SELECT id FROM table where org_code = xxxx and id > 0 order by id asc limit 500
) t on a.id = t.id;

EXPLAIN shows the intended index is used, and the deployment resolved the issue.
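The deferred-join idea can be sketched end to end as follows (SQLite stand-in; column names follow the article's example, and the composite secondary index is an assumption about the real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, org_code TEXT,"
    " dj_sku_id TEXT, jd_sku_id TEXT)"
)
# Covering index for the inner query: it can answer (org_code, id) alone.
conn.execute("CREATE INDEX idx_org ON t (org_code, id)")
conn.executemany(
    "INSERT INTO t (org_code, dj_sku_id, jd_sku_id) VALUES (?, ?, ?)",
    [("xxxx", f"dj{i}", f"jd{i}") for i in range(1000)],
)

# Inner query touches only the covering index (no row lookups); the outer
# join then fetches full rows for just the 500 matched ids.
rows = conn.execute(
    """
    SELECT a.id, a.dj_sku_id, a.jd_sku_id
    FROM t a
    JOIN (SELECT id FROM t WHERE org_code = ? AND id > ?
          ORDER BY id LIMIT 500) b
      ON a.id = b.id
    """,
    ("xxxx", 0),
).fetchall()
print(len(rows))  # 500
```

The expensive part (filtering and ordering) runs entirely on the narrow index, and the wide rows are fetched only for the page actually returned.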
Problem Summary
Backend systems need protection mechanisms such as rate limiting to prevent abnormal traffic from crashing the service.
Potential risk points should never be ignored; previously acceptable pagination performance can become a bottleneck under higher load.
SQL optimizations must be thoroughly tested across edge cases before deployment.
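The rate-limiting protection mentioned above can be illustrated with a minimal token bucket (a generic sketch, not the system's actual limiter; `rate` and `capacity` are illustrative):

```python
import time

class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=5)
results = [bucket.allow() for _ in range(10)]  # 10 back-to-back requests
print(results.count(True))  # only the burst capacity passes immediately
```

Rejected requests can be failed fast or queued, so a flood of retries degrades gracefully instead of saturating the database.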
Source: juejin.cn/post/7126356005192990750