Optimizing a Complex MySQL Slow Query for Article Comments
This article analyzes a 60‑second MySQL query that retrieves article comments with multiple filters, explains why the optimizer chooses a small table as the driver, and presents a step‑by‑step optimization—including avoiding semi‑joins, improving index usage, refining range conditions, and moving GROUP BY into a subquery—that reduces execution time to 1.3 seconds, achieving a 60‑fold speedup.
Earlier the author encountered a complex slow query that needed to fetch articles published between 2019‑12‑01 and 2019‑12‑31 in the economic and technology categories, together with the count of hot comments (upvote > 100 and comment length between 10 and 30). The involved tables are tb_article (2 million rows), tb_cmt (10 million rows), tb_user (1 million rows) and tb_category (300 rows).
The original query was:
SELECT
tb_article.`title`,
tb_user.`name`,
count(1) AS `total`
FROM tb_article
LEFT JOIN tb_cmt ON tb_article.`id` = tb_cmt.`article_id`
INNER JOIN tb_user ON tb_article.`userid` = tb_user.`id`
WHERE tb_article.`type` IN (
SELECT code FROM tb_category WHERE code LIKE '12%' OR code LIKE '13%'
)
AND tb_cmt.`upvote` > 100
AND tb_cmt.`len` BETWEEN 10 AND 30
AND tb_article.`create_time` BETWEEN '2019-12-01 00:00:00' AND '2019-12-31 23:59:59'
GROUP BY tb_cmt.`article_id`;EXPLAIN showed that MySQL rewrote the IN sub‑query into a semi‑join and chose tb_category as the driver table because it is the smallest (300 rows). This caused the optimizer to scan large intermediate result sets (about 6.11 million rows) and prevented the use of indexes on the non‑driver tables.
Step 1 – Avoid semi‑join : The author first split the IN sub‑query into two separate statements, retrieving the category codes first and then plugging them into the main query. The revised query reduced the execution time from ~60 s to 18 s.
SELECT code FROM tb_category WHERE code LIKE '12%' OR code LIKE '13%';
SELECT
tb_article.`title`,
tb_user.`name`,
count(1) AS `total`
FROM tb_article
LEFT JOIN tb_cmt ON tb_article.`id` = tb_cmt.`article_id`
INNER JOIN tb_user ON tb_article.`userid` = tb_user.`id`
WHERE tb_article.`type` IN ('1213331','1374609','1389750','1204526','1382565','1239054','1321189','1292666')
AND tb_cmt.`upvote` > 100
AND tb_cmt.`len` BETWEEN 10 AND 30
AND tb_article.`create_time` BETWEEN '2019-12-01 00:00:00' AND '2019-12-31 23:59:59'
GROUP BY tb_cmt.`article_id`;Step 2 – Maximize index usage : The query still relied on the upvote_len_idx index of tb_cmt, but because both upvote and len are range conditions, MySQL could not use the composite index efficiently. The author attempted to filter comments first in a derived table, hoping to use the index, but the performance degraded to 40 s.
SELECT
tb_article.`title`,
tb_user.`name`,
count(1) AS `total`
FROM tb_article
LEFT JOIN (
SELECT article_id FROM tb_cmt
WHERE upvote > 100
AND len IN (10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
) t ON t.article_id = tb_article.id
INNER JOIN tb_user ON tb_user.id = tb_article.userid
WHERE tb_article.create_time BETWEEN '2019-12-01 00:00:00' AND '2019-12-31 23:59:59'
AND tb_article.type IN ('1213331','1374609','1389750','1204526','1382565','1239054','1321189','1292666')
GROUP BY tb_article.id;Step 3 – Range query optimization : To make the range condition usable, the author replaced the len BETWEEN 10 AND 30 with an IN list and created a new composite index len_upvote_idx (len first, upvote second). However, the query became slower (40 s) because the derived table t still caused a costly join.
Step 4 – GROUP BY optimization : The key insight was to move the GROUP BY into the derived table, allowing MySQL to aggregate before the join and to suppress the default sorting with ORDER BY NULL. The final query ran in 1.3 s, a ~60× improvement.
SELECT
tb_article.`title`,
tb_user.`name`,
t.total
FROM tb_article
LEFT JOIN (
SELECT article_id, COUNT(1) AS total
FROM tb_cmt
WHERE upvote > 100
AND len IN (10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
GROUP BY article_id
ORDER BY NULL
) t ON t.article_id = tb_article.id
INNER JOIN tb_user ON tb_user.id = tb_article.userid
WHERE tb_article.create_time BETWEEN '2019-12-01 00:00:00' AND '2019-12-31 23:59:59'
AND tb_article.type IN ('1213331','1374609','1389750','1204526','1382565','1239054','1321189','1292666');The EXPLAIN of the final statement shows an auto‑generated auto_key0 index on the temporary table, which speeds up the final join with tb_article.
Copyright: This article is authored by the “import_bigdata” WeChat public account and is licensed exclusively to the original author. Unauthorized reproduction will be pursued for infringement.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
