Why MySQL Scans the Whole Table Before a Subquery? A Deep Dive into Slow Queries
This article investigates a MySQL slow query caused by a subquery that triggers a full‑table scan, explains MySQL's query rewrite behavior, and shows how Django's lazy ORM evaluation can worsen performance, offering practical fixes using explicit query execution or join rewrites.
Identify the Cause
While checking MySQL's slow‑query log, I found a query taking over a second and scanning more than 100,000 rows, essentially a full‑table scan.
<code>SELECT * FROM tgdemand_demand t1
WHERE (
t1.id IN (
SELECT t2.demand_id
FROM tgdemand_job t2
WHERE (t2.state = 'working' AND t2.wangwang = 'abc')
)
AND NOT (t1.state = 'needConfirm')
)
ORDER BY t1.create_date DESC</code>The query first runs a subquery to fetch demand IDs where the job state is
workingand the associate
wangwangis
abc, then filters the main table
tgdemand_demandfor those IDs and excludes rows with state
needConfirm, finally ordering by creation date.
Even though the subquery should produce a small set of primary‑key values, the execution plan shows MySQL scanning the entire
tgdemand_demandtable (type
ALL, rows ≈157 089) while the subquery table
tgdemand_jobuses an index. This contradicts the expected order of operations.
MySQL rewrites the query into an
EXISTSform:
<code>SELECT * FROM tgdemand_demand t1 WHERE EXISTS (
SELECT * FROM tgdemand_job t2 WHERE t1.id = t2.demand_id AND (t2.state = 'working' AND t2.wangwang = 'abc')
) AND NOT (t1.state = 'needConfirm')
ORDER BY t1.create_date DESC;</code>Consequently, MySQL scans every row of
tgdemand_demandand executes the subquery for each row (≈157 089 times), which explains the poor performance.
Problem Fix
The simplest fix is to avoid the subquery altogether: first retrieve the demand IDs, then run a separate query using those IDs.
<code>ids = SELECT t2.demand_id
FROM tgdemand_job t2
WHERE (t2.state = 'working' AND t2.wangwang = 'abc');
SELECT * FROM tgdemand_demand t1
WHERE (
t1.id IN ids
AND NOT (t1.state = 'needConfirm')
)
ORDER BY t1.create_date DESC;</code>In Python/Django the same logic can be expressed as:
<code>demand_ids = Job.objects.filter(wangwang=user['wangwang'], state='working').values_list("demand_id", flat=True)
demands = Demand.objects.filter(id__in=demand_ids).exclude(state__in=['needConfirm']).order_by('-create_date')</code>However, the generated SQL was still the slow one because Django's QuerySet is lazily evaluated. The first QuerySet is passed as a parameter to
id__inwithout being executed, so the subquery is performed for every row of the outer query.
To force early execution, we can materialize the QuerySet:
<code>demand_ids = list(Job.objects.filter(wangwang=user['wangwang'], state='working').values_list("demand_id", flat=True))
demands = Demand.objects.filter(id__in=demand_ids).exclude(state__in=['needConfirm']).order_by('-create_date')</code>After this change, the page response time returned to normal.
Alternatively, we can rewrite the SQL to use a join instead of a subquery:
<code>SELECT * FROM tgdemand_demand t1, (
SELECT t.demand_id FROM tgdemand_job t WHERE t.state = 'working' AND t.wangwang = 'abc'
) t2
WHERE t1.id = t2.demand_id AND NOT (t1.state = 'needConfirm')
ORDER BY t1.create_date DESC;</code>Takeaways
Frameworks boost productivity only when we understand their underlying behavior; otherwise hidden performance pitfalls can surface at scale. Knowing how MySQL rewrites subqueries and how Django lazily evaluates QuerySets is essential for building robust applications.
References
http://www.cnblogs.com/zhengyun_ustc/p/slowquery3.html http://dev.mysql.com/doc/refman/5.5/en/explain-output.html https://docs.djangoproject.com/en/1.9/ref/models/querysets/
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.