Databases 8 min read

Why MySQL Scans the Whole Table Before a Subquery? A Deep Dive into Slow Queries

This article investigates a MySQL slow query caused by a subquery that triggers a full‑table scan, explains MySQL's query rewrite behavior, and shows how Django's lazy ORM evaluation can worsen performance, offering practical fixes using explicit query execution or join rewrites.

Efficient Ops
Efficient Ops
Efficient Ops
Why MySQL Scans the Whole Table Before a Subquery? A Deep Dive into Slow Queries

Identify the Cause

While checking MySQL's slow‑query log, I found a query taking over a second and scanning more than 100,000 rows, essentially a full‑table scan.

<code>SELECT * FROM tgdemand_demand t1
WHERE (
  t1.id IN (
    SELECT t2.demand_id
    FROM tgdemand_job t2
    WHERE (t2.state = 'working' AND t2.wangwang = 'abc')
  )
  AND NOT (t1.state = 'needConfirm')
)
ORDER BY t1.create_date DESC</code>

The query first runs a subquery to fetch demand IDs where the job state is

working

and the associate

wangwang

is

abc

, then filters the main table

tgdemand_demand

for those IDs and excludes rows with state

needConfirm

, finally ordering by creation date.

Even though the subquery should produce a small set of primary‑key values, the execution plan shows MySQL scanning the entire

tgdemand_demand

table (type

ALL

, rows ≈157 089) while the subquery table

tgdemand_job

uses an index. This contradicts the expected order of operations.

MySQL rewrites the query into an

EXISTS

form:

<code>SELECT * FROM tgdemand_demand t1 WHERE EXISTS (
  SELECT * FROM tgdemand_job t2 WHERE t1.id = t2.demand_id AND (t2.state = 'working' AND t2.wangwang = 'abc')
) AND NOT (t1.state = 'needConfirm')
ORDER BY t1.create_date DESC;</code>

Consequently, MySQL scans every row of

tgdemand_demand

and executes the subquery for each row (≈157 089 times), which explains the poor performance.

Problem Fix

The simplest fix is to avoid the subquery altogether: first retrieve the demand IDs, then run a separate query using those IDs.

<code>ids = SELECT t2.demand_id
FROM tgdemand_job t2
WHERE (t2.state = 'working' AND t2.wangwang = 'abc');

SELECT * FROM tgdemand_demand t1
WHERE (
  t1.id IN ids
  AND NOT (t1.state = 'needConfirm')
)
ORDER BY t1.create_date DESC;</code>

In Python/Django the same logic can be expressed as:

<code>demand_ids = Job.objects.filter(wangwang=user['wangwang'], state='working').values_list("demand_id", flat=True)

demands = Demand.objects.filter(id__in=demand_ids).exclude(state__in=['needConfirm']).order_by('-create_date')</code>

However, the generated SQL was still the slow one because Django's QuerySet is lazily evaluated. The first QuerySet is passed as a parameter to

id__in

without being executed, so the subquery is performed for every row of the outer query.

To force early execution, we can materialize the QuerySet:

<code>demand_ids = list(Job.objects.filter(wangwang=user['wangwang'], state='working').values_list("demand_id", flat=True))

demands = Demand.objects.filter(id__in=demand_ids).exclude(state__in=['needConfirm']).order_by('-create_date')</code>

After this change, the page response time returned to normal.

Alternatively, we can rewrite the SQL to use a join instead of a subquery:

<code>SELECT * FROM tgdemand_demand t1, (
  SELECT t.demand_id FROM tgdemand_job t WHERE t.state = 'working' AND t.wangwang = 'abc'
) t2
WHERE t1.id = t2.demand_id AND NOT (t1.state = 'needConfirm')
ORDER BY t1.create_date DESC;</code>

Takeaways

Frameworks boost productivity only when we understand their underlying behavior; otherwise hidden performance pitfalls can surface at scale. Knowing how MySQL rewrites subqueries and how Django lazily evaluates QuerySets is essential for building robust applications.

References

http://www.cnblogs.com/zhengyun_ustc/p/slowquery3.html http://dev.mysql.com/doc/refman/5.5/en/explain-output.html https://docs.djangoproject.com/en/1.9/ref/models/querysets/

Explain output
Explain output
Query OptimizationMySQLDatabase performanceslow querysubqueryDjango ORM
Efficient Ops
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.