When Do IS NULL and IS NOT NULL Use Indexes in MySQL? Surprising Findings
This article explores how MySQL decides whether queries with IS NULL or IS NOT NULL predicates can use an index, presenting three concrete examples with varying data sizes and NULL distributions, and explains the underlying storage of NULL values and optimizer cost decisions.
After publishing a series on MySQL indexes, this article analyzes when queries using IS NULL or IS NOT NULL can benefit from an index.
Example 1: Small dataset, index used
Using MySQL 8.0.18, a table test is created with an index on col1 and two rows where col1 is NULL:
CREATE TABLE test (
id INT PRIMARY KEY,
col1 INT,
col2 INT,
INDEX idx_col1 (col1)
);
INSERT INTO test VALUES (1, NULL, 1);
INSERT INTO test VALUES (2, NULL, 2);Running EXPLAIN SELECT * FROM test WHERE col1 IS NULL \G shows Using index condition , confirming the index is used. The same holds for IS NOT NULL:
EXPLAIN SELECT * FROM test WHERE col1 IS NOT NULL \GExample 2: Large dataset, few NULL values
The same schema is populated with 30,000 rows, about 5% of col1 being NULL: SELECT COUNT(*) FROM test; For IS NULL, EXPLAIN reports Using index condition (rows ≈ 1,551). For IS NOT NULL, the optimizer chooses a full table scan ( type: ALL, rows ≈ 30,570) because the majority of rows satisfy the predicate, making index lookup more expensive.
Example 3: Large dataset, many NULL values
With 30,000 rows where roughly 95% of col1 are NULL, both predicates use the index:
EXPLAIN SELECT * FROM test WHERE col1 IS NULL \GResult shows Using index condition (rows ≈ 14,957). Likewise, IS NOT NULL also uses the index (rows ≈ 1,558).
How indexes store NULL values
B+ tree sorting : NULL is treated as the smallest value, so it appears at the leftmost part of the tree.
Linked‑list storage : Leaf nodes contain pointers to rows; records with NULL are clustered on the left and can be read sequentially.
Finding NULL : The engine scans from the leftmost leaf to locate all NULL entries.
Reasons an index may be ignored
The optimizer decides based on execution cost, not merely on the presence of NULL predicates.
If the result set is large (e.g., most rows are NULL), the cost of fetching many rows via the index can exceed that of a full scan, so the optimizer prefers a table scan.
If the index can dramatically reduce data access—such as a small result set or a covering index—the optimizer will choose the index.
Conclusion
Both IS NULL and IS NOT NULL can use an index, but whether they do depends on data distribution and the optimizer’s cost estimation, not on the predicates themselves.
Senior Brother's Insights
A public account focused on workplace, career growth, team management, and self-improvement. The author is the writer of books including 'SpringBoot Technology Insider' and 'Drools 8 Rule Engine: Core Technology and Practice'.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
