Why MySQL NULL Values Can Cause P0 Outages and How to Avoid Them
This article explains how improper use of NULL in MySQL columns can lead to serious production incidents, outlines official recommendations against NULL, demonstrates its impact on queries, indexes, and storage, and provides practical guidelines for when to use or avoid NULL values.
MySQL recommends avoiding NULL
In most cases it is best to declare columns NOT NULL unless you truly need to store NULL values.
The official documentation and the book High Performance MySQL give three main reasons:
Query optimization becomes harder because NULL values complicate indexes, statistics, and comparisons.
Extra storage overhead: each index record needs an additional 1‑byte flag for nullable columns.
Index structure changes: in MyISAM, nullable columns can turn fixed‑length indexes into variable‑length ones.
Although converting NULL to NOT NULL usually yields only modest performance gains, you should avoid nullable columns when you plan to index them or when NULL would affect business logic.
NULL's impact on specific queries
Below are examples that illustrate how NULL influences COUNT, IN, and NOT IN queries.
CREATE TABLE user (
id INT PRIMARY KEY,
name VARCHAR(50),
age INT
);
INSERT INTO user VALUES
(1, '张三', 25),
(2, '李四', NULL),
(3, '王五', 30);COUNT function
-- Result: 3
SELECT COUNT(*) FROM user;
-- Result: 2 (NULL row is ignored)
SELECT COUNT(age) FROM user;When counting a specific column, NULL rows are omitted, which can inflate averages or other aggregates.
NOT IN query
-- t1: (1,'a'), (2,'b')
-- t2: (1,'a'), (3,NULL)
SELECT * FROM t1 WHERE c2 NOT IN (SELECT c2 FROM t2);
-- Expected: row with 'b'
-- Actual: empty set because NULL makes the predicate UNKNOWNSQL treats NULL as an unknown value; any comparison with NULL yields UNKNOWN, causing the whole NOT IN condition to filter out all rows.
Correct approaches:
-- Method 1: filter NULLs in the subquery
SELECT * FROM t1 WHERE c2 NOT IN (
SELECT c2 FROM t2 WHERE c2 IS NOT NULL
);
-- Method 2: use NOT EXISTS
SELECT * FROM t1 WHERE NOT EXISTS (
SELECT 1 FROM t2 WHERE t2.c2 = t1.c2
);IN query
SELECT * FROM t1 WHERE c2 IN ('a', NULL);NULL values are automatically ignored in IN lists, so the query behaves as if the NULL were absent. To retrieve rows where the column is NULL you must add an explicit OR c2 IS NULL condition.
ORDER BY and GROUP BY
According to MySQL documentation, NULL values sort before other values in ascending order and after them in descending order. In GROUP BY, all NULLs are considered equal, producing a single group.
For sorting with ORDER BY, NULL values sort before other values for ascending sorts, after other values for descending sorts.
How InnoDB stores NULL
InnoDB uses a NULL‑flag bit for each nullable column. In the compact row format the layout is:
[variable‑length field list][NULL flag][record header][actual column data]Each nullable column gets 1 bit in the flag byte.
Flag = 1 indicates NULL, 0 indicates a non‑NULL value.
The NULL value itself occupies no data space.
Storage overhead depends on the number of nullable columns:
1‑8 nullable columns → 1 byte of flag bits.
9‑16 nullable columns → 2 bytes, and so on (1 extra byte per 8 columns).
NULL in indexes
Each indexed nullable column adds 1 extra byte per index entry.
NULL values are stored in the index (different from the data row).
Index statistics must handle NULL distribution specially.
CREATE INDEX idx_age ON user(age);Performance impact
Negative‑query nightmare
-- Scenario 1: NOT NULL column, index used, good performance
SELECT * FROM orders WHERE status != 'completed';
-- Scenario 2: Nullable column with NULLs, index may be skipped, slower
SELECT * FROM orders WHERE status != 'completed';
-- Issues:
-- 1. Index must skip NULL flag bits.
-- 2. Optimizer may choose a full table scan.
-- 3. NULL rows are excluded from results, potentially causing business errors.Benchmarks on MySQL 8.0 show a NOT NULL column query on 10 million rows takes ~0.3 s, while the same query on a nullable column with 10 % NULLs takes ~1.2 s.
Update cost (NULL → NOT NULL)
ALTER TABLE big_table MODIFY COLUMN col INT NOT NULL DEFAULT 0;Altering a large table from NULL to NOT NULL can cause:
Table lock and rebuild, lasting minutes to hours.
Index page splits as existing NULL entries are rewritten.
Replication lag on slaves.
Use online DDL tools (e.g., pt-online-schema-change) or perform the change during low‑traffic windows.
Index statistics distortion
-- Table with 1 M rows, col1 90% NULL, col2 no NULLs
SELECT * FROM table WHERE col1 = 'value'; -- May trigger full scanWhen a column is mostly NULL, the optimizer can underestimate its selectivity, causing it to ignore a usable index.
Practical recommendations
Use NULL only when it truly represents an unknown or inapplicable value. For most business scenarios:
Inventory counts: store 0 instead of NULL.
Order status: use explicit states like 'draft' rather than NULL.
Columns frequently used in query predicates: declare NOT NULL.
Alibaba’s development manual enforces a strict “no NULL” rule for critical fields; following such guidelines can prevent accidental data loss.
Conclusion
Misusing NULL can turn a small mistake into a costly production incident. Understanding the storage mechanics, query behavior, and performance implications of NULL allows you to make informed design decisions, choose appropriate defaults, and avoid hidden pitfalls.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
