Why MySQL Unique Indexes Fail with NULL and How to Fix Them
This article explains why a MySQL UNIQUE index does not prevent duplicate rows when indexed columns contain NULL, explores the complications of logical‑delete tables, and presents practical solutions such as incrementing delete status, adding timestamps, delete‑id keys, hash fields, and proper batch inserts.
Problem Overview
When a UNIQUE index is created on columns that may contain NULL values, MySQL treats each NULL as distinct, so duplicate rows can still be inserted. The article demonstrates this issue using a product_group_unique table that stores product group information.
CREATE TABLE `product_group_unique` (
`id` bigint NOT NULL,
`category_id` bigint NOT NULL,
`unit_id` bigint NOT NULL,
`model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
`in_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
A unique index on (category_id, unit_id, model_hash) correctly blocks duplicate rows when model_hash has a value, but allows duplicates when model_hash is NULL:
ALTER TABLE product_group_unique ADD UNIQUE INDEX ux_category_unit_model (category_id, unit_id, model_hash);
Inserting two rows with the same category_id and unit_id and with model_hash = NULL succeeds, showing that a NULL in an indexed column exempts the row from the uniqueness check.
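The behaviour can be reproduced with a few statements. This is a minimal sketch against the table and index defined above; the ids and column values are made up for illustration.

```sql
-- Two rows with the same non-NULL model_hash: the second insert is rejected.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (1, 10, 20, 'abc', NOW());
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (2, 10, 20, 'abc', NOW());   -- fails with a duplicate-key error (1062)

-- Two rows with model_hash = NULL: both inserts succeed.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (3, 10, 20, NULL, NOW());
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (4, 10, 20, NULL, NOW());

-- List the key combinations that are duplicated through NULL.
SELECT category_id, unit_id, COUNT(*) AS copies
FROM product_group_unique
WHERE model_hash IS NULL
GROUP BY category_id, unit_id
HAVING COUNT(*) > 1;
```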
Why NULL Breaks Uniqueness
MySQL follows the SQL standard: any comparison with NULL yields UNKNOWN, so the uniqueness constraint treats each NULL as a separate value. Consequently, rows that differ only by having NULL in the indexed column are considered distinct.
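The comparison rule can be inspected directly, and one common way to close the loophole is to keep NULL out of the indexed column altogether. The sketch below assumes that an empty string is an acceptable stand-in for "no model hash" and that rows which would become duplicates have already been cleaned up; neither assumption comes from the original schema.

```sql
-- NULL = NULL evaluates to NULL (unknown); the NULL-safe operator <=> returns 1.
SELECT NULL = NULL AS plain_eq, NULL <=> NULL AS null_safe_eq;

-- Replace existing NULLs with a sentinel value.
-- This UPDATE fails with a duplicate-key error if several rows would collapse
-- onto the same (category_id, unit_id, '') key, so deduplicate first.
UPDATE product_group_unique SET model_hash = '' WHERE model_hash IS NULL;

-- Forbid NULL going forward so the unique index applies to every row.
ALTER TABLE product_group_unique
  MODIFY model_hash varchar(255) COLLATE utf8mb4_bin NOT NULL DEFAULT '';
```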
Logical‑Delete Tables and Unique Indexes
Logical deletion is usually implemented by an UPDATE that sets a delete_status flag instead of physically removing the row. Because the deleted rows stay in the table, a unique index on the business key alone (e.g., name, model) still sees them and rejects a new row with the same key. Adding a binary delete_status to the index only postpones the problem: the first logical delete succeeds, but deleting a second row with the same business key would create another row with delete_status = 1, so that UPDATE fails.
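The conflict with a binary flag can be sketched as follows. The product table definition here is hypothetical (the article does not give one) and keeps only the columns needed to show the problem.

```sql
-- Hypothetical table: the binary flag is part of the unique key.
CREATE TABLE product (
  id            BIGINT      NOT NULL PRIMARY KEY,
  name          VARCHAR(64) NOT NULL,
  delete_status TINYINT     NOT NULL DEFAULT 0,
  UNIQUE KEY ux_name_status (name, delete_status)
) ENGINE=InnoDB;

INSERT INTO product (id, name) VALUES (1, 'A');
UPDATE product SET delete_status = 1 WHERE id = 1;  -- first logical delete: OK
INSERT INTO product (id, name) VALUES (2, 'A');     -- re-create the key: OK
UPDATE product SET delete_status = 1 WHERE id = 2;  -- fails: ('A', 1) already exists
```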
Solution 1 – Incremental Delete Status
Instead of a binary flag, use a numeric delete_status that increments each time the row is deleted. The sequence 0 → 1 → 2 → … guarantees that each logical deletion produces a distinct value, allowing the unique index to differentiate the rows.
-- Insert a new record
INSERT INTO product (id, name, delete_status) VALUES (1, 'A', 0);
-- Logical delete
UPDATE product SET delete_status = 1 WHERE id = 1;
-- Insert the same business key again
INSERT INTO product (id, name, delete_status) VALUES (2, 'A', 0);
-- Delete again – status becomes 2
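UPDATE product SET delete_status = 2 WHERE id = 2;
Advantages: no new columns, simple implementation. Drawbacks: queries for deleted rows must filter on delete_status >= 1 instead of delete_status = 1, and each delete has to determine the next unused status value for that business key.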
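The next status value does not have to be hard-coded. One possible way to derive it in a single statement is sketched below, assuming the unique index is (name, delete_status) and that rows with the same name are not deleted concurrently; under concurrency the statement would need to run in a transaction that locks the matching rows first.

```sql
-- Compute the next free status for name = 'A' and apply it to row 2.
-- The aggregate forces MySQL to materialize the derived table, which is
-- what allows reading from and updating the same table in one statement.
UPDATE product AS p
CROSS JOIN (
    SELECT COALESCE(MAX(delete_status), 0) + 1 AS next_status
    FROM product
    WHERE name = 'A'
) AS m
SET p.delete_status = m.next_status
WHERE p.id = 2;

-- Active rows are still selected with delete_status = 0.
SELECT id, name FROM product WHERE delete_status = 0;
```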
Solution 2 – Timestamp Field
Add a deleted_at timestamp column. Each logical delete writes the current timestamp, guaranteeing a unique value for every deletion (down to seconds, or milliseconds for high‑concurrency scenarios). The unique index then includes this timestamp.
ALTER TABLE product ADD COLUMN deleted_at BIGINT NOT NULL DEFAULT 0;
-- Logical delete
UPDATE product SET deleted_at = UNIX_TIMESTAMP() WHERE id = 1;
Advantages: minimal impact on existing code. Drawback: UNIX_TIMESTAMP() has one-second resolution, so in extremely concurrent environments two deletions of the same business key could land in the same second and collide; storing milliseconds makes this far less likely.
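A millisecond-resolution variant might look like the sketch below. The index name and the choice of name as the business key are assumptions made for illustration.

```sql
-- Assumed unique index: the business key plus the deletion timestamp.
ALTER TABLE product ADD UNIQUE INDEX ux_name_deleted_at (name, deleted_at);

-- Logical delete with millisecond resolution.
-- NOW(3) carries three fractional digits, so multiplying by 1000
-- yields a whole number of milliseconds since the epoch.
UPDATE product
SET deleted_at = CAST(UNIX_TIMESTAMP(NOW(3)) * 1000 AS UNSIGNED)
WHERE id = 1 AND deleted_at = 0;

-- Active rows keep deleted_at = 0, so only one active row per name can exist.
SELECT id, name FROM product WHERE deleted_at = 0;
```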
Solution 3 – Dedicated Delete ID (Primary Key)
Introduce a dedicated delete_id column. It is a plain BIGINT rather than an auto‑increment column, since MySQL allows only one AUTO_INCREMENT column per table. When a row is first inserted, delete_id keeps its default of 1. On each logical delete, copy the row’s primary key into delete_id. The unique index then covers (name, model, delete_status, delete_id), ensuring uniqueness without altering business logic.
ALTER TABLE product ADD COLUMN delete_id BIGINT NOT NULL DEFAULT 1;
-- Logical delete: set the flag and copy the primary key into delete_id
UPDATE product SET delete_status = 1, delete_id = id WHERE id = 1;
This approach is often the most robust because it leverages the existing primary‑key uniqueness.
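To see why the keys never collide, the sequence below walks through two delete cycles for the same business key. It is a sketch that assumes a product table with name, model, delete_status and delete_id columns and no other required columns; the index name is made up.

```sql
ALTER TABLE product
  ADD UNIQUE INDEX ux_name_model_deleted (name, model, delete_status, delete_id);

INSERT INTO product (id, name, model, delete_status) VALUES (1, 'A', 'x', 0);
-- key is now ('A', 'x', 0, 1)
UPDATE product SET delete_status = 1, delete_id = id WHERE id = 1;
-- key is now ('A', 'x', 1, 1)

INSERT INTO product (id, name, model, delete_status) VALUES (2, 'A', 'x', 0);
-- key is ('A', 'x', 0, 1): no conflict with the deleted row
UPDATE product SET delete_status = 1, delete_id = id WHERE id = 2;
-- key is now ('A', 'x', 1, 2): distinct because the primary keys differ
```

Active rows all share delete_status = 0 and delete_id = 1, so at most one active row per (name, model) can exist, while any number of deleted rows can coexist.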
Handling Historical Duplicate Data
If a table already contains duplicate rows, create a temporary “anti‑duplicate” table, migrate distinct rows using GROUP BY, then add the new unique index. Example:
INSERT INTO product_unique (id, name, category_id, unit_id, model)
SELECT MAX(id), name, category_id, unit_id, model
FROM product
GROUP BY name, category_id, unit_id, model;
After cleaning, add the desired unique index.
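The surrounding steps can be sketched as follows. The index name and the product_old backup name are assumptions, and the sketch ignores writes that arrive while the migration is running.

```sql
-- 1. Check how many duplicate groups exist before migrating.
SELECT name, category_id, unit_id, model, COUNT(*) AS copies
FROM product
GROUP BY name, category_id, unit_id, model
HAVING COUNT(*) > 1;

-- 2. Create the empty target table (run this before the INSERT ... SELECT).
CREATE TABLE product_unique LIKE product;

-- 3. Once the INSERT ... SELECT has copied one row per group, add the index.
ALTER TABLE product_unique
  ADD UNIQUE INDEX ux_product_key (name, category_id, unit_id, model);

-- 4. Swap the tables in one atomic statement; product_old stays as a backup.
RENAME TABLE product TO product_old, product_unique TO product;
```

GROUP BY places all NULL values of a column in the same group, so rows that were duplicated only through NULL are collapsed here even though the unique index itself would not have rejected them.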
Large Text Columns and Unique Index Length Limits
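InnoDB limits the index key prefix to 3072 bytes for tables using the DYNAMIC or COMPRESSED row format, and to 767 bytes for REDUNDANT or COMPACT; MyISAM tables are limited to 1000 bytes. Very large columns (e.g., TEXT, VARCHAR(5000)) exceed these limits. Two common work‑arounds: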
Hash column: Store a fixed‑size hash (e.g., MD5, SHA‑1) of the large column and create a unique index on the hash together with other key columns.
Omit the unique index: Rely on application‑level checks, single‑threaded inserts, or distributed locks to prevent duplicates.
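For the hash-column approach, a stored generated column (available since MySQL 5.7) keeps the hash in sync without application code. This is a sketch under assumptions: model stands for the oversized column, and the column and index names are hypothetical.

```sql
-- IFNULL keeps the hash non-NULL, so the NULL loophole does not reappear.
-- Note that this treats a NULL model and an empty model as the same value.
ALTER TABLE product
  ADD COLUMN model_md5 CHAR(32)
    GENERATED ALWAYS AS (MD5(IFNULL(model, ''))) STORED NOT NULL,
  ADD UNIQUE INDEX ux_cat_unit_model_md5 (category_id, unit_id, model_md5);
```

The ALTER fails if existing rows already collide on the new key, so historical duplicates have to be removed first. If MD5 collisions are a concern, SHA2(model, 256) with a CHAR(64) column is a drop-in alternative.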
Batch Inserts vs. Distributed Locks
For high‑throughput batch inserts, relying on a MySQL UNIQUE index is far simpler and more reliable than acquiring a Redis lock per row. On InnoDB, a single INSERT ... VALUES (...), (...), ... statement that hits a duplicate key fails as a whole and is rolled back, so the batch is applied atomically without any external coordination.
INSERT INTO product (id, name, category_id, unit_id, model) VALUES
(1,'A',10,20,'x'),
(2,'B',11,21,'y');
-- If a duplicate exists, the statement fails and no rows are inserted.
If a lock is still required (e.g., for complex business validation), generate a composite hash of the relevant fields and lock that hash instead of locking each row individually.
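When failing the whole batch is not the desired outcome, MySQL offers two variants that still rely on the unique index. The statements below are a sketch using the same made-up rows; the last query shows one possible way to derive a lock key, with the '|' separator and the use of MD5 being assumptions rather than anything the article prescribes.

```sql
-- Skip rows that collide with an existing key instead of failing the batch.
-- IGNORE also downgrades some other errors to warnings, so check SHOW WARNINGS.
INSERT IGNORE INTO product (id, name, category_id, unit_id, model) VALUES
(1,'A',10,20,'x'),
(2,'B',11,21,'y');

-- Or overwrite the existing row when the key is already present.
-- VALUES() works on all versions but is deprecated from MySQL 8.0.20 on.
INSERT INTO product (id, name, category_id, unit_id, model) VALUES
(1,'A',10,20,'x'),
(2,'B',11,21,'y')
ON DUPLICATE KEY UPDATE name = VALUES(name);

-- One lock key per business key: a hash over the relevant fields.
-- CONCAT_WS skips NULL arguments, so map NULL to a sentinel first.
SELECT MD5(CONCAT_WS('|', 'A', 10, 20, IFNULL('x', ''))) AS lock_key;
```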
Key Takeaways
NULL values in a UNIQUE index bypass the uniqueness check.
Logical‑delete tables need an additional varying column (status counter, timestamp, delete_id, or hash) to make a UNIQUE index viable.
When dealing with large text columns, use a short hash column or avoid the UNIQUE index altogether.
Batch inserts should rely on the database’s native UNIQUE constraint rather than external distributed locks.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
