Why MySQL Unique Indexes Fail with NULL and How to Fix Them
This article explains why a unique index in MySQL can still allow duplicate rows when indexed columns contain NULL, explores the challenges of adding unique indexes to logically deleted tables, and presents practical solutions such as incremental delete status, timestamps, extra IDs, hash fields, and proper batch insertion techniques.
Introduction
Recently I encountered a problem: even after adding a UNIQUE index to an InnoDB table in MySQL 8, duplicate data still appeared. This article shares the experience and discusses interesting aspects of unique indexes.
1. Reproducing the Issue
To prevent duplicate product groups, I created a product_group_unique table with the following structure:
CREATE TABLE `product_group_unique` (
`id` bigint NOT NULL,
`category_id` bigint NOT NULL,
`unit_id` bigint NOT NULL,
`model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
`in_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
I added a unique index on (category_id, unit_id, model_hash) to guarantee uniqueness:
ALTER TABLE product_group_unique ADD UNIQUE INDEX ux_category_unit_model (category_id, unit_id, model_hash);
Although the index prevented duplicates when model_hash had a value, inserting rows where model_hash was NULL succeeded, resulting in duplicate records.
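The behavior can be reproduced with a few inserts against the table above. This is a minimal sketch; the id values and column values are made up for illustration.

```sql
-- Both statements succeed: the rows differ only in id and both
-- have NULL in model_hash, so the unique index does not reject them.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (1, 10, 100, NULL, NOW());
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (2, 10, 100, NULL, NOW());

-- With a non-NULL model_hash the first insert succeeds and the
-- second is rejected with error 1062 (duplicate entry).
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (3, 10, 100, 'abc', NOW());
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (4, 10, 100, 'abc', NOW());
```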
2. Unique Index Columns Containing NULL
In a unique index, MySQL never treats two NULL values as equal, so any number of rows can share the same category_id and unit_id as long as model_hash is NULL. The index still rejects duplicates among rows where every indexed column has a value.
In short, the uniqueness guarantee holds only for rows with no NULL in the indexed columns; if a unique index column allows NULL, duplicates can slip through.
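One common way to close this gap, which is not spelled out above, is to forbid NULL in the indexed column and store a sentinel value instead. The sketch below assumes the empty string never occurs as a real model_hash; that sentinel is an illustrative choice, not part of the original schema.

```sql
-- 1. Replace existing NULLs with the sentinel. This fails with a
--    duplicate-key error if NULL duplicates already exist, so those
--    rows have to be cleaned up first.
UPDATE product_group_unique
SET model_hash = ''
WHERE model_hash IS NULL;

-- 2. Forbid NULL from now on, keeping the original type and collation.
ALTER TABLE product_group_unique
  MODIFY `model_hash` varchar(255) COLLATE utf8mb4_bin NOT NULL DEFAULT '';
```

After this change the existing index ux_category_unit_model compares the sentinel like any other value, so a second row with the same category_id, unit_id, and an empty model_hash is rejected.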
3. Unique Index on Logically Deleted Tables
Logical deletion (using an UPDATE to set a delete_status flag) keeps rows in the table, which makes adding a unique index problematic because the deleted rows still occupy the unique key space.
Typical solutions include:
3.1 Incremental Delete Status
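Instead of a binary flag, use an incrementing delete_status value (1, 2, 3, …). Each deletion sets the row's delete_status to the next unused value for that combination of business fields, so the business fields together with delete_status remain unique no matter how many times the same record is created and deleted.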
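A minimal sketch of this scheme, assuming a hypothetical product_group table with business columns category_id, unit_id, and name, and assuming delete_status = 0 marks the active row. The table, column names, and sample values are illustrative and not taken from the original schema.

```sql
-- Hypothetical table: product_group(id, category_id, unit_id, name, delete_status).
-- Assumption: delete_status = 0 is the active row; 1, 2, 3, ... are deleted copies.
ALTER TABLE product_group
  ADD UNIQUE INDEX ux_group_status (category_id, unit_id, name, delete_status);

-- Logical delete, step 1: find the next unused status for this business key.
SELECT COALESCE(MAX(delete_status), 0) + 1 INTO @next_status
FROM product_group
WHERE category_id = 10 AND unit_id = 100 AND name = 'A';

-- Step 2: move the active row to that status.
UPDATE product_group
SET delete_status = @next_status
WHERE category_id = 10 AND unit_id = 100 AND name = 'A'
  AND delete_status = 0;
```

The two statements are meant to run inside one transaction so that the computed status is still unused when the UPDATE executes.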
3.2 Timestamp Field
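Add a timestamp column and include it in the unique index. Each logical delete writes the current timestamp, so repeated deletions of the same record get different key values. The precision matters: with second-level timestamps, two deletions of the same business key within one second would still collide, so millisecond precision is the safer choice.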
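As a sketch, assuming a hypothetical product_group table with business columns category_id, unit_id, and name plus a binary delete_status flag. The column name delete_time and the sentinel value for active rows are assumptions made for this example.

```sql
-- Hypothetical table: product_group(id, category_id, unit_id, name, delete_status).
-- Assumption: active rows keep the sentinel '1970-01-01 00:00:00.000';
-- the column is NOT NULL so the index never has to compare NULLs.
ALTER TABLE product_group
  ADD COLUMN delete_time DATETIME(3) NOT NULL DEFAULT '1970-01-01 00:00:00.000',
  ADD UNIQUE INDEX ux_group_deleted
    (category_id, unit_id, name, delete_status, delete_time);

-- Logical delete: flip the flag and record the deletion time
-- with millisecond precision.
UPDATE product_group
SET delete_status = 1, delete_time = NOW(3)
WHERE id = 42 AND delete_status = 0;
```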
3.3 Additional ID Field
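Introduce a separate delete_id column and include it in the unique index alongside the business fields. Active rows keep a fixed default value; on logical deletion, delete_id is set to the row's own primary key, which is unique by definition, so deleted rows can never collide with each other or with the active row.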
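A sketch of the delete_id approach, again on a hypothetical product_group table with business columns category_id, unit_id, and name. The default of 0 for active rows is an assumption for this example; any fixed value that is never used as a primary key would do.

```sql
-- Hypothetical table: product_group(id, category_id, unit_id, name, delete_status).
-- Assumption: delete_id = 0 means "not deleted"; ids start above 0.
ALTER TABLE product_group
  ADD COLUMN delete_id BIGINT NOT NULL DEFAULT 0,
  ADD UNIQUE INDEX ux_group_delete_id (category_id, unit_id, name, delete_id);

-- Logical delete: copy the primary key into delete_id.
UPDATE product_group
SET delete_status = 1, delete_id = id
WHERE id = 42 AND delete_status = 0;
```

Compared with the timestamp variant, this does not depend on clock precision, because the primary key is already guaranteed to be unique.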
4. Adding a Unique Index When Historical Duplicates Exist
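If a table already contains duplicate historical data, the unique index cannot be created until the duplicates are resolved. One option is to create a new “anti‑duplicate” table, migrate the distinct rows into it, clean up the duplicates in the original table, and then add the unique index there. Alternatively, add a delete_id column: within each group of duplicate rows, leave one row (for example, the one with the largest id) at the fixed default delete_id, set delete_id on every other duplicate to that row's own id, and then create the unique index on the business columns plus delete_id.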
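The delete_id variant can be sketched as follows. The table product_group, its business columns category_id, unit_id, and name, the default of 0, and the choice to keep the row with the largest id are all assumptions made for illustration.

```sql
-- Hypothetical table: product_group(id, category_id, unit_id, name).
-- 1. Add the column; every row starts at the default 0.
ALTER TABLE product_group
  ADD COLUMN delete_id BIGINT NOT NULL DEFAULT 0;

-- 2. In each group of duplicates, keep the row with the largest id
--    at delete_id = 0 and give every other row its own id.
UPDATE product_group p
JOIN (
  SELECT category_id, unit_id, name, MAX(id) AS keep_id
  FROM product_group
  GROUP BY category_id, unit_id, name
) k ON p.category_id = k.category_id
   AND p.unit_id = k.unit_id
   AND p.name = k.name
SET p.delete_id = p.id
WHERE p.id <> k.keep_id;

-- 3. The combined columns are now unique, so the index can be created.
ALTER TABLE product_group
  ADD UNIQUE INDEX ux_group_delete_id (category_id, unit_id, name, delete_id);
```

Whether the remaining duplicates should also be flagged as logically deleted depends on the business rules and is left out of this sketch.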
5. Unique Index on Large Columns
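When a column (e.g., model) is too large to fit within the index key length limit (for InnoDB, 3072 bytes with the DYNAMIC or COMPRESSED row format and 767 bytes with REDUNDANT or COMPACT; the 1000‑byte figure applies to MyISAM), consider: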
5.1 Adding a Hash Column
Store a short hash of the large field and create the unique index on the hash together with other identifying columns. Be aware of possible hash collisions.
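One way to maintain such a hash column is a stored generated column, so the database keeps it in sync with the source field. This is a sketch under assumptions: the table product_group, the large model column, and the choice of SHA-256 are illustrative, and the hash could just as well be computed in the application.

```sql
-- Hypothetical table: product_group(id, category_id, unit_id, model TEXT).
-- COALESCE maps a NULL model to '' so the hash itself is never NULL
-- and the unique index cannot be bypassed through NULL values.
ALTER TABLE product_group
  ADD COLUMN model_hash CHAR(64)
    GENERATED ALWAYS AS (SHA2(COALESCE(model, ''), 256)) STORED NOT NULL,
  ADD UNIQUE INDEX ux_group_model_hash (category_id, unit_id, model_hash);
```

A 64-character hex digest fits comfortably within the key length limit, and with a hash of this size accidental collisions are extremely unlikely, though not impossible.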
5.2 Not Adding a Unique Index
Rely on application‑level controls such as single‑threaded insertion or message‑queue processing to prevent duplicates.
5.3 Redis Distributed Lock
Generate a hash from the combination of fields and use it as a Redis lock key during insertion to avoid concurrent duplicates.
6. Batch Insertion
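Instead of taking a lock for each row, rely on the unique index and insert the rows with one multi‑row INSERT. If any row duplicates an existing key, a plain INSERT fails with a duplicate‑key error and, on InnoDB, the whole statement is rolled back, so no partial batch is written. To skip the duplicate rows and keep the rest, use INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE. Either way, the database does the duplicate check, which is both simpler and faster than per‑row locking.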
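A short sketch against the product_group_unique table from section 1, assuming the unique index ux_category_unit_model is in place; the values are made up for illustration.

```sql
-- One statement, three rows. Rows 1 and 2 share the same
-- (category_id, unit_id, model_hash), so row 2 violates the unique index.
-- With INSERT IGNORE the duplicate is skipped with a warning and
-- the other rows are inserted.
INSERT IGNORE INTO product_group_unique
  (id, category_id, unit_id, model_hash, in_date)
VALUES
  (1, 10, 100, 'h1', NOW()),
  (2, 10, 100, 'h1', NOW()),
  (3, 10, 101, 'h2', NOW());

-- Inspect what was skipped.
SHOW WARNINGS;
```

Note that IGNORE downgrades other errors to warnings as well, not only duplicate keys, so it is worth checking the warnings when the batch matters.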
