Why MySQL Unique Indexes Still Allow Duplicates and How to Fix Them
This article explains why a MySQL 8 InnoDB table with a unique index can still store duplicate rows—especially when indexed columns contain NULL or when logical deletion is used—and presents several practical strategies, including status counters, timestamps, extra IDs, hash fields, and batch‑insert techniques, to enforce true uniqueness.
1. Reproducing the Problem
We created a table product_group_unique with columns id, category_id, unit_id, model_hash, and in_date. After adding a unique index on (category_id, unit_id, model_hash), duplicate rows still appeared when inserting data, demonstrating that the unique constraint was not effective.
CREATE TABLE `product_group_unique` (
`id` bigint NOT NULL,
`category_id` bigint NOT NULL,
`unit_id` bigint NOT NULL,
`model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
`in_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
ALTER TABLE product_group_unique ADD UNIQUE INDEX ux_category_unit_model(category_id, unit_id, model_hash);
2. Unique Index Columns Containing NULL
If model_hash is NULL, MySQL treats each NULL as a distinct value, so the unique index does not block duplicate rows. Inserts with model_hash = NULL succeed even when other indexed columns match, proving that NULL values break uniqueness enforcement.
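A minimal sketch of this behavior against the table defined above. The literal values are made up for illustration, and the remedy at the end assumes that an empty string is an acceptable stand-in for "no hash" in your data:

```sql
-- Both inserts succeed: a NULL in model_hash never compares equal to another
-- NULL, so the unique index sees no conflict.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (1, 10, 100, NULL, NOW());
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (2, 10, 100, NULL, NOW());

-- With a non-NULL hash the index behaves as expected: the second statement
-- is rejected with error 1062 (duplicate entry).
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (3, 10, 100, 'abc', NOW());
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (4, 10, 100, 'abc', NOW());

-- One possible remedy (assumption: '' is a safe sentinel value).
-- Remove the existing duplicate first, otherwise the UPDATE itself would
-- violate the unique index.
DELETE FROM product_group_unique WHERE id = 2;
UPDATE product_group_unique SET model_hash = '' WHERE model_hash IS NULL;
ALTER TABLE product_group_unique
  MODIFY model_hash varchar(255) COLLATE utf8mb4_bin NOT NULL DEFAULT '';
```

Once the column is NOT NULL, every row carries a comparable value in all three indexed columns, so the index can reject duplicates.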
3. Unique Index on Logically Deleted Tables
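Logical deletion uses a delete_status flag (e.g., 0 = active, 1 = deleted) instead of physically removing rows. A unique index on columns such as name and model then blocks re-inserting a logically deleted record, because the deleted row still occupies the key. Adding delete_status to the index helps only once: the second deletion of the same record would produce two rows with the same name, model, and delete_status = 1.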
3.1 Incremental Delete Status
Instead of a binary flag, increase delete_status each time a row is deleted (1, 2, 3, …). The combination of name, model, and the ever‑changing delete_status remains unique.
Add record A → delete_status = 0.
Delete A → delete_status = 1.
Add A again → delete_status = 0.
Delete A again → delete_status = 2.
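The steps above can be sketched as follows. The table name product_by_status, its column sizes, and the sample values are hypothetical; the source only names the columns name, model, and delete_status:

```sql
-- Hypothetical table: only name, model and delete_status come from the article.
CREATE TABLE product_by_status (
  id BIGINT NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  model VARCHAR(100) NOT NULL,
  delete_status INT NOT NULL DEFAULT 0,  -- 0 = active, 1, 2, 3, ... = deleted
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model_status (name, model, delete_status)
) ENGINE=InnoDB;

-- Add record A: delete_status defaults to 0.
INSERT INTO product_by_status (name, model) VALUES ('A', 'M1');

-- Logical delete: move the active row to the next unused counter value.
-- The derived table avoids reading and updating the same table directly.
UPDATE product_by_status p
CROSS JOIN (
  SELECT COALESCE(MAX(delete_status), 0) + 1 AS next_status
  FROM product_by_status
  WHERE name = 'A' AND model = 'M1'
) n
SET p.delete_status = n.next_status
WHERE p.name = 'A' AND p.model = 'M1' AND p.delete_status = 0;

-- Add A again: allowed, because no row with delete_status = 0 exists any more.
INSERT INTO product_by_status (name, model) VALUES ('A', 'M1');
```

Running the same UPDATE a second time moves the new active row to delete_status = 2, matching the sequence listed above.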
3.2 Add a Timestamp Field
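Include a timestamp (seconds or milliseconds) in the unique index together with name, model, and delete_status. Each deletion writes a new timestamp, which keeps the rows unique as long as the same logical record is not deleted twice within one tick of the chosen precision. Millisecond precision makes such a clash far less likely than second precision.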
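A minimal sketch of the timestamp variant. The table name product_by_time and the column name delete_time are hypothetical, and storing the value as milliseconds since the Unix epoch is an assumption made for this example:

```sql
-- Hypothetical table: delete_time holds 0 for active rows and the deletion
-- time in milliseconds since the Unix epoch for deleted rows.
CREATE TABLE product_by_time (
  id BIGINT NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  model VARCHAR(100) NOT NULL,
  delete_status TINYINT NOT NULL DEFAULT 0,
  delete_time BIGINT NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model_status_time (name, model, delete_status, delete_time)
) ENGINE=InnoDB;

INSERT INTO product_by_time (name, model) VALUES ('A', 'M1');

-- Logical delete: flip the flag and stamp the row with the current time.
UPDATE product_by_time
SET delete_status = 1,
    delete_time = CAST(UNIX_TIMESTAMP(NOW(3)) * 1000 AS UNSIGNED)
WHERE name = 'A' AND model = 'M1' AND delete_status = 0;

-- Re-inserting A works again because the active slot (0, 0) is free.
INSERT INTO product_by_time (name, model) VALUES ('A', 'M1');
```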
3.3 Add an Extra ID Field
Introduce a surrogate column delete_id. When a row is inserted, set delete_id = 1. Upon logical deletion, copy the row’s primary key into delete_id. The unique index on (name, model, delete_status, delete_id) then distinguishes each deletion without altering existing business logic.
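A minimal sketch of the delete_id approach. The table name product_by_delete_id and the column sizes are hypothetical; the default of 1 and the copy of the primary key follow the description above:

```sql
-- Hypothetical table: delete_id is 1 for active rows and the row's own
-- primary key once the row has been logically deleted.
CREATE TABLE product_by_delete_id (
  id BIGINT NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  model VARCHAR(100) NOT NULL,
  delete_status TINYINT NOT NULL DEFAULT 0,
  delete_id BIGINT NOT NULL DEFAULT 1,
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model_status_delete_id (name, model, delete_status, delete_id)
) ENGINE=InnoDB;

INSERT INTO product_by_delete_id (name, model) VALUES ('A', 'M1');

-- Logical delete: the primary key is unique per row, so every deleted copy
-- of ('A', 'M1') ends up with a different delete_id.
UPDATE product_by_delete_id
SET delete_status = 1,
    delete_id = id
WHERE name = 'A' AND model = 'M1' AND delete_status = 0;

-- A second active copy is still rejected, while re-adding after delete works.
INSERT INTO product_by_delete_id (name, model) VALUES ('A', 'M1');
```

Compared with the counter and timestamp variants, this one needs no lookup of previous deletions and does not depend on clock resolution.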
4. Handling Historical Duplicate Data
For tables that already contain duplicate rows, give every row in a group of duplicates a distinct delete_id: for example, the first occurrence keeps delete_id = 1 and each later duplicate receives its own primary key. After this data cleanup, a unique index on (name, model, delete_status, delete_id) can be safely added.
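A sketch of such a cleanup, assuming a hypothetical table product_legacy that has the columns id, name, model, and delete_status but no unique index yet. Treating the row with the smallest id as the "first occurrence" is an assumption of this example:

```sql
-- Hypothetical legacy table with duplicates already present.
CREATE TABLE product_legacy (
  id BIGINT NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  model VARCHAR(100) NOT NULL,
  delete_status TINYINT NOT NULL DEFAULT 0,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

INSERT INTO product_legacy (name, model, delete_status)
VALUES ('A', 'M1', 1), ('A', 'M1', 1), ('A', 'M1', 1), ('B', 'M2', 0);

-- Step 1: add the new column; every existing row starts with delete_id = 1.
ALTER TABLE product_legacy ADD COLUMN delete_id BIGINT NOT NULL DEFAULT 1;

-- Step 2: within each duplicate group keep delete_id = 1 on the lowest id
-- and give every other row its own primary key as delete_id.
UPDATE product_legacy p
JOIN (
  SELECT name, model, delete_status, MIN(id) AS keep_id
  FROM product_legacy
  GROUP BY name, model, delete_status
) g ON g.name = p.name
   AND g.model = p.model
   AND g.delete_status = p.delete_status
SET p.delete_id = p.id
WHERE p.id <> g.keep_id;

-- Step 3: the index can now be created without a duplicate-key error.
ALTER TABLE product_legacy
  ADD UNIQUE INDEX ux_name_model_status_delete_id (name, model, delete_status, delete_id);
```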
5. Unique Index on Large Columns
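InnoDB limits the length of an index key to 3072 bytes for tables using the DYNAMIC or COMPRESSED row format, and to 767 bytes for REDUNDANT or COMPACT. The often-quoted 1000-byte maximum applies to MyISAM, not InnoDB. When a column (e.g., model) exceeds the limit, the unique index cannot be created on it directly.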
5.1 Add a Hash Column
Store a fixed‑length hash (e.g., MD5, SHA‑1) of the large column in a new field and create the unique index on the hash together with other columns. This avoids the length limitation, though hash collisions are possible.
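One way to sketch this in MySQL 8 is a stored generated column, so the hash cannot drift from the value it is derived from. The table and column names are hypothetical, and SHA2(model, 256) is used here in place of the MD5 or SHA-1 mentioned above; MD5(model) with a CHAR(32) column would work the same way:

```sql
-- Hypothetical table: model is too long to index directly, so the unique
-- index covers a fixed-length hash of it instead.
CREATE TABLE product_long_model (
  id BIGINT NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  model TEXT NOT NULL,
  -- 64 hex characters; recomputed by MySQL whenever model changes.
  model_sha256 CHAR(64) GENERATED ALWAYS AS (SHA2(model, 256)) STORED NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model_hash (name, model_sha256)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO product_long_model (name, model) VALUES ('A', 'a very long model description');

-- Rejected with error 1062: same name and same hash of model.
INSERT INTO product_long_model (name, model) VALUES ('A', 'a very long model description');
```

If two different model values ever produced the same hash, the second insert would be rejected even though the data differs, which is the collision risk noted above.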
5.2 Skip the Unique Index
If the index is impractical, enforce uniqueness through application logic, such as single‑threaded inserts, message‑queue serialization, or other coordination mechanisms.
5.3 Use Redis Distributed Locks
Generate a hash of the composite key (name, model, delete_status, delete_id) and acquire a Redis lock on that hash before inserting. Two different keys can in principle hash to the same lock name, but this is rare in typical workloads, and its only effect is that two unrelated inserts briefly contend for one lock.
6. Bulk Insert Scenarios
Applying a Redis lock per row in a large batch is inefficient. Instead, rely on MySQL's unique index and perform a bulk INSERT. Note that a plain multi-row INSERT fails as a whole when any row violates the index. To write the unique rows and skip the duplicates, use INSERT IGNORE, which reports skipped rows as warnings, or INSERT ... ON DUPLICATE KEY UPDATE.
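A small sketch of the difference between a plain bulk INSERT and INSERT IGNORE, using a hypothetical table product_batch with a unique index on (name, model):

```sql
-- Hypothetical table for the batch example.
CREATE TABLE product_batch (
  id BIGINT NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  model VARCHAR(100) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model (name, model)
) ENGINE=InnoDB;

INSERT INTO product_batch (name, model) VALUES ('A', 'M1');

-- Plain bulk insert: ('A', 'M1') raises error 1062 and InnoDB rolls back the
-- whole statement, so ('B', 'M2') is not stored either.
INSERT INTO product_batch (name, model) VALUES ('B', 'M2'), ('A', 'M1');

-- INSERT IGNORE: the duplicate is skipped and downgraded to a warning,
-- while ('B', 'M2') and ('C', 'M3') are stored.
INSERT IGNORE INTO product_batch (name, model)
VALUES ('B', 'M2'), ('A', 'M1'), ('C', 'M3');
SHOW WARNINGS;

-- Alternative: keep the existing row and touch nothing on conflict.
INSERT INTO product_batch (name, model)
VALUES ('C', 'M3'), ('D', 'M4')
ON DUPLICATE KEY UPDATE name = name;
```

INSERT IGNORE also downgrades some other errors to warnings, so it is worth checking SHOW WARNINGS after the batch rather than assuming every skipped row was a duplicate.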
ITPUB
