Why MySQL Unique Indexes Still Allow Duplicates and How to Fix Them
This article explores a common MySQL InnoDB pitfall where unique indexes fail to prevent duplicate rows, explains how NULL values and logical deletion affect uniqueness, and presents practical strategies—including composite keys, delete status counters, timestamps, hash fields, and Redis locks—to enforce true uniqueness.
Introduction
Recently I encountered a pitfall: a MySQL 8 InnoDB table with a unique index still allowed duplicate data. This article walks through that experience, explains why the unique index failed, and shares useful knowledge about unique indexes.
1. Reproducing the Issue
To prevent duplicate product groups, I created a "prevent‑duplicate" table and added a unique index on category_id, unit_id and model_hash. The table definition is:
CREATE TABLE `product_group_unique` (
`id` bigint NOT NULL,
`category_id` bigint NOT NULL,
`unit_id` bigint NOT NULL,
`model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
`in_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
Adding the unique index:
ALTER TABLE product_group_unique ADD UNIQUE INDEX ux_category_unit_model (category_id, unit_id, model_hash);
When model_hash is non-NULL, duplicates are blocked. However, rows with the same category_id and unit_id and a NULL model_hash could be inserted more than once without any error.
2. Unique Index Fields Containing NULL
In a unique index, MySQL never treats one NULL as equal to another NULL. Two rows that match on every other indexed column but hold NULL in one of them are therefore not considered duplicates, which is why inserting multiple rows with model_hash = NULL succeeded.
When a column in a unique index can be NULL, the uniqueness constraint does not cover the rows where it is NULL.
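The behavior can be reproduced against the table from section 1, and one possible fix is to make model_hash NOT NULL. This is only a sketch: the empty string used as the "no hash" value and the sample ids are assumptions, not something the original table prescribes.

```sql
-- Assumes product_group_unique and ux_category_unit_model from section 1 exist.

-- Both inserts succeed: the NULL in model_hash keeps the rows from being compared as equal.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (1, 10, 20, NULL, NOW());
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (2, 10, 20, NULL, NOW());

-- Remove the duplicate first, otherwise the UPDATE below would itself hit the unique index.
DELETE FROM product_group_unique WHERE id = 2;

-- Replace NULL with a sentinel ('' is an assumed choice) and forbid NULL from now on.
UPDATE product_group_unique SET model_hash = '' WHERE model_hash IS NULL;
ALTER TABLE product_group_unique
  MODIFY `model_hash` varchar(255) COLLATE utf8mb4_bin NOT NULL DEFAULT '';

-- This insert is now expected to be rejected with a duplicate-key error (1062).
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (3, 10, 20, '', NOW());
```

The sentinel must be a value that can never be a real hash. If no such value exists, a separate NOT NULL flag column in the index may be a better fit.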
3. Unique Index on Logically Deleted Tables
Logical deletion (using an update to set a delete_status flag) keeps the row in the table, which makes adding a unique index problematic because the deleted rows still occupy the unique key space.
Typical delete operation:
DELETE FROM product WHERE id = 123;
Logical delete operation:
UPDATE product SET delete_status = 1, edit_time = NOW() WHERE id = 123;
Because the row remains, a unique index on name and model prevents re-inserting the same product even after it has been logically deleted. Adding a 0/1 delete_status to the index only helps once: the second logical delete of the same name and model collides with the first deleted row.
3.1 Incremental Delete Status
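Instead of a binary flag, use an ever-increasing delete_status value and include it in the unique index (name, model, delete_status). The live row keeps 0, and each logical delete of the same name and model takes the next unused value (1, 2, 3, …), so every deleted copy has its own key.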
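A minimal sketch of this scheme follows. The product table layout, column sizes, and index name are hypothetical. The next value is computed as the current maximum for that name and model plus one.

```sql
-- Hypothetical table: 0 marks the live row, values >= 1 mark successive deletions.
CREATE TABLE product (
  id            BIGINT       NOT NULL,
  name          VARCHAR(100) NOT NULL,
  model         VARCHAR(100) NOT NULL,
  delete_status INT          NOT NULL DEFAULT 0,
  edit_time     DATETIME     NULL,
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model_status (name, model, delete_status)
) ENGINE=InnoDB;

-- Logical delete of row 123: take MAX(delete_status) + 1 among rows with the
-- same name and model. The aggregate derived table is materialized, which
-- lets the UPDATE read from the table it is modifying.
UPDATE product AS p
JOIN (
  SELECT MAX(o.delete_status) + 1 AS next_status
  FROM product AS t
  JOIN product AS o ON o.name = t.name AND o.model = t.model
  WHERE t.id = 123
) AS s
SET p.delete_status = s.next_status,
    p.edit_time     = NOW()
WHERE p.id = 123;
```

If two sessions delete rows with the same name and model at the same moment, they may compute the same next value. The unique index then rejects one of them with a duplicate-key error, and the application can retry.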
3.2 Add a Timestamp Field
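Introduce a timestamp column and include it in the unique index (e.g., name, model, delete_status, timestamp). The timestamp is written on each logical delete, so deleted rows with the same name and model stay distinct as long as they are not deleted within the same timestamp resolution. The column should be NOT NULL, otherwise the NULL behavior from section 2 applies.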
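As a rough sketch, assuming a product table that already has name, model, delete_status and edit_time columns: the column name delete_ts, the millisecond resolution, and the use of 0 for live rows are all illustrative choices.

```sql
-- delete_ts holds milliseconds since the epoch; 0 (assumed) means "not deleted".
ALTER TABLE product
  ADD COLUMN delete_ts BIGINT NOT NULL DEFAULT 0,
  ADD UNIQUE KEY ux_name_model_status_ts (name, model, delete_status, delete_ts);

-- Logical delete: the flag stays binary, the timestamp makes the key distinct.
UPDATE product
SET delete_status = 1,
    delete_ts     = ROUND(UNIX_TIMESTAMP(NOW(3)) * 1000),
    edit_time     = NOW()
WHERE id = 123;
```

Two deletions of the same name and model within one millisecond would still collide, so a finer resolution or a retry may be needed under heavy write load.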
3.3 Add an ID Field
Add a separate delete_id column. When a row is logically deleted, set delete_id to the row's primary key. The unique index on name, model, delete_status, delete_id then stays valid, and delete_status can remain a simple 0/1 flag; the only change to the delete statement is the extra assignment.
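A sketch of this variant, again assuming a product table with name, model, delete_status and edit_time columns. The default of 0 for live rows and the index name are assumptions.

```sql
-- Live rows keep delete_id = 0 (assumed default); deleted rows carry their own id.
ALTER TABLE product
  ADD COLUMN delete_id BIGINT NOT NULL DEFAULT 0,
  ADD UNIQUE KEY ux_name_model_status_delete_id (name, model, delete_status, delete_id);

-- Logical delete: same statement as before plus one assignment.
UPDATE product
SET delete_status = 1,
    delete_id     = id,
    edit_time     = NOW()
WHERE id = 123;
```

Because the primary key is already unique, no two deleted rows can share a key, while at most one live row per name and model can exist with (delete_status, delete_id) = (0, 0).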
4. Adding a Unique Index to Tables with Historical Duplicate Data
If a table already contains duplicate historical rows, first assign a distinguishing delete_id (or similar) to each row, then create the composite unique index on the chosen columns plus the new identifier.
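One way to order such a migration is sketched below, assuming a product table with name, model and delete_status columns and no delete_id column yet. Which live duplicates to keep is a business decision that the sketch deliberately leaves open.

```sql
-- 1. Add the column without the index, so existing duplicates do not block the change.
ALTER TABLE product ADD COLUMN delete_id BIGINT NOT NULL DEFAULT 0;

-- 2. Give every already-deleted row a distinguishing value (its own primary key).
UPDATE product SET delete_id = id WHERE delete_status = 1;

-- 3. List the key values that would still violate the index,
--    typically live rows that duplicate each other.
SELECT name, model, delete_status, delete_id, COUNT(*) AS copies
FROM product
GROUP BY name, model, delete_status, delete_id
HAVING COUNT(*) > 1;

-- 4. Once step 3 returns no rows, create the unique index.
ALTER TABLE product
  ADD UNIQUE KEY ux_name_model_status_delete_id (name, model, delete_status, delete_id);
```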
5. Unique Index on Large Columns
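InnoDB limits an index key to 3072 bytes for tables using the DYNAMIC or COMPRESSED row format and to 767 bytes for REDUNDANT or COMPACT (the 1000-byte figure sometimes quoted is the MyISAM limit). When a column (e.g., model) exceeds the limit, a unique index over its full value cannot be created directly.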
5.1 Add a Hash Column
Store a short hash (e.g., 16‑ or 32‑character) of the large column in a separate field and build the unique index on the hash together with other columns. This avoids the length limitation, though hash collisions must be considered.
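The hash column can be maintained by MySQL itself through a stored generated column. The sketch below uses a hypothetical product_large table and MD5, which yields 32 hexadecimal characters; both the table and the choice of hash function are assumptions.

```sql
-- Hypothetical table with a model column too large to index in full.
CREATE TABLE product_large (
  id         BIGINT       NOT NULL,
  name       VARCHAR(100) NOT NULL,
  model      TEXT         NOT NULL,
  -- Recomputed by MySQL whenever model changes; 32 hex characters.
  model_hash CHAR(32) GENERATED ALWAYS AS (MD5(model)) STORED NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model_hash (name, model_hash)
) ENGINE=InnoDB;

-- The second insert with the same name and model is expected to fail
-- with a duplicate-key error.
INSERT INTO product_large (id, name, model) VALUES (1, 'widget', 'a very long model description');
INSERT INTO product_large (id, name, model) VALUES (2, 'widget', 'a very long model description');
```

With this design a hash collision would reject a legitimately different model, so the application should compare the real values when a duplicate-key error occurs, or use a longer hash such as SHA2(model, 256) with a wider column.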
5.2 Omit the Unique Index
Rely on application‑level mechanisms (single‑threaded jobs, MQ consumers, etc.) to guarantee uniqueness when an index is impractical.
5.3 Use Redis Distributed Locks
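Generate a hash from the combination of fields and acquire a Redis lock on that hash before inserting. While holding the lock, check whether a matching row already exists and insert only if it does not; the lock merely serializes competing writers. A hash collision would make two different records contend for the same lock, but the probability is low under normal concurrency.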
6. Bulk Insertion Scenarios
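When inserting many rows, acquiring a Redis lock per row degrades performance. Instead, let MySQL's unique index handle duplicates. A plain multi-row INSERT that hits a duplicate fails with a duplicate-key error and InnoDB rolls back the whole statement, so the batch is never partially deduplicated by accident; INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE can be used when duplicate rows should be skipped or merged rather than abort the batch.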
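For illustration, a batch insert into the table from section 1 might look like this, assuming the ux_category_unit_model index is in place. The ids and values are made up.

```sql
-- INSERT IGNORE turns the duplicate-key error into a warning and skips that row.
INSERT IGNORE INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES
  (101, 10, 20, 'a1', NOW()),
  (102, 10, 20, 'a1', NOW()),  -- same (category_id, unit_id, model_hash) as row 101: skipped
  (103, 10, 21, 'b2', NOW());

-- Shows which rows were skipped and why.
SHOW WARNINGS;

-- Alternative: touch the existing row instead of skipping the new one.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (104, 10, 20, 'a1', NOW())
ON DUPLICATE KEY UPDATE in_date = VALUES(in_date);
```

INSERT IGNORE also downgrades some other errors to warnings, so checking the warnings after the batch is worthwhile. Rows whose model_hash is NULL are still not deduplicated, for the reason given in section 2.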
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
