Why MySQL Unique Indexes Still Let Duplicates Slip Through and How to Prevent Them
This article examines a MySQL InnoDB pitfall where a unique index fails to block duplicate rows, especially when indexed columns contain NULL values or when logical deletion is used. It presents practical solutions such as adjusting index columns, adding timestamps, delete‑status counters, and hash fields, and choosing a proper bulk‑insert strategy.
Introduction
Recently I encountered a situation where a UNIQUE index on a MySQL 8 InnoDB table still allowed duplicate rows. This article walks through the root causes and shares several techniques to ensure true uniqueness.
1. Reproducing the Issue
A table product_group_unique was created to prevent duplicate product groups:
CREATE TABLE `product_group_unique` (
`id` bigint NOT NULL,
`category_id` bigint NOT NULL,
`unit_id` bigint NOT NULL,
`model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
`in_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
A unique index was added on (category_id, unit_id, model_hash):
ALTER TABLE product_group_unique ADD UNIQUE INDEX ux_category_unit_model (category_id, unit_id, model_hash);
When model_hash is non‑NULL, duplicates are rejected, but rows where model_hash is NULL can be inserted repeatedly and end up as duplicate entries.
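The behaviour can be reproduced with a few inserts. This is a minimal sketch against the table and index defined above; the id values and key values are arbitrary examples, not data from the original case.

```sql
-- First row with a non-NULL model_hash: accepted.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (1, 10, 100, 'abc', NOW());

-- Same (category_id, unit_id, model_hash): rejected with a duplicate-key error (1062).
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (2, 10, 100, 'abc', NOW());

-- Same category_id and unit_id, but model_hash is NULL: both rows are accepted.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (3, 10, 100, NULL, NOW());
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (4, 10, 100, NULL, NOW());

-- Lists the rows whose model_hash is NULL (ids 3 and 4 here).
SELECT id, category_id, unit_id, model_hash
FROM product_group_unique
WHERE model_hash IS NULL;
```

When the statements are run as a script, the client may stop at the failing insert unless it is told to continue on errors.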
2. Unique Index Columns Containing NULL
In SQL, NULL is never equal to anything, including another NULL, so MySQL does not treat two NULLs as duplicates and a unique index permits any number of NULLs in a nullable column. Consequently, rows with identical non‑NULL columns but a NULL in another indexed column can be inserted multiple times.
When a column participating in a unique index can be NULL, the index does not enforce uniqueness for rows where that column is NULL.
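One way to close the gap is to make the column NOT NULL and store a sentinel instead of NULL. The sketch below assumes that an empty string is never a real hash value and is acceptable as "no hash" for model_hash; that choice is an assumption for illustration, not something the original case prescribes.

```sql
-- Assumption: '' is never a real hash, so it can stand in for "no value".
-- Rows that share (category_id, unit_id) and have a NULL model_hash must be
-- deduplicated first, or this UPDATE fails with a duplicate-key error.
UPDATE product_group_unique
SET model_hash = ''
WHERE model_hash IS NULL;

-- Forbid NULL from now on so the unique index applies to every row.
ALTER TABLE product_group_unique
  MODIFY `model_hash` varchar(255) COLLATE utf8mb4_bin NOT NULL DEFAULT '';
```

After this change, a second row with the same category_id and unit_id and an empty model_hash is rejected like any other duplicate.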
3. Adding a Unique Index to a Logically Deleted Table
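Logical deletion usually adds a delete_status flag and updates it instead of physically removing rows. This pattern makes a unique index hard to keep: with a binary flag, deleting the same logical key a second time produces a second row with identical business keys and delete_status = 1, which a unique index on the business keys plus delete_status rejects.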
3.1 Incremental Delete Status
Instead of a binary flag, use an incrementing delete_status value (1, 2, 3, …). Each deletion increments the status, ensuring that the combination of business keys and delete_status remains unique.
Add record A → delete_status = 0
Delete A → delete_status = 1
Add A again → delete_status = 0
Delete A again → delete_status = 2
Repeat as needed.
Pros: Simple, no schema changes.
Cons: Queries must treat delete_status >= 1 as deleted.
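The sequence above can be sketched in SQL as follows. The product table, its column sizes, the index name, and the values 'A' and 'M1' are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table; column names follow the ones used in this article.
CREATE TABLE product (
  id            bigint      NOT NULL AUTO_INCREMENT,
  name          varchar(64) NOT NULL,
  model         varchar(64) NOT NULL,
  delete_status int         NOT NULL DEFAULT 0,  -- 0 = active, >= 1 = deleted
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model_status (name, model, delete_status)
) ENGINE=InnoDB;

-- Logical delete: move the active row to the next unused status for this key.
-- The derived table is needed because MySQL does not allow a direct subquery
-- on the table that is being updated.
UPDATE product AS p
CROSS JOIN (
  SELECT COALESCE(MAX(delete_status), 0) + 1 AS next_status
  FROM product
  WHERE name = 'A' AND model = 'M1'
) AS n
SET p.delete_status = n.next_status
WHERE p.name = 'A' AND p.model = 'M1' AND p.delete_status = 0;

-- Reads of live data filter on the active status.
SELECT id, name, model
FROM product
WHERE delete_status = 0;
```

Because at most one row per key can have delete_status = 0, the index also blocks a duplicate active row.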
3.2 Add a Timestamp Field
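Introduce a timestamp column that records the moment of logical deletion. The unique index then covers (name, model, delete_status, timestamp). Deletions of the same key get distinct timestamps as long as the resolution is fine enough; millisecond precision is safer than seconds, because two deletions of the same key within one second would otherwise collide. Active rows need a fixed non‑NULL value in this column so that the index still blocks duplicate active rows.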
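A minimal sketch of the timestamp variant, assuming a hypothetical product table that already has name, model and delete_status columns. The column name delete_time, the index name, and the choice to store milliseconds in a bigint are assumptions made for this example.

```sql
-- delete_time stays 0 while the row is active, so it is never NULL.
ALTER TABLE product
  ADD COLUMN delete_time bigint NOT NULL DEFAULT 0,
  ADD UNIQUE INDEX ux_name_model_status_time (name, model, delete_status, delete_time);

-- Logical delete: stamp the row with the current time in milliseconds.
UPDATE product
SET delete_status = 1,
    delete_time = CAST(UNIX_TIMESTAMP(NOW(3)) * 1000 AS UNSIGNED)
WHERE name = 'A' AND model = 'M1' AND delete_status = 0;
```

Two deletions of the same key within the same millisecond would still collide, which is unlikely in practice but worth knowing.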
3.3 Add a Delete‑ID Field
Add a delete_id column that holds a fixed non‑NULL default such as 0 while the row is active. When a row is logically deleted, copy its primary key into delete_id. The unique index includes (name, model, delete_status, delete_id). Because primary keys never repeat, deleted rows never collide with each other, and the existing delete logic is preserved.
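As a sketch, again assuming a hypothetical product table with id, name, model and delete_status columns (the index name and the id value 42 are arbitrary):

```sql
-- delete_id stays 0 while the row is active, so it is never NULL.
ALTER TABLE product
  ADD COLUMN delete_id bigint NOT NULL DEFAULT 0,
  ADD UNIQUE INDEX ux_name_model_status_delete_id (name, model, delete_status, delete_id);

-- Logical delete of the row with primary key 42: the row's own id becomes
-- its delete_id, which no other deleted row can share.
UPDATE product
SET delete_status = 1,
    delete_id = id
WHERE id = 42 AND delete_status = 0;
```

The delete_status flag stays binary here, so existing queries that test delete_status = 1 keep working.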
4. Handling Historical Duplicate Data
If a table already contains duplicate rows, you can either migrate the data to a separate anti‑duplication table or process the existing rows to assign a unique delete_id before creating the index. Example SQL to populate a new table with distinct rows:
INSERT INTO product_unique (id, name, category_id, unit_id, model)
SELECT MAX(id), name, category_id, unit_id, model
FROM product
GROUP BY name, category_id, unit_id, model;
After assigning unique identifiers to historical rows, a composite unique index can be added directly on the original table.
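Processing the rows in place could look like the sketch below. It assumes the hypothetical product table already has NOT NULL delete_status and delete_id columns, that the four business columns are NOT NULL, and that keeping the newest row (highest id) of each duplicate group is the desired policy. None of these choices comes from the original case.

```sql
-- Mark every duplicate except the newest one as logically deleted and give it
-- a unique delete_id taken from its own primary key.
UPDATE product AS p
JOIN (
  SELECT name, category_id, unit_id, model, MAX(id) AS keep_id
  FROM product
  WHERE delete_status = 0
  GROUP BY name, category_id, unit_id, model
  HAVING COUNT(*) > 1
) AS d
  ON  p.name = d.name
  AND p.category_id = d.category_id
  AND p.unit_id = d.unit_id
  AND p.model = d.model
SET p.delete_status = 1,
    p.delete_id = p.id
WHERE p.delete_status = 0
  AND p.id <> d.keep_id;

-- With no active duplicates left, the composite unique index can be created.
-- This assumes the combined key fits within the index length limit.
ALTER TABLE product
  ADD UNIQUE INDEX ux_product_key (name, category_id, unit_id, model, delete_status, delete_id);
```

Running the inner SELECT on its own first shows which groups would be affected before anything is changed.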
5. Adding a Unique Index to Large Columns
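InnoDB limits an index key to 3072 bytes with the DYNAMIC or COMPRESSED row format, and to 767 bytes with REDUNDANT or COMPACT. With utf8mb4, where a character can take up to 4 bytes, 3072 bytes is at most 768 characters. When a column such as model exceeds this limit, consider the following strategies: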
5.1 Add a Hash Column
Store a fixed‑length hash (e.g., MD5, SHA‑1) of the large column in a separate field and create the unique index on the hash together with other key columns. Be aware of possible hash collisions.
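One possible way to maintain such a hash is a stored generated column, so the database computes it on every insert and update. This is a sketch under assumptions: the product table, the column name model_hash, and the index name are hypothetical, and MD5 is used only because it is one of the hashes named above. The application could equally compute the hash itself and write it to an ordinary column.

```sql
-- model is assumed to be NOT NULL: MD5(NULL) is NULL, and NULL values
-- would bypass the unique index.
ALTER TABLE product
  ADD COLUMN model_hash char(32)
    GENERATED ALWAYS AS (MD5(model)) STORED,
  ADD UNIQUE INDEX ux_category_unit_model_hash (category_id, unit_id, model_hash);
```

If two different model values ever produced the same hash, the second insert would be rejected as a duplicate, so the application should be prepared to report that case.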
5.2 Omit the Unique Index
Rely on application‑level controls (single‑threaded inserts, MQ‑driven processing) to prevent duplicates when indexing is impractical.
5.3 Use Redis Distributed Locks
Generate a hash of the composite key and acquire a Redis lock on that hash before inserting. This reduces contention but does not replace a proper unique index.
6. Bulk Insertion
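For batch inserts, using MySQL’s native unique index is far more efficient than per‑row Redis locks. A single multi‑row INSERT lets the database reject duplicates automatically. INSERT IGNORE skips the conflicting rows and inserts the rest, while INSERT … ON DUPLICATE KEY UPDATE updates the existing row instead of inserting a new one.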
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (...), (...), (...);
-- A plain multi-row INSERT fails as a whole if any row violates the UNIQUE index;
-- InnoDB rolls back the entire statement.
This approach avoids the performance penalties and lock‑timeout risks associated with manual distributed locking.
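The two duplicate-tolerant forms, INSERT IGNORE and INSERT … ON DUPLICATE KEY UPDATE, can be sketched as follows. The id and key values are arbitrary examples, and refreshing in_date on conflict is an assumed policy chosen for illustration.

```sql
-- Skip rows that collide with the unique index and insert the rest.
-- IGNORE also downgrades some other errors to warnings, so check the
-- warning count after the statement.
INSERT IGNORE INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES
  (101, 10, 100, 'abc', NOW()),
  (102, 10, 100, 'abc', NOW()),  -- same key as the previous row: skipped
  (103, 10, 200, 'def', NOW());

-- Alternatively, update the existing row when the key is already present.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES
  (104, 10, 100, 'abc', NOW()),
  (105, 11, 300, 'xyz', NOW())
ON DUPLICATE KEY UPDATE in_date = NOW();
```

Both forms still rely on the unique index to detect the conflict, so the NULL caveat from section 2 applies to them as well.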
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
