Why MySQL Unique Indexes Still Allow Duplicates and How to Fix Them
This article explains why a MySQL InnoDB table with a unique index can still contain duplicate rows—especially when indexed columns contain NULL values or when logical deletion is used—and presents several practical strategies, including schema changes and alternative locking mechanisms, to enforce true uniqueness.
Background and the Problem
Even after a unique index is added to an InnoDB table in MySQL 8, duplicate rows can still appear. The author ran into this while building a deduplication table for product groups, where the model_hash column could be NULL. A unique index was created on (category_id, unit_id, model_hash), yet duplicate rows were observed the next day.
Reproducing the Issue
CREATE TABLE `product_group_unique` (
`id` bigint NOT NULL,
`category_id` bigint NOT NULL,
`unit_id` bigint NOT NULL,
`model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
`in_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
ALTER TABLE product_group_unique ADD UNIQUE INDEX ux_category_unit_model (category_id, unit_id, model_hash);

Inserting rows where model_hash is NULL succeeded, producing duplicate records because MySQL treats NULL values as distinct for uniqueness checks.
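The behavior can be reproduced with a handful of inserts. The following is a minimal sketch against the table defined above; the ids and column values are made-up sample data, not the author's.

```sql
-- Sample values are hypothetical; only the table and index come from the article.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (1, 10, 20, NULL, NOW());

-- Same category_id and unit_id, model_hash again NULL: accepted by the unique index.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (2, 10, 20, NULL, NOW());

-- With a non-NULL model_hash the index behaves as expected.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (3, 10, 20, 'abc', NOW());

-- Rejected with error 1062 (duplicate entry for key ux_category_unit_model).
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (4, 10, 20, 'abc', NOW());

-- The comparison behind it: "=" yields NULL for two NULLs,
-- while the NULL-safe operator "<=>" yields 1.
SELECT NULL = NULL AS plain_equal, NULL <=> NULL AS null_safe_equal;
```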
Why NULL Breaks Uniqueness
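MySQL allows any number of NULL values in a unique index. If any indexed column of a row is NULL, that row never conflicts with another row, because comparing NULL with NULL does not evaluate to true. Consequently, rows with identical non‑NULL values but a NULL in an indexed column can coexist.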
When model_hash is not NULL, duplicates are prevented.
When model_hash is NULL, duplicates appear.
Therefore, columns that may be NULL should be excluded from unique indexes or given a default non‑NULL placeholder.
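One way to apply the placeholder approach to the table above is sketched below. It assumes the empty string is never a real model_hash value; that choice is an illustration, not something the article prescribes.

```sql
-- Assumption: '' is an acceptable "no hash" placeholder for this table.

-- Step 0: list groups that would collide once NULL becomes ''.
SELECT category_id, unit_id, COUNT(*) AS cnt
FROM product_group_unique
WHERE model_hash IS NULL
GROUP BY category_id, unit_id
HAVING COUNT(*) > 1;

-- Step 1: replace existing NULLs. Any groups reported by step 0 must be
-- deduplicated first, or this UPDATE fails on the unique index.
UPDATE product_group_unique
SET model_hash = ''
WHERE model_hash IS NULL;

-- Step 2: forbid NULL going forward so the unique index covers every row.
ALTER TABLE product_group_unique
  MODIFY `model_hash` varchar(255) COLLATE utf8mb4_bin NOT NULL DEFAULT '';
```

After step 2, two rows with the same category_id and unit_id and no hash both carry '' and are rejected as duplicates.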
Unique Indexes on Logically Deleted Tables
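Logical deletion (marking a row with a delete_status flag instead of removing it) complicates unique indexes because the deleted row stays in the table. Assuming 0 marks an active row and 1 a deleted one, a unique index on (name, model, delete_status) lets a record be deleted and re‑created once, but deleting the re‑created row then fails: a second row with the same name, model, and delete_status = 1 would violate the index.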
Three practical workarounds are presented:
Incremental Delete Status : Instead of a binary flag, use an integer that increments on each deletion (1, 2, 3, …). This makes each deletion a distinct value, preserving uniqueness.
Timestamp Field : Add a timestamp column (seconds or milliseconds) to the unique index. Each deletion writes a new timestamp, ensuring uniqueness without altering existing business logic.
Surrogate Delete ID : Add a delete_id column that is set to 1 on insertion. On logical deletion, set delete_id to the row’s primary key, and include delete_id in the unique index alongside the other columns. This relies on no row’s primary key being equal to the default value 1.
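The surrogate delete ID variant can be sketched as follows. The table name product_demo, the column sizes, and the sample values are hypothetical; only the delete_id idea comes from the article. The auto-increment start is raised so that no real id collides with the sentinel value 1.

```sql
-- Hypothetical table; column names follow the article's description.
CREATE TABLE product_demo (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `name` varchar(100) NOT NULL,
  `model` varchar(100) NOT NULL,
  `delete_status` tinyint NOT NULL DEFAULT 0,  -- 0 = active, 1 = deleted (assumed)
  `delete_id` bigint NOT NULL DEFAULT 1,       -- 1 = sentinel for active rows
  PRIMARY KEY (`id`),
  UNIQUE KEY ux_name_model_delete (`name`, `model`, `delete_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1000;           -- keep real ids away from the sentinel

INSERT INTO product_demo (name, model) VALUES ('widget', 'A1');

-- Logical delete: the row leaves the "active" slot of the unique index.
UPDATE product_demo
SET delete_status = 1, delete_id = id
WHERE name = 'widget' AND model = 'A1' AND delete_id = 1;

-- Re-create the record, then delete it again. Both statements succeed because
-- each deleted row carries its own primary key in delete_id.
INSERT INTO product_demo (name, model) VALUES ('widget', 'A1');
UPDATE product_demo
SET delete_status = 1, delete_id = id
WHERE name = 'widget' AND model = 'A1' AND delete_id = 1;

-- A second active copy is still rejected with error 1062.
INSERT INTO product_demo (name, model) VALUES ('widget', 'A1');
INSERT INTO product_demo (name, model) VALUES ('widget', 'A1');
```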
Adding a Unique Index to Tables with Historical Duplicates
When a table already contains duplicate rows, ALTER TABLE … ADD UNIQUE INDEX fails until the duplicates are resolved. The author's approach is to create a separate deduplication table and copy one row per duplicate group into it, keeping the row with the largest id:
INSERT INTO product_unique (id, name, category_id, unit_id, model)
SELECT MAX(id), name, category_id, unit_id, model
FROM product
GROUP BY name, category_id, unit_id, model;

After cleaning the data, a unique index can be added on the desired columns.
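Two follow-up statements round out the migration: a check for remaining duplicate groups and the index creation itself. This sketch assumes the product and product_unique tables from the statement above; the index name ux_product is made up, and the combined column lengths are assumed to fit within InnoDB's index key limit.

```sql
-- List groups that contain more than one row in the source table.
SELECT name, category_id, unit_id, model, COUNT(*) AS cnt
FROM product
GROUP BY name, category_id, unit_id, model
HAVING COUNT(*) > 1;

-- Once product_unique holds one row per group, add the constraint.
ALTER TABLE product_unique
  ADD UNIQUE INDEX ux_product (name, category_id, unit_id, model);
```

Note that GROUP BY places NULL values in a single group, whereas the unique index treats them as distinct, so nullable columns still need the placeholder treatment described earlier.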
Unique Indexes on Large Columns
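InnoDB limits an index key prefix to 3072 bytes for the DYNAMIC and COMPRESSED row formats and to 767 bytes for REDUNDANT and COMPACT. The 1000‑byte figure sometimes quoted for unique keys is the maximum key length of MyISAM, not an InnoDB limit. Large VARCHAR columns can exceed these limits, and TEXT columns can only be indexed by prefix, so a unique index over the full value is not possible.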
Solutions include:
Hash Column : Store a fixed‑length hash (e.g., MD5, SHA‑1) of the large column and index the hash together with other columns.
Skip the Index : Rely on application‑level checks or other mechanisms when indexing is impractical.
Redis Distributed Lock : Use a Redis lock keyed by a hash of the combined fields to serialize inserts, though this adds complexity and potential lock‑timeout issues.
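The hash column option might look like the sketch below. It uses a stored generated column so that MySQL maintains the hash itself; that is one possible design, since the article only says to store a hash. The table article_demo and its columns are hypothetical.

```sql
-- Hypothetical table: `content` is too large to index directly.
CREATE TABLE article_demo (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `category_id` bigint NOT NULL,
  `content` text NOT NULL,
  -- Fixed-length hex digest of the large column, computed by MySQL on write.
  `content_hash` char(32) GENERATED ALWAYS AS (MD5(`content`)) STORED NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY ux_category_content (`category_id`, `content_hash`)
) ENGINE=InnoDB;

INSERT INTO article_demo (category_id, content) VALUES (1, 'a very long body');

-- Same category and same content: rejected with error 1062.
INSERT INTO article_demo (category_id, content) VALUES (1, 'a very long body');

-- Same content in another category: accepted.
INSERT INTO article_demo (category_id, content) VALUES (2, 'a very long body');
```

The trade-off is that uniqueness is enforced on the hash, so a hash collision between two different values would be reported as a duplicate. A longer digest such as SHA2(content, 256) in a char(64) column makes that less likely.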
Batch Inserts and Concurrency
For bulk inserts, relying on MySQL’s unique index is simpler and more efficient than taking a Redis lock per record. A single multi‑row INSERT … VALUES … statement fails as a whole if any row violates the unique index, and InnoDB rolls back the rows that statement had already written, so data integrity is preserved without the overhead of distributed locking.
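As an illustration, the sketch below runs a batch against the product_group_unique table from earlier in the article. The ids and values are made-up sample data.

```sql
-- The third row repeats the key of the first, so the statement fails with
-- error 1062 and InnoDB rolls back the whole statement: no row is inserted.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES
  (101, 30, 40, 'h1', NOW()),
  (102, 30, 40, 'h2', NOW()),
  (103, 30, 40, 'h1', NOW());

-- Alternative when partial success is acceptable: INSERT IGNORE keeps rows
-- 101 and 102 and skips row 103, reporting it as a warning instead of an error.
INSERT IGNORE INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES
  (101, 30, 40, 'h1', NOW()),
  (102, 30, 40, 'h2', NOW()),
  (103, 30, 40, 'h1', NOW());

SHOW WARNINGS;
```

INSERT IGNORE also downgrades other errors, such as invalid values, to warnings, so it is best reserved for cases where silently skipping rows is the intended behavior.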
dbaplus Community
