Why MySQL Unique Indexes Still Allow Duplicates and How to Fix Them

This article explains why a MySQL 8 InnoDB table with a unique index can still store duplicate rows—especially when indexed columns contain NULL or when logical deletion is used—and presents several practical strategies, including status counters, timestamps, extra IDs, hash fields, and batch‑insert techniques, to enforce true uniqueness.

ITPUB

Nov 9, 2024

1. Reproducing the Problem

We created a table product_group_unique with columns id, category_id, unit_id, model_hash, and in_date, then added a unique index on (category_id, unit_id, model_hash). Duplicate rows still appeared on insert, so the unique constraint was not being enforced for those rows.

CREATE TABLE `product_group_unique` (
  `id` bigint NOT NULL,
  `category_id` bigint NOT NULL,
  `unit_id` bigint NOT NULL,
  `model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
  `in_date` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

ALTER TABLE product_group_unique ADD UNIQUE INDEX ux_category_unit_model(category_id, unit_id, model_hash);

2. Unique Index Columns Containing NULL

If model_hash is NULL, the unique index does not block duplicates: SQL never treats two NULLs as equal, so MySQL allows any number of rows whose indexed columns contain NULL. Inserts with model_hash = NULL therefore succeed even when category_id and unit_id match an existing row. Uniqueness is enforced only when every indexed column is non-NULL.
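The behavior can be reproduced against the table from section 1. The sketch below also shows one possible remedy, which is an assumption and not something the article prescribes: make model_hash NOT NULL with an empty-string default so that "no value" compares as equal. The literal ids and values are made up for illustration.

```sql
-- Both inserts succeed: the two NULLs in model_hash are not considered equal,
-- so ux_category_unit_model sees no conflict.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (1, 10, 20, NULL, NOW());
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (2, 10, 20, NULL, NOW());

-- Possible remedy (assumption): forbid NULL in the indexed column.
-- Existing duplicates must be removed first, or the UPDATE below would
-- itself violate the unique index.
DELETE FROM product_group_unique WHERE id = 2;
UPDATE product_group_unique SET model_hash = '' WHERE model_hash IS NULL;
ALTER TABLE product_group_unique
  MODIFY `model_hash` varchar(255) COLLATE utf8mb4_bin NOT NULL DEFAULT '';

-- This insert is now rejected with a duplicate-key error (error 1062).
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (3, 10, 20, '', NOW());
```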

3. Unique Index on Logically Deleted Tables

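Logical deletion marks a row with a delete_status flag (e.g., 0 = active, 1 = deleted) instead of physically removing it. A unique index on columns such as name and model then blocks re-inserting a logically deleted record, because the old row still exists. Adding the binary delete_status to the index helps only once: the second deletion of the same name and model collides with the row already marked 1.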

3.1 Incremental Delete Status

Instead of a binary flag, increase delete_status each time a row is deleted (1, 2, 3, …). The combination of name, model, and the ever‑changing delete_status remains unique.

Add record A → delete_status = 0.

Delete A → delete_status = 1.

Add A again → delete_status = 0.

Delete A again → delete_status = 2.
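The sequence above could be sketched as follows. The table product_status_demo, its column sizes, and the way the next status is computed (current maximum plus one) are illustrative assumptions; the article only specifies that delete_status increases with each deletion.

```sql
-- Hypothetical demo table; only name, model, and delete_status come from the article.
CREATE TABLE product_status_demo (
  id bigint NOT NULL AUTO_INCREMENT,
  name varchar(100) NOT NULL,
  model varchar(100) NOT NULL,
  delete_status int NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model_status (name, model, delete_status)
) ENGINE=InnoDB;

-- Add record A: delete_status defaults to 0.
INSERT INTO product_status_demo (name, model) VALUES ('A', 'M1');

-- Delete A: compute the next unused status for this (name, model), then apply it.
SELECT COALESCE(MAX(delete_status), 0) + 1 INTO @next_status
FROM product_status_demo
WHERE name = 'A' AND model = 'M1';

UPDATE product_status_demo
SET delete_status = @next_status
WHERE name = 'A' AND model = 'M1' AND delete_status = 0;
```

Repeating the insert and the two delete statements moves the second copy of A to delete_status = 2. Under concurrent deletes, the SELECT and UPDATE would presumably need to run inside one transaction with appropriate locking.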

3.2 Add a Timestamp Field

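Include a timestamp (seconds or milliseconds) in the unique index together with name, model, and delete_status. Each deletion writes a new timestamp, so the same logical record can be deleted multiple times without colliding, provided two deletions of that record never fall within the same timestamp unit. Millisecond precision makes such a collision unlikely but does not rule it out.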
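A minimal sketch of the timestamp variant, assuming a hypothetical table product_ts_demo and a delete_time column that holds epoch milliseconds (0 for active rows); neither name appears in the article.

```sql
-- Hypothetical demo table; delete_time is an assumed column name.
CREATE TABLE product_ts_demo (
  id bigint NOT NULL AUTO_INCREMENT,
  name varchar(100) NOT NULL,
  model varchar(100) NOT NULL,
  delete_status tinyint NOT NULL DEFAULT 0,
  delete_time bigint NOT NULL DEFAULT 0,   -- epoch milliseconds, 0 while active
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model_status_time (name, model, delete_status, delete_time)
) ENGINE=InnoDB;

INSERT INTO product_ts_demo (name, model) VALUES ('A', 'M1');

-- Logical delete: flag the row and stamp it with the current time in milliseconds.
UPDATE product_ts_demo
SET delete_status = 1,
    delete_time = CAST(UNIX_TIMESTAMP(NOW(3)) * 1000 AS UNSIGNED)
WHERE name = 'A' AND model = 'M1' AND delete_status = 0;

-- Re-inserting A succeeds because the deleted row now has a non-zero delete_time.
INSERT INTO product_ts_demo (name, model) VALUES ('A', 'M1');
```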

3.3 Add an Extra ID Field

Introduce a surrogate column delete_id. When a row is inserted, set delete_id = 1. Upon logical deletion, copy the row’s primary key into delete_id. The unique index on (name, model, delete_status, delete_id) then distinguishes each deletion without altering existing business logic.
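The delete_id approach might look like this. The table name product_delete_id_demo and the column sizes are assumptions for illustration; the index columns and the delete_id rules follow the description above.

```sql
-- Hypothetical demo table.
CREATE TABLE product_delete_id_demo (
  id bigint NOT NULL AUTO_INCREMENT,
  name varchar(100) NOT NULL,
  model varchar(100) NOT NULL,
  delete_status tinyint NOT NULL DEFAULT 0,
  delete_id bigint NOT NULL DEFAULT 1,     -- 1 while active, own primary key once deleted
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model_status_delete_id (name, model, delete_status, delete_id)
) ENGINE=InnoDB;

INSERT INTO product_delete_id_demo (name, model) VALUES ('A', 'M1');

-- Logical delete: copy the primary key into delete_id.
UPDATE product_delete_id_demo
SET delete_status = 1, delete_id = id
WHERE name = 'A' AND model = 'M1' AND delete_status = 0;

-- Succeeds: the only existing row is (A, M1, 1, <its id>).
INSERT INTO product_delete_id_demo (name, model) VALUES ('A', 'M1');

-- Fails with a duplicate-key error: an active (A, M1, 0, 1) row already exists.
INSERT INTO product_delete_id_demo (name, model) VALUES ('A', 'M1');
```

Because every deleted row carries its own primary key, any number of deletions of the same name and model stay distinct, while active rows still collide on delete_id = 1.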

4. Handling Historical Duplicate Data

For tables that already contain duplicate rows, the duplicates must be made distinct before the index can be added. Within each group of duplicates, one row (for example, the first occurrence) keeps delete_id = 1, and every other row in the group has delete_id set to its own primary key. After this cleanup, a unique index on (name, model, delete_status, delete_id) can be added safely.
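One way to express this cleanup in SQL is sketched below. The table product_legacy and its sample rows are hypothetical, and keeping the row with the smallest id in each group is an assumed choice of which row retains delete_id = 1.

```sql
-- Hypothetical legacy table that already holds duplicates and has no unique index.
CREATE TABLE product_legacy (
  id bigint NOT NULL AUTO_INCREMENT,
  name varchar(100) NOT NULL,
  model varchar(100) NOT NULL,
  delete_status tinyint NOT NULL DEFAULT 0,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

INSERT INTO product_legacy (name, model)
VALUES ('A', 'M1'), ('A', 'M1'), ('A', 'M1'), ('B', 'M2');

-- Step 1: add delete_id; every existing row starts at 1.
ALTER TABLE product_legacy ADD COLUMN delete_id bigint NOT NULL DEFAULT 1;

-- Step 2: in each duplicate group, keep delete_id = 1 on the row with the
-- smallest id and give every other row its own primary key.
UPDATE product_legacy p
JOIN (
  SELECT name, model, delete_status, MIN(id) AS keep_id
  FROM product_legacy
  GROUP BY name, model, delete_status
  HAVING COUNT(*) > 1
) d ON p.name = d.name AND p.model = d.model AND p.delete_status = d.delete_status
SET p.delete_id = p.id
WHERE p.id <> d.keep_id;

-- Step 3: the unique index can now be created without conflicts.
ALTER TABLE product_legacy
  ADD UNIQUE INDEX ux_name_model_status_delete_id (name, model, delete_status, delete_id);
```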

5. Unique Index on Large Columns

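At the default 16 KB page size, InnoDB limits an index key to 3072 bytes with the DYNAMIC or COMPRESSED row format and to 767 bytes with REDUNDANT or COMPACT (the 1000-byte limit often quoted applies to MyISAM, not InnoDB). When a column (e.g., model) exceeds the limit, a unique index on its full value cannot be created directly.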

5.1 Add a Hash Column

Store a fixed‑length hash (e.g., MD5, SHA‑1) of the large column in a new field and create the unique index on the hash together with other columns. This avoids the length limitation, though hash collisions are possible.
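A sketch of the hash-column idea, assuming a hypothetical table product_hash_demo. Using a stored generated column so that MySQL maintains the hash itself is an assumption; the article only says the hash lives in a new field, which the application could equally populate.

```sql
-- Hypothetical demo table: model is too large to index directly.
CREATE TABLE product_hash_demo (
  id bigint NOT NULL AUTO_INCREMENT,
  name varchar(100) NOT NULL,
  model text NOT NULL,
  -- MD5 yields 32 hex characters; the column is recomputed whenever model changes.
  model_hash char(32) GENERATED ALWAYS AS (MD5(model)) STORED NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model_hash (name, model_hash)
) ENGINE=InnoDB;

INSERT INTO product_hash_demo (name, model) VALUES ('A', 'a very long model description');

-- Rejected with a duplicate-key error: same name and same MD5(model).
INSERT INTO product_hash_demo (name, model) VALUES ('A', 'a very long model description');
```

Two different model values that happen to share a hash would also be rejected, which is the collision risk mentioned above.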

5.2 Skip the Unique Index

If the index is impractical, enforce uniqueness through application logic, such as single‑threaded inserts, message‑queue serialization, or other coordination mechanisms.

5.3 Use Redis Distributed Locks

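Generate a hash of the composite key (name, model, delete_status, delete_id) and acquire a Redis lock on that hash before inserting; while holding the lock, check that no matching row exists, then insert and release the lock. A hash collision is harmless here: it only makes two unrelated inserts wait on the same lock, and it is rare in typical workloads.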

6. Bulk Insert Scenarios

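Applying a Redis lock per row in a large batch is inefficient. Instead, rely on MySQL's unique index and perform a bulk INSERT, letting MySQL check uniqueness for every row. Note that a plain multi-row INSERT on an InnoDB table fails as a whole at the first duplicate-key error; to store the unique rows and skip the duplicates, use INSERT IGNORE (duplicates are reported as warnings) or INSERT ... ON DUPLICATE KEY UPDATE.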
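As an illustration, the batch below targets the table from section 1 and uses INSERT IGNORE so that duplicate rows are skipped instead of aborting the statement. The ids and values are made up, and model_hash is given non-NULL values so that the unique index actually applies.

```sql
-- Rows 101 and 102 share (category_id, unit_id, model_hash); with IGNORE,
-- row 102 is skipped and the other rows are stored.
INSERT IGNORE INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES
  (101, 1, 1, 'abc', NOW()),
  (102, 1, 1, 'abc', NOW()),
  (103, 1, 2, 'def', NOW());

-- Lists the duplicate-key warning raised for the skipped row.
SHOW WARNINGS;
```

INSERT IGNORE also downgrades some other errors to warnings, so checking the warnings after the batch is worthwhile.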
