
Why MySQL Unique Indexes Still Allow Duplicates and How to Fix Them

This article explores a common MySQL InnoDB pitfall where unique indexes fail to prevent duplicate rows, explains how NULL values and logical deletion affect uniqueness, and presents practical strategies—including composite keys, delete status counters, timestamps, hash fields, and Redis locks—to enforce true uniqueness.

macrozheng

Aug 16, 2022

Introduction

Recently I encountered a pitfall: a MySQL 8 InnoDB table with a unique index still allowed duplicate data. This article walks through that experience, explains why the unique index failed, and shares useful knowledge about unique indexes.

1. Reproducing the Issue

To prevent duplicate product groups, I created a "prevent‑duplicate" table and added a unique index on category_id, unit_id and model_hash. The table definition is:

CREATE TABLE `product_group_unique` (
  `id` bigint NOT NULL,
  `category_id` bigint NOT NULL,
  `unit_id` bigint NOT NULL,
  `model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
  `in_date` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Adding the unique index:

ALTER TABLE product_group_unique ADD UNIQUE INDEX ux_category_unit_model (category_id, unit_id, model_hash);

When model_hash is non‑NULL, duplicates are blocked. However, after inserting rows where model_hash is NULL, MySQL allowed duplicate records.
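The behavior can be reproduced with a few inserts against the table above. This is a minimal sketch; the id, category_id, unit_id and model_hash values are made up for illustration.

```sql
-- A non-NULL model_hash is protected by the unique index.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (1, 10, 20, 'abc', NOW());

-- Same (category_id, unit_id, model_hash): rejected with a duplicate-key error (1062).
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (2, 10, 20, 'abc', NOW());

-- A NULL model_hash is not protected: both statements below succeed.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (3, 10, 20, NULL, NOW());
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (4, 10, 20, NULL, NOW());
```

When run as a script in the mysql client, the second statement stops execution unless errors are ignored, so the statements are best run one at a time.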

2. Unique Index Fields Containing NULL

MySQL treats NULL as an unknown value, so two NULLs never compare as equal. If any column of a unique index is NULL in a row, that row cannot conflict with any other row, and the uniqueness check is effectively skipped for it. This explains why inserting multiple rows with model_hash = NULL succeeded.

Whenever a column in a unique index is nullable, the constraint can be bypassed this way, so declare such columns NOT NULL wherever possible.
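One possible way to close the gap on the table from section 1 is to forbid NULL in model_hash and use a sentinel for "no model". The sketch below assumes that an empty string never occurs as a real hash value; that choice is an assumption, not something the table definition guarantees.

```sql
-- 1. List the groups that already hold more than one NULL-hash row.
--    GROUP BY treats NULLs as equal, unlike the unique index.
SELECT category_id, unit_id, COUNT(*) AS cnt
FROM product_group_unique
WHERE model_hash IS NULL
GROUP BY category_id, unit_id
HAVING COUNT(*) > 1;

-- 2. Once the extra rows have been removed or merged, replace NULL with the
--    sentinel. This fails with error 1062 if duplicates are still present.
UPDATE product_group_unique
SET model_hash = ''
WHERE model_hash IS NULL;

-- 3. Forbid NULL from now on (MODIFY must restate the full column definition).
ALTER TABLE product_group_unique
  MODIFY `model_hash` varchar(255) COLLATE utf8mb4_bin NOT NULL DEFAULT '';
```

After step 3 the existing index ux_category_unit_model covers every row, because no indexed column can be NULL any more.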

3. Unique Index on Logically Deleted Tables

Logical deletion (using an update to set a delete_status flag) keeps the row in the table, which makes adding a unique index problematic because the deleted rows still occupy the unique key space.

A typical physical delete removes the row:

DELETE FROM product WHERE id = 123;

A logical delete only updates it:

UPDATE product SET delete_status = 1, edit_time = NOW() WHERE id = 123;

Because the row remains, a unique index on name and model prevents re‑inserting the same product even after it has been logically deleted.
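The conflict is easy to see on a small table. The product table below is hypothetical: only the names product, name, model, delete_status and edit_time come from the text, while the column types and sample values are assumptions.

```sql
CREATE TABLE product (
  id            bigint       NOT NULL AUTO_INCREMENT,
  name          varchar(100) NOT NULL,
  model         varchar(100) NOT NULL,
  delete_status int          NOT NULL DEFAULT 0,   -- 0 = active
  edit_time     datetime     NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  UNIQUE KEY ux_name_model (name, model)
) ENGINE=InnoDB;

INSERT INTO product (id, name, model) VALUES (123, 'Widget', 'X1');

-- Logical delete: the row stays in the table and keeps its index entry.
UPDATE product SET delete_status = 1, edit_time = NOW() WHERE id = 123;

-- Re-creating the same product is rejected with a duplicate-key error (1062),
-- even though the only matching row is logically deleted.
INSERT INTO product (name, model) VALUES ('Widget', 'X1');
```

Simply adding delete_status to the index, as (name, model, delete_status), allows one active and one deleted copy, but the second logical delete of the same product then collides with the first.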

3.1 Incremental Delete Status

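Instead of a binary flag, include delete_status in the unique index and let it take ever-increasing values: 0 for an active row, then 1, 2, 3, … for successive deletions of the same name and model. Each deletion uses the next unused value, so deleted rows never collide with each other or with the active row.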
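A sketch of the incrementing scheme, assuming a hypothetical product table with NOT NULL columns name and model, an integer delete_status where 0 means active, and no narrower unique index on (name, model) alone. The index name and sample values are illustrative.

```sql
ALTER TABLE product
  ADD UNIQUE INDEX ux_name_model_status (name, model, delete_status);

-- Logical delete of row 123 ('Widget', 'X1'): run both statements in one transaction.
START TRANSACTION;

-- Next free status for this name/model: 1 on the first deletion, then 2, 3, ...
SELECT COALESCE(MAX(delete_status), 0) + 1
INTO @next_status
FROM product
WHERE name = 'Widget' AND model = 'X1';

UPDATE product
SET delete_status = @next_status, edit_time = NOW()
WHERE id = 123;

COMMIT;
```

The cost of this approach is that every query that used to test delete_status = 1 must now test delete_status > 0 (or <> 0).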

3.2 Add a Timestamp Field

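Introduce a timestamp column and include it in the unique index (e.g., name, model, delete_status, timestamp). The timestamp is written on each logical delete, so every deleted copy gets its own key. Second resolution is usually enough; if the same product could be deleted twice within one second, store milliseconds instead.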
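A possible shape for the timestamp variant, again on a hypothetical product table. The column name delete_time, its bigint millisecond encoding and the default of 0 for active rows are all assumptions chosen for the sketch.

```sql
-- 0 means "not deleted"; a fixed non-NULL default keeps active rows comparable.
ALTER TABLE product
  ADD COLUMN delete_time bigint NOT NULL DEFAULT 0,
  ADD UNIQUE INDEX ux_name_model_status_time (name, model, delete_status, delete_time);

-- Logical delete: delete_status stays a simple 0/1 flag,
-- the millisecond timestamp makes each deleted copy distinct.
UPDATE product
SET delete_status = 1,
    delete_time   = ROUND(UNIX_TIMESTAMP(NOW(3)) * 1000),
    edit_time     = NOW()
WHERE id = 123;
```

Existing code that filters on delete_status = 0 or delete_status = 1 keeps working unchanged.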

3.3 Add an ID Field

Add a separate delete_id column. Active rows keep a fixed default value; when a row is logically deleted, set delete_id to the row's primary key. Because primary keys are unique, the unique index on name, model, delete_status, delete_id then remains valid without altering the existing delete_status logic.
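In SQL this might look as follows. The hypothetical product table, the default of 0 for active rows and the index name are assumptions; the sketch also assumes primary keys are always greater than 0, so a deleted row can never share the default.

```sql
ALTER TABLE product
  ADD COLUMN delete_id bigint NOT NULL DEFAULT 0,
  ADD UNIQUE INDEX ux_name_model_status_delete (name, model, delete_status, delete_id);

-- Logical delete: copy the primary key into delete_id.
UPDATE product
SET delete_status = 1,
    delete_id     = id,
    edit_time     = NOW()
WHERE id = 123;
```

No lookup of a "next" value is needed, which makes the delete a single statement.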

4. Adding a Unique Index to Tables with Historical Duplicate Data

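If a table already contains duplicate historical rows, a unique index cannot be created directly. First add a delete_id (or similar) column, keep one row of each duplicate group at the default value, and give every other row in the group a distinguishing value such as its own primary key. Then create the composite unique index on the chosen columns plus the new identifier.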
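The clean-up can be sketched as below, assuming a hypothetical product table whose name and model columns are NOT NULL, primary keys greater than 0, and the convention that the newest row (largest id) of each group is the one to keep. Which row to keep is a business decision, not something the technique dictates.

```sql
ALTER TABLE product
  ADD COLUMN delete_id bigint NOT NULL DEFAULT 0;

-- Mark every row except the newest one of each (name, model) group.
-- The derived table k is materialized first, so updating product is allowed.
UPDATE product p
JOIN (
  SELECT name, model, MAX(id) AS keep_id
  FROM product
  GROUP BY name, model
) AS k
  ON p.name = k.name AND p.model = k.model
SET p.delete_id = p.id
WHERE p.id <> k.keep_id;

-- Every (name, model, delete_id) combination is now distinct.
ALTER TABLE product
  ADD UNIQUE INDEX ux_name_model_delete (name, model, delete_id);
```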

5. Unique Index on Large Columns

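InnoDB limits the length of an index key: 3072 bytes for tables using the DYNAMIC or COMPRESSED row format, and 767 bytes for REDUNDANT or COMPACT. When a column (e.g., model) exceeds the limit, for instance a utf8mb4 varchar longer than 768 characters at up to 4 bytes per character, the unique index cannot be created directly.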

5.1 Add a Hash Column

Store a short hash (e.g., 16‑ or 32‑character) of the large column in a separate field and build the unique index on the hash together with other columns. This avoids the length limitation, though hash collisions must be considered.
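One way to maintain the hash without touching application code is a stored generated column. The sketch assumes a hypothetical product table in which model is a long NOT NULL text column; MD5 is used only because it yields exactly 32 hexadecimal characters, and the column and index names are illustrative.

```sql
-- UNIQUE (name, model) on an over-long model column would be rejected with
-- ERROR 1071 (specified key was too long), so index a fixed-length hash instead.
ALTER TABLE product
  ADD COLUMN model_hash char(32)
    GENERATED ALWAYS AS (MD5(model)) STORED NOT NULL,
  ADD UNIQUE INDEX ux_name_model_hash (name, model_hash);

-- Lookups should compare the hash first and then the full value,
-- so that a hash collision cannot return the wrong row.
SELECT id
FROM product
WHERE name = 'Widget'
  AND model_hash = MD5('some very long model text')
  AND model = 'some very long model text';
```

With this index, two different models that happen to share a hash would be reported as duplicates, which is the collision risk mentioned above; a longer digest such as SHA2(model, 256) in a char(64) column reduces that risk at the cost of a wider index.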

5.2 Omit the Unique Index

Rely on application‑level mechanisms (single‑threaded jobs, MQ consumers, etc.) to guarantee uniqueness when an index is impractical.

5.3 Use Redis Distributed Locks

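Generate a hash from the combination of fields and acquire a Redis lock keyed on that hash before checking for an existing row and inserting. A hash collision only makes two unrelated inserts wait on the same lock, and the chance of two colliding keys being inserted at the same moment is low.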

6. Bulk Insertion Scenarios

When inserting many rows, acquiring a Redis lock per row degrades performance. Instead, let MySQL's unique index handle duplicates: a multi-row INSERT that contains a duplicate key fails with a duplicate-key error, so the database enforces uniqueness without any per-row locking in the application.
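The two common behaviors of a bulk insert can be compared on the product_group_unique table from section 1. The values are made up, and the sketch assumes the listed ids and keys are not yet present in the table.

```sql
-- Rows 101 and 102 share (category_id, unit_id, model_hash).
-- The statement fails with error 1062 and, under InnoDB, the whole statement
-- is rolled back: none of the three rows is inserted.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (101, 1, 1, 'a1', NOW()),
       (102, 1, 1, 'a1', NOW()),
       (103, 1, 2, 'b2', NOW());

-- INSERT IGNORE skips the conflicting row instead: 101 and 103 are inserted,
-- 102 is dropped and reported as a warning.
INSERT IGNORE INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (101, 1, 1, 'a1', NOW()),
       (102, 1, 1, 'a1', NOW()),
       (103, 1, 2, 'b2', NOW());

SHOW WARNINGS;
```

INSERT IGNORE also downgrades other errors, such as data truncation, to warnings, so it is worth checking the warnings after a bulk load rather than assuming only duplicates were skipped.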
