
Understanding MySQL Unique Index Pitfalls and Practical Solutions

This article examines why a MySQL InnoDB table with a unique index can still store duplicate rows—especially when indexed columns contain NULL values or when logical deletion is used—and presents several practical strategies such as adjusting delete status, adding timestamp or auxiliary ID fields, using hash columns, and leveraging proper bulk‑insert techniques to enforce uniqueness effectively.


1. Reproducing the Issue

To prevent duplicate product groups, a table product_group_unique was created with a unique index on (category_id, unit_id, model_hash). Despite the index, duplicate rows appeared the next day.

CREATE TABLE `product_group_unique` (
  `id` bigint NOT NULL,
  `category_id` bigint NOT NULL,
  `unit_id` bigint NOT NULL,
  `model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
  `in_date` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
ALTER TABLE product_group_unique ADD UNIQUE INDEX ux_category_unit_model (category_id, unit_id, model_hash);

After inserting data, the second and third rows were identical, demonstrating that the unique constraint failed.

2. Unique Index Columns Containing NULL

If model_hash is NULL, the uniqueness check is bypassed. Inserting two rows where model_hash is NULL succeeds, even though the other indexed columns match.

In MySQL, NULL is never considered equal to NULL, so a unique index does not enforce uniqueness for rows where any indexed column is NULL. Declaring the column NOT NULL with a sentinel default (such as an empty string) restores the guarantee.
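A minimal demonstration against the table above (the id, category_id, and unit_id values are made up for illustration):

```sql
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (1, 100, 10, NULL, NOW());

INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (2, 100, 10, NULL, NOW());
-- Both inserts succeed: because NULL is never equal to NULL,
-- ux_category_unit_model does not reject the second row.
```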

3. Unique Index on Logically Deleted Tables

Logical deletion is implemented by an update that sets a delete_status flag instead of physically removing rows. If a unique index includes a binary delete_status, the second logical deletion of the same key collides with the first deleted row, because both end up carrying delete_status = 1.

3.1 Incremental Delete Status

Instead of a binary flag, increment delete_status each time a row is deleted (1, 2, 3, …). This ensures each deletion yields a distinct value, preserving uniqueness without altering existing schema.
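A sketch of this idea, assuming a hypothetical product table with a unique index on (name, model, delete_status) where 0 means live; the id, name, and model values are placeholders:

```sql
-- Logical delete: assign this row the next unused delete_status
-- for its (name, model) key, so deleted rows never collide.
UPDATE product p
JOIN (SELECT COALESCE(MAX(delete_status), 0) + 1 AS next_status
        FROM product
       WHERE name = 'widget' AND model = 'A1') m
  SET p.delete_status = m.next_status
WHERE p.id = 123;
```

The derived table is materialized first, which is what allows MySQL to read from product inside an UPDATE of the same table.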

3.2 Adding a Timestamp Field

Introduce a timestamp column and include it in the unique index (e.g., (name, model, delete_status, timestamp)). The timestamp changes on each logical delete, guaranteeing uniqueness. Use second‑level precision, or millisecond precision for high‑concurrency scenarios.
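One possible shape for this, again on a hypothetical product table (column and index names are assumptions):

```sql
-- Live rows share a fixed epoch default, so the composite key stays unique.
ALTER TABLE product
  ADD COLUMN delete_time datetime(3) NOT NULL DEFAULT '1970-01-01 00:00:00.000';
ALTER TABLE product
  ADD UNIQUE INDEX ux_name_model_status_time (name, model, delete_status, delete_time);

-- Logical delete stamps the current time at millisecond precision:
UPDATE product SET delete_status = 1, delete_time = NOW(3) WHERE id = 123;
```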

3.3 Adding an Auxiliary ID Field

Add a dedicated delete_id column. When a row is logically deleted, set delete_id to the row’s primary key value. The unique index then covers (name, model, delete_status, delete_id), allowing repeated logical deletions without conflict.
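Sketched on the same hypothetical product table (delete_id = 0 denotes a live row; names are assumptions):

```sql
ALTER TABLE product ADD COLUMN delete_id bigint NOT NULL DEFAULT 0;
ALTER TABLE product
  ADD UNIQUE INDEX ux_name_model_status_delid (name, model, delete_status, delete_id);

-- On logical delete, copy the primary key into delete_id;
-- since id is unique, repeated deletions of the same key never collide.
UPDATE product SET delete_status = 1, delete_id = id WHERE id = 123;
```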

4. Adding a Unique Index When Historical Duplicate Data Exists

Before creating the unique index, resolve each existing duplicate group (e.g., keep the row with the max id as the surviving record and set every other row's delete_id to its own id, marking it deleted). After this data cleanup, the composite unique index can be added safely.
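One way to perform that cleanup in a single statement, assuming the hypothetical product table with delete_status and delete_id columns from the previous section:

```sql
-- Keep the newest row per (name, model) group; mark every other row
-- as deleted with delete_id = its own id so no two rows share a key.
UPDATE product p
JOIN (SELECT name, model, MAX(id) AS keep_id
        FROM product
       GROUP BY name, model) g
  ON p.name = g.name AND p.model = g.model
 SET p.delete_status = 1, p.delete_id = p.id
WHERE p.id <> g.keep_id;

-- Now the composite unique index can be created without errors:
ALTER TABLE product
  ADD UNIQUE INDEX ux_name_model_status_delid (name, model, delete_status, delete_id);
```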

5. Unique Index on Large Columns

When a column (e.g., model) exceeds MySQL’s index key length limit (3072 bytes for InnoDB with the default DYNAMIC row format; 767 bytes for the older COMPACT and REDUNDANT formats), consider the following approaches:

5.1 Add a Hash Column

Create a short hash (e.g., MD5, SHA‑1) of the large column and index the hash together with other fields. This reduces index size but introduces a small risk of hash collisions, which can be mitigated by also indexing another distinguishing column.
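A sketch of the hash-column approach on the hypothetical product table (column and index names are assumptions):

```sql
-- model is too long to index directly, so index its MD5 digest instead.
ALTER TABLE product ADD COLUMN model_hash char(32) NOT NULL DEFAULT '';

-- Backfill existing rows, then keep model_hash = MD5(model) on every write.
UPDATE product SET model_hash = MD5(model);

ALTER TABLE product
  ADD UNIQUE INDEX ux_name_modelhash (name, model_hash, delete_status, delete_id);
```

On MySQL 8.0.13+, a functional key part such as UNIQUE KEY ((MD5(model))) can achieve the same effect without a stored column.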

5.2 Omit the Unique Index

Rely on application‑level guarantees (single‑threaded jobs, MQ consumers processing sequentially) to prevent duplicates when an index is impractical.

5.3 Use Redis Distributed Locks

Generate a hash of the composite key (name, model, delete_status, delete_id) and acquire a Redis lock on that hash before inserting. This approach works as a fallback when indexing is not feasible.

6. Bulk Insertion Scenarios

For batch inserts, letting MySQL’s unique index arbitrate duplicates is far more efficient than acquiring a Redis lock per row. A single INSERT IGNORE statement (or INSERT ... ON DUPLICATE KEY UPDATE, if conflicting rows should be updated rather than skipped) lets the database enforce uniqueness automatically.
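Applied to the product_group_unique table from section 1 (the values are placeholders):

```sql
-- The unique index ux_category_unit_model silently drops duplicates:
INSERT IGNORE INTO product_group_unique
  (id, category_id, unit_id, model_hash, in_date)
VALUES
  (1, 100, 10, 'a1b2c3', NOW()),
  (2, 100, 10, 'a1b2c3', NOW());  -- duplicate key: skipped, not an error

-- Alternatively, upsert on conflict instead of skipping:
-- INSERT INTO product_group_unique (...) VALUES (...)
-- ON DUPLICATE KEY UPDATE in_date = VALUES(in_date);
```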

Overall, understanding how NULL values, logical deletion, and large column sizes interact with MySQL unique indexes enables developers to design robust data‑integrity solutions without sacrificing performance.

Tags: MySQL, Database Design, hash, logical delete, unique index, NULL handling
Written by Selected Java Interview Questions, a professional Java tech channel sharing common knowledge to help developers fill gaps.