
Adding Unique Indexes in MySQL: Methods, Risks, and Hook‑Based Mitigations

This article explains how to add unique indexes to large MySQL tables using Online DDL, pt‑osc, and gh‑ost, compares their advantages and drawbacks, outlines common data‑loss risks, and provides practical hook scripts and best‑practice recommendations to mitigate those risks.

Aikesheng Open Source Community

In MySQL 5.6 and later, a unique index can be added in place with Online DDL, without rebuilding the table; for large tables, however, third‑party tools such as pt‑online‑schema‑change (pt‑osc) and gh‑ost are often preferred.

Online DDL

Online DDL creates the index in place, without blocking DML and without rebuilding the table. The operation consists of a full scan of the existing data followed by an incremental phase that replays the row log accumulated during the scan. It takes an MDL lock only during the brief Prepare and Commit phases; the lock is released for the Execute phase, so the impact on concurrent DML is minimal.
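As a concrete sketch, the Online DDL statement can pin the algorithm and lock mode explicitly. All object names below (db1, t1, col1, uk_col1) are placeholders; the mysql invocation is commented out so the snippet only prints the statement.

```shell
#!/bin/bash
# Hedged Online DDL sketch with placeholder names (db1.t1, col1).
# Specifying ALGORITHM=INPLACE and LOCK=NONE makes the statement fail
# immediately if the in-place, non-blocking path is unavailable, instead
# of silently falling back to a copying or locking DDL.
ddl_sql="ALTER TABLE db1.t1 ADD UNIQUE INDEX uk_col1 (col1), ALGORITHM=INPLACE, LOCK=NONE;"
# mysql -h127.0.0.1 -uroot -p -e "${ddl_sql}"   # run against a real server
echo "${ddl_sql}"
```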

pt‑osc

pt‑osc creates a new table with the same structure, copies rows with INSERT IGNORE INTO, and applies concurrent changes via triggers. It performs no duplicate‑value validation, so duplicate rows are silently ignored during the full copy and overwritten during the incremental (trigger) phase. The tool has no built‑in safeguard against data loss when adding a unique index.
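The two failure modes can be illustrated without a MySQL server. The toy snippet below mimics the semantics only: "INSERT IGNORE" keeps the first row per key, while a REPLACE‑style write keeps the last; the key:value rows are invented sample data.

```shell
#!/bin/bash
# Toy illustration (no MySQL needed) of the two duplicate-handling modes:
# INSERT IGNORE keeps the FIRST row per key, REPLACE keeps the LAST.
rows='1:a
2:b
2:c'
# first occurrence per key wins (INSERT IGNORE analogy)
ignore_result="$(awk -F: '!seen[$1]++' <<< "${rows}" | paste -sd' ' -)"
# last occurrence per key wins (REPLACE analogy)
replace_result="$(awk -F: '{last[$1]=$0} END{for (k in last) print last[k]}' <<< "${rows}" | sort | paste -sd' ' -)"
echo "INSERT IGNORE keeps: ${ignore_result}"
echo "REPLACE keeps:       ${replace_result}"
```

Either way, one of the two conflicting rows (2:b or 2:c) is lost, which is exactly the risk when a unique index is added over data that is not yet unique.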

gh‑ost

gh‑ost follows a similar copy‑then‑swap approach but provides a hook mechanism that runs custom scripts at defined points in the migration, including before and after the copy phase. Like pt‑osc, it does not reject duplicate values by itself, but hooks can verify uniqueness and abort the migration if necessary.

Comparison of Schemes

| Scheme | Data Loss | Recommendation |
| --- | --- | --- |
| Online DDL | No | Suitable for small tables or when replica lag is acceptable |
| pt‑osc | Possible | Not recommended for adding unique indexes |
| gh‑ost | Possible (mitigated with hooks) | Preferred for large tables when combined with uniqueness checks |

Risks When Adding Unique Indexes

Three typical data‑loss scenarios are identified:

1. A new column is added with a default value and a unique index on it; rows sharing the default value may be collapsed into one.

2. The original table already contains duplicate values in the indexed column; the migration silently drops the duplicates.

3. DML executed during the migration introduces duplicate values that overwrite existing rows, particularly around cut‑over.

Risk Mitigation Strategies

- Prohibit mixing other schema changes with the unique‑index addition.

- Use a hook to run a uniqueness‑check SQL before the migration and abort if duplicates exist.

- When using gh‑ost, enable a hook that compares the estimated row count with the rows actually copied and stops the migration if they differ.
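The uniqueness pre‑check can be sketched as follows. The table and column names (db1.t1, col1) are placeholders, the mysql call is commented out, and its result is simulated so the logic can be followed end to end.

```shell
#!/bin/bash
# Pre-migration uniqueness check, sketched with placeholder names.
# The query returns "distinct:total"; they match only if col1 is unique.
check_sql="SELECT CONCAT(COUNT(DISTINCT col1), ':', COUNT(*)) FROM db1.t1;"
# count_info="$(mysql -NBe "${check_sql}")"   # run against the real server
count_info="100:100"                          # simulated server reply
distinct_cnt="${count_info%%:*}"              # text before the first ':'
total_cnt="${count_info##*:}"                 # text after the last ':'
if [ "${distinct_cnt}" -eq "${total_cnt}" ]; then
  verdict="unique, safe to proceed"
else
  verdict="duplicates present, abort"         # a real hook would exit 1 here
fi
echo "col1 is ${verdict}"
```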

Hook Example (Basic)

# ./pt-online-schema-change --version
pt-online-schema-change 3.0.13
# ./bin/gh-ost --version
1.1.5

A simple pair of hook scripts works as follows: gh-ost-on-rowcount-complete-hook records the estimated row count, and gh-ost-on-row-copy-complete-hook compares it with the number of rows actually copied, aborting on mismatch.
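The pair can be sketched in a single runnable snippet. gh‑ost really does export GH_OST_ESTIMATED_ROWS and GH_OST_COPIED_ROWS to its hooks; the defaults and the file path below are illustrative stand‑ins for a clean run.

```shell
#!/bin/bash
# Sketch of the two cooperating hooks. In a real migration gh-ost exports
# these variables to each hook; defaults here simulate a clean run.
GH_OST_ESTIMATED_ROWS="${GH_OST_ESTIMATED_ROWS:-100}"
GH_OST_COPIED_ROWS="${GH_OST_COPIED_ROWS:-100}"
rowcount_file="/tmp/ghost_rowcount.$$"

# gh-ost-on-rowcount-complete-hook would contain:
echo "${GH_OST_ESTIMATED_ROWS}" > "${rowcount_file}"

# gh-ost-on-row-copy-complete-hook would contain:
estimated="$(cat "${rowcount_file}")"
if [ "${estimated}" != "${GH_OST_COPIED_ROWS}" ]; then
  echo "row count mismatch: estimated=${estimated} copied=${GH_OST_COPIED_ROWS}" >&2
  exit 1   # a non-zero exit from the hook makes gh-ost abort the migration
fi
result="row counts match (${GH_OST_COPIED_ROWS})"
echo "${result}"
rm -f "${rowcount_file}"
```

Note that the estimated row count is only an approximation unless gh‑ost is run with --exact-rowcount, so this check catches gross losses rather than single‑row discrepancies.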

Enhanced Hook Example

The enhanced script performs the following steps:

1. Loads configuration and logging utilities.

2. Executes a custom SQL statement that returns count(distinct …):count(*) for the columns to be indexed.

3. Parses the result and verifies that the distinct count equals the total row count, indicating true uniqueness.

4. Logs success or failure and exits with an appropriate status code.

#!/bin/bash
work_dir="/opt/soft/zzonlineddl"
# load logging utilities
. ${work_dir}/function/log/f_logging.sh
# load hook configuration (defines ${mysql_comm} and ${mysql_sql})
hook_conf="${work_dir}/hook/conf/--mysql_port--_${GH_OST_DATABASE_NAME}.${GH_OST_TABLE_NAME}"
. ${hook_conf}
function f_main(){
  # ${mysql_sql} returns "count(distinct ...):count(*)" for the indexed columns
  count_info="$( ${mysql_comm} -NBe "${mysql_sql}" )"
  count_distinct="$(awk -F: '{print $1}' <<< "${count_info}")"
  count_total="$(awk -F: '{print $NF}' <<< "${count_info}")"
  if [ -z "${count_total}" ]; then
    f_logging "$(date '+%F %T')":ERROR "Unique index field check failed, aborting"
    exit 1
  fi
  # the columns are truly unique only if distinct count equals total row count
  if [ "${count_distinct}" -eq "${count_total}" ]; then
    f_logging "$(date '+%F %T')":INFO "Uniqueness verified, proceeding"
  else
    f_logging "$(date '+%F %T')":ERROR "Potential data loss detected, aborting"
    exit 1
  fi
  exit 0
}
f_main

The script is not a universal solution: extreme cases, such as hooks that run for a long time on very large tables, or rows that are inserted and then deleted concurrently during the migration, can still lead to data loss.

Testing with gh‑ost

Using the --test-on-replica flag runs the migration on a replica, stops replication, swaps tables, reverses the swap, and leaves both tables in sync for manual verification. However, this does not eliminate the time gap between cut‑over and verification, so duplicate inserts during that window remain a risk.
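A --test-on-replica run can be sketched as below. Every host, schema, and column name is a placeholder, and the command is assembled and printed rather than executed so it can be reviewed first; --test-on-replica, --hooks-path, and --execute are real gh‑ost flags.

```shell
#!/bin/bash
# Assemble (but do not run) a gh-ost --test-on-replica invocation;
# host, database, table, and column names are all placeholders.
gh_ost_cmd="gh-ost \
  --host=replica.example.com --database=db1 --table=t1 \
  --alter='ADD UNIQUE INDEX uk_col1 (col1)' \
  --test-on-replica \
  --hooks-path=/opt/soft/zzonlineddl/hook \
  --execute"
echo "${gh_ost_cmd}"
```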

Final Recommendations

If replica lag is acceptable, prefer Online DDL for small tables.

When using third‑party tools, choose gh‑ost with hooks and always verify column uniqueness before migration.

Avoid dropping the old table immediately; keep it as a fallback (use --no-drop-old-table for pt‑osc and avoid --ok-to-drop-table for gh‑ost).

Consider the impact on write performance: unique indexes cannot use the InnoDB change buffer, so they may be slower than regular secondary indexes.

Overall, adding a unique index to a large MySQL table is feasible but requires careful planning, risk assessment, and appropriate tooling to prevent data loss.

Tags: MySQL, online DDL, database migration, gh-ost, pt-osc, unique index
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
