
6 Proven Strategies to Add Columns to Billion‑Row MySQL Tables Without Downtime

This article examines why adding a column to a MySQL table with tens of millions of rows can lock the table, then presents six practical solutions—including native online DDL, PT‑OSC, gh‑ost, dual‑write migration, maintenance downtime, and partition sliding‑window—detailing their procedures, trade‑offs, and suitable scenarios.

ITPUB

Why adding a column to a large MySQL table is risky

MySQL DDL can lock the table. Before MySQL 5.6, ALTER TABLE blocks all reads and writes for the duration of the rebuild; from 5.6 onward only a subset of operations support online DDL. The lock duration is roughly table size divided by disk I/O throughput: a 10 M-row table with 1 KB rows is about 10 GB, so on a 100 MB/s disk it can be unavailable for about 100 seconds, which is unacceptable for high-concurrency services.
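The back-of-envelope arithmetic can be made explicit. A small sketch using the figures above (decimal units assumed: 1 KB = 1,000 bytes, 100 MB/s = 100,000,000 bytes/s; real durations also depend on buffer pool size, redo volume, and concurrent load):

```java
public class DdlLockEstimate {
    // Rough lock-duration estimate for a blocking table-copy ALTER:
    // total bytes to copy divided by sustained disk throughput.
    static double estimateSeconds(long rows, long bytesPerRow, long diskBytesPerSec) {
        return (double) (rows * bytesPerRow) / diskBytesPerSec;
    }

    public static void main(String[] args) {
        // 10 M rows x 1 KB copied at 100 MB/s ~= 100 s of unavailability
        System.out.println(estimateSeconds(10_000_000L, 1_000L, 100_000_000L));
    }
}
```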

Solution 1 – Native Online DDL (MySQL 5.6+)

MySQL can perform online DDL with ALGORITHM=INPLACE, LOCK=NONE. Example:

ALTER TABLE user ADD COLUMN age INT, ALGORITHM=INPLACE, LOCK=NONE;

Pros: low complexity, works well for tables < 100 M rows. Cons: may still acquire a table lock for operations such as adding a full‑text index, requires roughly double the disk space for the temporary copy, and can be affected by replication lag on slaves.

Solution 2 – Maintenance (downtime) method

When a maintenance window is acceptable, stop the service, dump the table, run the ALTER, and reload the data. Suitable for tables < 100 GB where the downtime (hours) can be tolerated and a full rollback plan exists.

Solution 3 – pt‑online‑schema‑change (PT‑OSC)

PT‑OSC creates a shadow table, copies data in chunks, and uses triggers on the original table to keep the shadow in sync. Install and run:

sudo yum install percona-toolkit
pt-online-schema-change \
  --alter "ADD COLUMN age INT" \
  D=test,t=user \
  --execute

Pros: cut‑over lock is only a few milliseconds. Cons: triggers add CPU load, require double space, and cannot be used on tables with foreign keys or existing triggers.

Solution 4 – Logical migration with dual‑write

This approach is common for financial‑grade core tables where zero data loss is required.

Step 1 – Create a new table

CREATE TABLE user_new (
  id BIGINT PRIMARY KEY,
  name VARCHAR(50),
  age INT DEFAULT 0,
  KEY idx_name(name)
) ENGINE=InnoDB;

Step 2 – Dual‑write in application (Java example)

public void addUser(User user) {
    // Dual-write: insert into the old table first, then the new one.
    // If the second insert fails, the batch backfill re-copies the row later.
    userOldDAO.insert(user);
    userNewDAO.insert(convertToNew(user));
}

private UserNew convertToNew(User old) {
    UserNew u = new UserNew();
    u.setId(old.getId());
    u.setName(old.getName());
    u.setAge(getAgeFromCache(old.getId())); // populate the new column from cache
    return u;
}

Step 3 – Batch data migration

-- Run inside a stored procedure: WHILE ... END WHILE is only valid in stored programs.
SET @start_id = 0;
WHILE EXISTS(SELECT 1 FROM user WHERE id > @start_id) DO
  -- Take the chunk's upper bound from the source table; do not use
  -- MAX(id) FROM user_new, because dual-write keeps inserting new ids there.
  SET @next_id = (SELECT MAX(id) FROM (
      SELECT id FROM user WHERE id > @start_id ORDER BY id LIMIT 10000) AS batch);
  -- INSERT IGNORE skips rows already written by the dual-write path.
  INSERT IGNORE INTO user_new (id, name, age)
  SELECT id, name, COALESCE(age_cache, 0)
  FROM user
  WHERE id > @start_id AND id <= @next_id;
  SET @start_id = @next_id;
  COMMIT;
  SELECT SLEEP(0.1); -- throttle to limit replication pressure
END WHILE;
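The same chunking can also run application-side, for example from a Java migration job. A sketch of the keyset-pagination logic over an in-memory id set (no database access; class and method names are illustrative, and in production each batch would become one INSERT ... SELECT statement followed by a short sleep):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class ChunkedBackfill {
    // Splits the id space into batches exactly as the SQL loop does:
    // "WHERE id > lastId ORDER BY id LIMIT chunkSize", repeated until empty.
    static List<List<Long>> chunks(SortedSet<Long> ids, int chunkSize) {
        List<List<Long>> batches = new ArrayList<>();
        long lastId = Long.MIN_VALUE;
        while (true) {
            List<Long> batch = new ArrayList<>();
            for (long id : ids.tailSet(lastId + 1)) { // id > lastId, ascending
                batch.add(id);
                if (batch.size() == chunkSize) break; // LIMIT chunkSize
            }
            if (batch.isEmpty()) break;               // nothing left to copy
            batches.add(batch);
            lastId = batch.get(batch.size() - 1);     // advance the keyset cursor
        }
        return batches;
    }

    public static void main(String[] args) {
        SortedSet<Long> ids = new TreeSet<>();
        for (long i = 1; i <= 25; i++) ids.add(i);
        System.out.println(chunks(ids, 10)); // three batches: 10, 10, 5 ids
    }
}
```

Keyset pagination (tracking the last copied id) stays fast on billion-row tables, whereas OFFSET-based paging degrades linearly as the offset grows.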

Step 4 – Atomic cut‑over

-- After the backfill catches up and consistency has been verified:
RENAME TABLE user TO user_old, user_new TO user;
-- Keep user_old as a rollback point; drop it once the new table proves stable:
DROP TABLE user_old;

Pros: strong consistency, no lock time. Cons: high development effort, requires code changes and careful testing.
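Before dropping the old table, the copy should be verified. A minimal sketch of such a check, modelled over in-memory id-to-name maps rather than live JDBC connections (names are illustrative; against MySQL the same idea runs as COUNT(*) comparisons plus chunked checksum queries, e.g. CRC32 over the columns):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MigrationVerifier {
    // Row-count check plus per-row value comparison between old and new tables.
    static boolean consistent(Map<Long, String> oldRows, Map<Long, String> newRows) {
        if (oldRows.size() != newRows.size()) return false; // row counts differ
        for (Map.Entry<Long, String> e : oldRows.entrySet()) {
            // Every old row must have an identical counterpart in the new table.
            if (!Objects.equals(e.getValue(), newRows.get(e.getKey()))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Long, String> oldRows = new HashMap<>();
        oldRows.put(1L, "alice");
        Map<Long, String> newRows = new HashMap<>(oldRows);
        System.out.println(consistent(oldRows, newRows));
    }
}
```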

Solution 5 – gh‑ost (GitHub's Online Schema Transmogrifier)

gh‑ost reads the binary log asynchronously, builds a shadow table, and applies incremental changes without using triggers. It is designed for TB‑scale tables and high‑concurrency environments.

gh-ost \
  --alter "ADD COLUMN age INT NOT NULL DEFAULT 0 COMMENT 'User age'" \
  --host=master_ip --port=3306 --user=gh_user --password=xxx \
  --database=test --table=user \
  --chunk-size=2000 \
  --max-load=Threads_running=80 \
  --critical-load=Threads_running=200 \
  --cut-over-lock-timeout-seconds=5 \
  --execute \
  --allow-on-master

Monitoring example:

echo status | nc -U /tmp/gh-ost.sock

Key parameters: --max-lag-millis pauses the copy when replication lag exceeds the threshold; --postpone-cut-over-flag-file postpones the cut‑over until the flag file is removed, enabling a manual cut‑over.

Cut‑over steps (milliseconds lock):

Lock source table briefly.

Rename tables atomically: RENAME TABLE source TO _source_del, _source_gho TO source.

Drop the old table ( _source_del).

Solution 6 – Partition sliding‑window

For time‑partitioned log tables, add the new column only to the newest partition and then reorganize partitions. Example:

ALTER TABLE logs ADD COLUMN log_level VARCHAR(10) DEFAULT 'INFO';
ALTER TABLE logs REORGANIZE PARTITION p202302 INTO (
  PARTITION p202302 VALUES LESS THAN (TO_DAYS('2023-03-01')),
  PARTITION p202303 VALUES LESS THAN (TO_DAYS('2023-04-01'))
);

This limits the DDL impact to the most recent partition, keeping older partitions untouched.
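The monthly roll-forward lends itself to automation. A hypothetical sketch that generates the REORGANIZE statement for the next window, assuming the logs table and pYYYYMM partition naming from the example above:

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

public class PartitionRoller {
    // Emits the REORGANIZE PARTITION statement that splits the current
    // month's partition and opens the next month's window.
    static String reorganizeSql(YearMonth current) {
        YearMonth next = current.plusMonths(1);
        DateTimeFormatter p = DateTimeFormatter.ofPattern("yyyyMM");
        return String.format(
            "ALTER TABLE logs REORGANIZE PARTITION p%s INTO (%n"
          + "  PARTITION p%s VALUES LESS THAN (TO_DAYS('%s-01')),%n"
          + "  PARTITION p%s VALUES LESS THAN (TO_DAYS('%s-01'))%n);",
            current.format(p),
            current.format(p), next,          // YearMonth prints as "2023-03"
            next.format(p), next.plusMonths(1));
    }

    public static void main(String[] args) {
        System.out.println(reorganizeSql(YearMonth.of(2023, 2)));
    }
}
```

A scheduler would run this once a month; the generated statement for February 2023 matches the example above.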

Practical comparison

Native Online DDL : seconds‑to‑minutes lock, medium business impact, strong consistency, best for < 100 M rows, low complexity.

Maintenance (downtime) : hours lock, high impact, strong consistency, suitable when downtime is allowed and size < 100 GB, medium complexity.

PT‑OSC : millisecond cut‑over, medium impact (trigger overhead), eventual consistency, works on tables without FK/triggers, medium complexity.

Logical migration + dual‑write : no lock, low impact (code change), strong consistency, required for > 1 B rows in financial core tables, high complexity.

gh‑ost : millisecond cut‑over, low impact (no triggers), eventual consistency, ideal for TB‑scale high‑concurrency tables, medium‑high complexity.

Partition sliding‑window : only new partition affected, low impact, partition‑level consistency, best for time‑partitioned log tables, medium complexity.

Key operational guidelines

Always take a full backup ( mysqldump + binlog) before any schema change.

Ensure the table has a primary key: PT‑OSC and gh‑ost require a primary key or unique index, and without one the chunked copy degrades into full table scans.

Reserve at least 1.5 × the current table size on disk for temporary copies.

Monitor replication lag (SHOW SLAVE STATUS, or SHOW REPLICA STATUS on MySQL 8.0.22+) and keep Seconds_Behind_Master under 10 seconds.

Avoid adding NOT NULL columns without a default; a nullable column, or one with a constant default, can typically be added as a metadata‑only (ALGORITHM=INSTANT) change on MySQL 8.0, avoiding a full‑table rewrite.

Prefer ENUM over long VARCHAR for low‑cardinality fields.

For ultra‑large tables (hundreds of billions of rows) consider sharding instead of direct DDL.
