6 Proven Strategies to Add Columns to Billion‑Row MySQL Tables Without Downtime
This article examines why adding a column to a MySQL table with tens of millions of rows can lock the table, then presents six practical solutions—including native online DDL, PT‑OSC, gh‑ost, dual‑write migration, maintenance downtime, and partition sliding‑window—detailing their procedures, trade‑offs, and suitable scenarios.
Why adding a column to a large MySQL table is risky
Traditional MySQL ALTER TABLE rebuilds the table and locks it. Before MySQL 5.6 the lock blocks all reads and writes for the whole rebuild; from 5.6 on, only a subset of operations support online DDL. The lock (or copy) duration is roughly table size / disk I/O speed: for a 10 M‑row table with 1 KB rows on a 100 MB/s disk, that is about 10 GB / 100 MB/s ≈ 100 seconds of unavailability, which is unacceptable for high‑concurrency services.
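The arithmetic behind that estimate can be sketched as follows (the numbers come from the text above; the linear copy-time model is a deliberate simplification that ignores index rebuilds and cache effects):

```java
// Back-of-envelope model: copy time ≈ table size / sequential disk throughput.
public class CopyTimeEstimate {
    // rows × bytesPerRow / bytesPerSecond → seconds (integer precision is enough here)
    static long estimateSeconds(long rows, long bytesPerRow, long diskBytesPerSec) {
        return rows * bytesPerRow / diskBytesPerSec;
    }

    public static void main(String[] args) {
        // 10 M rows of ~1 KB each on a 100 MB/s disk → about 100 s of copying
        System.out.println(estimateSeconds(10_000_000L, 1_000L, 100_000_000L));
    }
}
```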
Solution 1 – Native Online DDL (MySQL 5.6+)
MySQL can perform online DDL with ALGORITHM=INPLACE, LOCK=NONE. Example:
ALTER TABLE user ADD COLUMN age INT, ALGORITHM=INPLACE, LOCK=NONE;

Pros: low complexity, works well for tables < 100 M rows. Cons: some operations (such as adding a full‑text index) may still acquire a table lock, the rebuild requires roughly double the disk space for the temporary copy, and it can be affected by replication lag on replicas.
Solution 2 – Maintenance (downtime) method
When a maintenance window is acceptable, stop the service, dump the table, run the ALTER, and reload the data. Suitable for tables < 100 GB where the downtime (hours) can be tolerated and a full rollback plan exists.
Solution 3 – pt‑online‑schema‑change (PT‑OSC)
PT‑OSC creates a shadow table, copies data in chunks, and uses triggers on the original table to keep the shadow in sync. Install and run:
sudo yum install percona-toolkit
pt-online-schema-change \
  --alter "ADD COLUMN age INT" \
  D=test,t=user \
  --execute

Pros: the cut‑over lock lasts only a few milliseconds. Cons: the triggers add CPU load, the copy requires double the disk space, and PT‑OSC cannot be used on tables with foreign keys or existing triggers.
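The chunked-copy-plus-triggers idea can be sketched in miniature. This is an in-memory simulation, not Percona code; the class, the maps standing in for the two tables, and the method names are all invented for illustration:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of PT-OSC's approach: copy the table in primary-key-ordered chunks
// while a stand-in for its triggers mirrors concurrent writes into the shadow.
public class PtOscSketch {
    final TreeMap<Long, String> source = new TreeMap<>(); // original table
    final TreeMap<Long, String> shadow = new TreeMap<>(); // shadow table

    // Stand-in for the INSERT/UPDATE trigger PT-OSC installs on the source table.
    void writeRow(long id, String row) {
        source.put(id, row);
        shadow.put(id, row); // the trigger keeps the shadow table in sync
    }

    // Copy pre-existing rows chunk by chunk, bounded by primary key (--chunk-size).
    void copyChunks(int chunkSize) {
        long lastId = Long.MIN_VALUE;
        while (true) {
            int copied = 0;
            long newLast = lastId;
            for (Map.Entry<Long, String> e : source.tailMap(lastId, false).entrySet()) {
                shadow.put(e.getKey(), e.getValue()); // INSERT IGNORE-style copy
                newLast = e.getKey();
                if (++copied == chunkSize) break;
            }
            if (copied == 0) break; // no rows left beyond lastId
            lastId = newLast;
        }
    }
}
```

Bounding each chunk by primary key is what keeps individual copy statements short, so the original table is never locked for long stretches.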
Solution 4 – Logical migration with dual‑write
This approach is common for financial‑grade core tables where zero data loss is required.
Step 1 – Create a new table
CREATE TABLE user_new (
  id BIGINT PRIMARY KEY,
  name VARCHAR(50),
  age INT DEFAULT 0,
  KEY idx_name (name)
) ENGINE=InnoDB;

Step 2 – Dual‑write in application (Java example)
public void addUser(User user) {
    userOldDAO.insert(user);               // write path 1: old table
    userNewDAO.insert(convertToNew(user)); // write path 2: new table
}

private UserNew convertToNew(User old) {
    UserNew u = new UserNew();
    u.setId(old.getId());
    u.setName(old.getName());
    u.setAge(getAgeFromCache(old.getId())); // backfill value for the new column
    return u;
}

Step 3 – Batch data migration
MySQL only allows WHILE loops inside stored programs, so the batch copy runs as a stored procedure:

DELIMITER //
CREATE PROCEDURE migrate_user_batches()
BEGIN
  DECLARE start_id BIGINT DEFAULT 0;
  WHILE EXISTS (SELECT 1 FROM user WHERE id > start_id) DO
    INSERT INTO user_new (id, name, age)
    SELECT id, name, COALESCE(age_cache, 0)
    FROM user
    WHERE id > start_id
    ORDER BY id
    LIMIT 10000;
    SET start_id = (SELECT MAX(id) FROM user_new);
    COMMIT;
    DO SLEEP(0.1); -- throttle each batch to protect replication
  END WHILE;
END //
DELIMITER ;

Step 4 – Atomic cut‑over
RENAME TABLE user TO user_old, user_new TO user;
DROP TABLE user_old;

Pros: strong consistency, no lock time. Cons: high development effort; requires code changes and careful testing.
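The application-side batch copy and the pre-cut-over check can be sketched together. This is an in-memory simulation with maps standing in for the two tables; the class and method names are invented, and the real check would compare row counts and checksums via SQL:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the migration: keyset-paginated batch copy from the old table to
// the new one, plus a consistency check to run before the RENAME.
public class DualWriteMigration {
    static void batchCopy(TreeMap<Long, String> oldTable,
                          TreeMap<Long, String> newTable, int batchSize) {
        long lastId = Long.MIN_VALUE;
        while (true) {
            int copied = 0;
            for (Map.Entry<Long, String> e : oldTable.tailMap(lastId, false).entrySet()) {
                // Rows already written by the dual-write path win over the backfill.
                newTable.putIfAbsent(e.getKey(), e.getValue());
                lastId = e.getKey();
                if (++copied == batchSize) break;
            }
            if (copied == 0) break; // no rows left beyond lastId
        }
    }

    // Cheap pre-cut-over check: every old row must appear identically in the new table.
    static boolean consistent(Map<Long, String> oldTable, Map<Long, String> newTable) {
        return oldTable.equals(newTable);
    }
}
```

Running the check before RENAME TABLE is what makes the cut-over safe: if it fails, the dual-write keeps both tables live and the backfill can simply be rerun.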
Solution 5 – gh‑ost (GitHub Online Schema Transmogrifier)
gh‑ost reads the binary log asynchronously, builds a shadow table, and applies incremental changes without using triggers. It is designed for TB‑scale tables and high‑concurrency environments.
gh-ost \
  --alter "ADD COLUMN age INT NOT NULL DEFAULT 0 COMMENT 'User age'" \
  --host=master_ip --port=3306 --user=gh_user --password=xxx \
  --database=test --table=user \
  --chunk-size=2000 \
  --max-load=Threads_running=80 \
  --critical-load=Threads_running=200 \
  --cut-over-lock-timeout-seconds=5 \
  --allow-on-master \
  --execute

Monitoring example: echo status | nc -U /tmp/gh-ost.sock

Key parameters: --max-lag-millis pauses the copy when replication lag exceeds the threshold; --postpone-cut-over-flag-file postpones the cut‑over while the flag file exists, so the final switch can be triggered manually by removing the file.
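The load-throttling behaviour behind --max-load and --critical-load can be sketched as a simple decision function. This is illustrative logic, not gh-ost source code; only the threshold semantics (pause versus abort) come from the flags above:

```java
// Sketch of gh-ost-style load throttling: pause chunk copying when the server
// is busy (--max-load) and abort the migration when it is critically loaded
// (--critical-load). Threads_running is sampled from the server's status.
public class LoadThrottle {
    enum Action { COPY, PAUSE, ABORT }

    static Action decide(int threadsRunning, int maxLoad, int criticalLoad) {
        if (threadsRunning >= criticalLoad) return Action.ABORT; // bail out entirely
        if (threadsRunning >= maxLoad) return Action.PAUSE;      // throttle the copy
        return Action.COPY;                                      // keep copying chunks
    }
}
```

Because the row copy is resumable, pausing under load costs nothing but time, which is why generous --max-load margins are safe defaults.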
Cut‑over steps (milliseconds lock):
Lock source table briefly.
Rename tables atomically: RENAME TABLE source TO _source_del, _source_gho TO source.
Drop the old table ( _source_del).
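The atomic swap in the rename step can be sketched with a map standing in for the server's table catalog. The class and the Table placeholder type are invented; the point is that both renames happen under one short lock, so no reader ever sees a moment where the source table is missing:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the cut-over's atomic two-way rename. The Map stands in for the
// server's table catalog; synchronized plays the role of the brief table lock.
public class CutOverSketch {
    record Table(String description) {}

    final Map<String, Table> catalog = new HashMap<>();

    // Equivalent of: RENAME TABLE source TO _source_del, _source_gho TO source;
    synchronized void cutOver(String source, String ghost, String oldName) {
        Table original = catalog.remove(source);
        catalog.put(oldName, original);              // source      -> _source_del
        catalog.put(source, catalog.remove(ghost));  // _source_gho -> source
    }
}
```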
Solution 6 – Partition sliding‑window
For time‑partitioned log tables, note that a MySQL column is table‑wide (ADD COLUMN applies to every partition), but the expensive rebuild work can be confined to recent data: add the column, then reorganize only the newest partition. Example:
ALTER TABLE logs ADD COLUMN log_level VARCHAR(10) DEFAULT 'INFO';
ALTER TABLE logs REORGANIZE PARTITION p202302 INTO (
PARTITION p202302 VALUES LESS THAN (TO_DAYS('2023-03-01')),
PARTITION p202303 VALUES LESS THAN (TO_DAYS('2023-04-01'))
);

This limits the DDL impact to the most recent partition, keeping older partitions untouched.
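Sliding-window maintenance like this is usually scripted. A minimal sketch that generates the next REORGANIZE statement from a month (table and partition naming follow the example above; this builds SQL text only and assumes monthly TO_DAYS range partitions):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch of sliding-window partition maintenance: build the REORGANIZE
// statement that splits the latest monthly partition into the current
// month plus the next one.
public class PartitionWindow {
    static String reorganizeSql(LocalDate month) {
        DateTimeFormatter p = DateTimeFormatter.ofPattern("yyyyMM");
        LocalDate next = month.plusMonths(1).withDayOfMonth(1);
        return "ALTER TABLE logs REORGANIZE PARTITION p" + month.format(p) + " INTO (\n"
             + "  PARTITION p" + month.format(p) + " VALUES LESS THAN (TO_DAYS('"
             + next + "')),\n"
             + "  PARTITION p" + next.format(p) + " VALUES LESS THAN (TO_DAYS('"
             + next.plusMonths(1) + "'))\n"
             + ");";
    }
}
```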
Practical comparison
Native Online DDL : seconds‑to‑minutes lock, medium business impact, strong consistency, best for < 100 M rows, low complexity.
Maintenance (downtime) : hours lock, high impact, strong consistency, suitable when downtime is allowed and size < 100 GB, medium complexity.
PT‑OSC : millisecond cut‑over, medium impact (trigger overhead), synchronous trigger‑based consistency, works only on tables without foreign keys or existing triggers, medium complexity.
Logical migration + dual‑write : no lock, low impact (code change), strong consistency, required for > 1 B rows in financial core tables, high complexity.
gh‑ost : millisecond cut‑over, low impact (no triggers), eventual consistency, ideal for TB‑scale high‑concurrency tables, medium‑high complexity.
Partition sliding‑window : only new partition affected, low impact, partition‑level consistency, best for time‑partitioned log tables, medium complexity.
Key operational guidelines
Always take a full backup (mysqldump + binlog) before any schema change.
Ensure the table has a primary key; chunk‑based copy tools rely on it, and without one the copy degenerates into full table scans.
Reserve at least 1.5 × the current table size on disk for temporary copies.
Monitor replication lag (e.g., SHOW SLAVE STATUS) and keep Seconds_Behind_Master < 10 seconds.
Avoid adding NOT NULL columns without a default; make the new column nullable or give it a default value to prevent a full‑table rewrite.
Prefer ENUM over long VARCHAR for low‑cardinality fields.
For ultra‑large tables (hundreds of billions of rows) consider sharding instead of direct DDL.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.