
How to Scale a 90 M‑Row Message Table to 6 B Rows Without Downtime

This article details a comprehensive sharding strategy for a high‑traffic message table, explaining why early partitioning is essential, how to design language‑based vertical shards combined with hash‑based horizontal shards, calculate table counts and virtual nodes, and execute a zero‑downtime migration using dual‑writes and gradual traffic rollout.


Background and Motivation

The original message table grew to 90 million rows, causing query latency to jump from ~50 ms to ~300 ms. Projections showed it would reach 6 billion rows within six months, which would exceed the performance limits of a single table and risk a system crash. Because the business requires zero downtime and no data loss, the team decided to partition the data proactively.

Partitioning Strategy

The core business attribute is the language field (Traditional Chinese, English, Japanese), which appears in 80 % of queries alongside userId and dialogId. The chosen two‑level sharding scheme is:

Vertical shard by language – each language has its own cluster (e.g., message_zh_hant, message_en, message_ja).

Horizontal hash shard within each language – the hash of userId_dialogId determines the physical table, keeping all messages of a conversation together.

Table Count Calculation

Target volume: 6 billion rows. Desired per‑table limit: 450 million rows (leaving ~50 million headroom). Language distribution: Chinese 60 %, English 30 %, Japanese 10 %.

Chinese: 3.6 billion rows → 8 tables (≈450 M each).

English: 1.8 billion rows → 4 tables.

Japanese: 600 million rows → 2 tables (expandable to 4).

Each physical table is assigned 100 virtual nodes to smooth hash distribution and limit data skew.
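
As a sanity check, these counts fall out of a ceiling division of each language's projected rows by the per‑table limit. A minimal sketch in Go (the helper name tablesNeeded is illustrative, not part of the routing layer):

func tablesNeeded(totalRows, perTableLimit int64) int64 {
    // Ceiling division: a partially filled table still counts.
    return (totalRows + perTableLimit - 1) / perTableLimit
}

// tablesNeeded(3_600_000_000, 450_000_000) == 8 // Chinese
// tablesNeeded(1_800_000_000, 450_000_000) == 4 // English
// tablesNeeded(600_000_000, 450_000_000)   == 2 // Japanese (expandable to 4)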

Virtual‑Node Design

Virtual nodes reduce skew (from 15 % to <3 % at 90 M rows, staying <5 % at 6 B rows) and add only microsecond‑level routing overhead. Adding new tables later requires only inserting their virtual nodes into the hash ring.

import (
    "fmt"
    "hash/crc32"
)

// Message carries the fields the router needs.
type Message struct {
    UserId, DialogId, Language string
}

// calcVirtualNodeHash places virtual node virtualIdx of a physical
// table onto the hash ring.
func calcVirtualNodeHash(tableName string, virtualIdx int) uint32 {
    key := fmt.Sprintf("%s_%d", tableName, virtualIdx)
    return crc32.ChecksumIEEE([]byte(key))
}

func routeMessage(msg Message) string {
    // 1. Find the language cluster (vertical shard)
    cluster := getClusterByLanguage(msg.Language)
    // 2. Hash userId_dialogId so a whole conversation stays on one table
    msgHash := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s_%s", msg.UserId, msg.DialogId)))
    // 3. Locate the nearest virtual node clockwise on the hash ring
    virtualNode := findNearestVirtualNode(cluster, msgHash)
    // 4. Map the virtual node back to its physical table
    return getPhysicalTable(cluster, virtualNode)
}
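
The helpers getClusterByLanguage, findNearestVirtualNode, and getPhysicalTable are referenced but not defined above. As a minimal sketch of the ring lookup, assuming each cluster exposes its virtual nodes as a slice sorted by hash (the VirtualNode type, its field names, and the ring‑as‑slice layout are assumptions):

import "sort"

// VirtualNode is one position on the ring; Table names the physical
// table the node maps back to.
type VirtualNode struct {
    Hash  uint32
    Table string
}

// findNearestVirtualNode returns the first node clockwise from msgHash;
// past the last node, the ring wraps back to the first.
func findNearestVirtualNode(ring []VirtualNode, msgHash uint32) VirtualNode {
    i := sort.Search(len(ring), func(i int) bool {
        return ring[i].Hash >= msgHash
    })
    if i == len(ring) {
        i = 0 // wrap around the ring
    }
    return ring[i]
}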

Zero‑Downtime Migration Process

Phase 1 – Migrate Existing 90 M Rows

Dual‑write initialization (1 day) – a middleware writes new messages to both the old single table and the new sharded tables.
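
A minimal sketch of that middleware, reusing the Message type from the routing code above (the Writer interface and the log‑and‑backfill failure policy are assumptions):

import (
    "context"
    "log"
)

// Writer abstracts the old single-table store and the new sharded store.
type Writer interface {
    Insert(ctx context.Context, msg Message) error
}

// DualWriter keeps the old table authoritative while mirroring every
// write to the shards.
type DualWriter struct {
    Old, Sharded Writer
}

func (d DualWriter) Insert(ctx context.Context, msg Message) error {
    if err := d.Old.Insert(ctx, msg); err != nil {
        return err // old table is still the source of truth
    }
    // A failed shard write is logged and left for the binlog backfill
    // to repair, so user requests don't fail during migration.
    if err := d.Sharded.Insert(ctx, msg); err != nil {
        log.Printf("shard write failed, backfill will repair: %v", err)
    }
    return nil
}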

Historical data backfill – use Canal to read the MySQL binlog, split rows by language, and insert them into the appropriate shard.

Data validation (½ day) – verify row counts per language, random content checks, and compare performance of top‑10 queries.
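
A minimal sketch of the row‑count comparison using database/sql, assuming the old table is named message and has a language column (names are illustrative):

import (
    "context"
    "database/sql"
)

// countRows returns SELECT COUNT(*) for one table. Summing it over a
// language's shard tables should match the old table filtered by language.
func countRows(ctx context.Context, db *sql.DB, table string) (int64, error) {
    var n int64
    err := db.QueryRowContext(ctx, "SELECT COUNT(*) FROM "+table).Scan(&n)
    return n, err
}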

Gray‑release read routing (≈1 hour) – gradually shift read traffic: 5 % → 30 % → 100 % to the sharded tables while monitoring for errors.
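
One common implementation of the gradual shift hashes the userId against a rollout percentage; the per‑user hashing rule here is an assumption, only the 5/30/100 steps come from the plan above:

import "hash/crc32"

// readFromShards decides per user whether reads hit the sharded tables.
// Hashing the userId keeps each user on a consistent source while the
// rollout knob moves 5 → 30 → 100.
func readFromShards(userId string, rolloutPercent uint32) bool {
    return crc32.ChecksumIEEE([]byte(userId))%100 < rolloutPercent
}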

Stop dual‑write and clean up (≈1 week) – after three days of stable reads, disable dual‑write, set the old table to read‑only for 7 days, then delete rows in 10 M‑row batches.
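
A minimal sketch of the batched cleanup, assuming the old table is named message and relying on MySQL's DELETE ... LIMIT (the pause between rounds is illustrative):

import (
    "context"
    "database/sql"
    "time"
)

// cleanupOldTable drains the old table one 10 M-row batch at a time so
// no single statement holds locks for the whole table.
func cleanupOldTable(ctx context.Context, db *sql.DB) error {
    for {
        res, err := db.ExecContext(ctx, "DELETE FROM message LIMIT 10000000")
        if err != nil {
            return err
        }
        n, err := res.RowsAffected()
        if err != nil {
            return err
        }
        if n == 0 {
            return nil // old table fully drained
        }
        time.Sleep(5 * time.Second) // let replicas catch up between rounds
    }
}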

Phase 2 – Future Expansion

Add new tables and update the hash ring – e.g., increase Chinese tables from 8 to 16 without restarting services.
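
Concretely, registering a new physical table means inserting its 100 virtual nodes and re‑sorting the ring; a sketch reusing the VirtualNode type and calcVirtualNodeHash from the routing code above:

import "sort"

// addTable inserts a new table's virtual nodes and restores ring order.
// Only keys that now land on the new nodes migrate; everything else stays put.
func addTable(ring []VirtualNode, table string) []VirtualNode {
    for i := 0; i < 100; i++ {
        ring = append(ring, VirtualNode{Hash: calcVirtualNodeHash(table, i), Table: table})
    }
    sort.Slice(ring, func(a, b int) bool { return ring[a].Hash < ring[b].Hash })
    return ring
}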

Dual‑write with incremental migration (≈6 hours) – new writes go to both old and new shards; only the data range that maps to the new virtual nodes (≈225 M rows, 6.25 % of the Chinese data) is migrated.

Gray‑release read routing and cleanup – same gradual traffic shift, then stop dual‑write and delete migrated rows from the old tables.

Operational Recommendations for Backend Engineers

Minimal necessary indexes – keep only the primary key (Snowflake ID) and the composite index idx_user_dialog_create (userId, dialogId, createTime), which covers >90 % of queries.

Add distributed locks up front – use Redis SETNX on the messageId before writing to a shard to avoid duplicate inserts during dual‑write, as sketched below.
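
A minimal sketch of that guard, assuming the go‑redis v9 client (the key prefix and 30 s TTL are illustrative):

import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

// tryClaimMessage returns true only for the first writer of a messageId,
// so the dual-write path cannot insert the same message twice.
func tryClaimMessage(ctx context.Context, rdb *redis.Client, messageId string) (bool, error) {
    return rdb.SetNX(ctx, "msg:lock:"+messageId, 1, 30*time.Second).Result()
}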

Three‑level monitoring – monitor at cluster, instance, and table granularity; set growth‑trend alerts to warn before reaching capacity limits.

Result

By combining language‑aware vertical sharding with hash‑based horizontal sharding, sizing tables for the projected 6 billion rows, and executing a disciplined dual‑write + gray‑release migration, the system maintained sub‑30 ms query latency after the table grew to 150 million rows. The architecture is prepared to absorb the full 6 billion‑row target without downtime.

Performance optimization · sharding · MySQL · horizontal partitioning · virtual nodes · zero‑downtime migration
Written by Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
