How Vivo Scaled to Billions of Records: Sharding and InnoDB Compression Strategies

This article details how Vivo's cloud service tackled explosive data growth by applying horizontal and vertical sharding, routing‑table based dynamic expansion, and MySQL InnoDB compression, providing step‑by‑step guidance, performance results, and practical recommendations for large‑scale database deployments.

ITPUB

Background

Vivo's cloud service backs up contacts, text messages, notes, and bookmarks in MySQL. Rapid user growth pushed the data volume from hundreds of millions of records to billions, creating severe storage challenges.

Challenges

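Between 2017 and 2018 the service enabled data sync by default. The user base surged from millions to tens of millions, and stored data grew from the hundred-billion to the trillion-level scale. To handle this volume, the team applied four sharding techniques: horizontal partitioning (splitting one table into many by rows), vertical partitioning (splitting one table by columns), horizontal sharding (spreading the same tables across several databases), and vertical sharding (giving each business module its own database).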

1. Horizontal Partitioning

Browser bookmarks and notes tables exceeded 100 million rows. The team split each single table into 100 tables, each holding about 10 million rows.
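The split above needs a rule that maps each row to one of the 100 tables. The source does not say which rule was used, so the following is only a minimal sketch: it assumes a numeric user ID, a modulo rule, and a hypothetical `bookmark_NN` naming pattern.

```python
TABLE_COUNT = 100  # from the text: one table split into 100


def table_for_user(user_id: int, base_name: str = "bookmark") -> str:
    """Return the physical table name for a user.

    Assumption: rows are distributed by user_id modulo the table count.
    The base name and the two-digit suffix are hypothetical.
    """
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return f"{base_name}_{user_id % TABLE_COUNT:02d}"


print(table_for_user(12345))  # bookmark_45
```

Keying on the user ID keeps all of one user's rows in the same table, so a backup or restore for that user touches a single table.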

2. Horizontal Sharding

Contacts and SMS data were initially stored in 50 tables inside a single database. After growth, that single database held tens of billions of contact records, and individual tables reached 50 million rows each. The team split the database into 10 databases and expanded the table count to 100, migrating billions of rows in the process.
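With both databases and tables in play, a lookup has to yield two coordinates. The sketch below is an illustration under stated assumptions, not Vivo's actual rule: it assumes 100 tables in each of the 10 databases (the text does not say whether 100 is per database or in total), a numeric user ID, and hypothetical `contacts_db_N` / `contacts_NN` names.

```python
DB_COUNT = 10        # from the text: one database split into 10
TABLES_PER_DB = 100  # assumption: 100 tables inside each database


def locate(user_id: int) -> tuple[str, str]:
    """Return (database, table) for a user under a hypothetical modulo rule."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    db_index = user_id % DB_COUNT
    # Divide by DB_COUNT first so the table index does not depend on the
    # database index; otherwise each database would only fill a few tables.
    table_index = (user_id // DB_COUNT) % TABLES_PER_DB
    return f"contacts_db_{db_index}", f"contacts_{table_index:02d}"


print(locate(12345))  # ('contacts_db_5', 'contacts_34')
```

A fixed rule like this is cheap to evaluate, but changing `DB_COUNT` later moves most users to a different database, which is why every expansion under such a scheme implies a large migration.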

3. Vertical Sharding & Partitioning

Analysis showed a single 5 TB database in which contacts occupied 2.75 TB (55 %), SMS 1 TB (20 %), and all other modules together the remaining 1.25 TB (25 %). Contacts and SMS alone consumed 75 % of the space, leaving insufficient room for growth. The solution was to move each module into its own database.
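The space breakdown can be checked with a few lines of arithmetic. This sketch takes the 5 TB total and the contacts and SMS figures from the text and derives the remainder and the percentage shares; the dictionary keys are just labels chosen for the example.

```python
TOTAL_TB = 5.0

# Figures stated in the text; "other" is derived as the remainder.
usage_tb = {"contacts": 2.75, "sms": 1.0}
usage_tb["other"] = TOTAL_TB - sum(usage_tb.values())

shares = {name: round(100 * tb / TOTAL_TB, 1) for name, tb in usage_tb.items()}

for name, pct in shares.items():
    print(f"{name}: {usage_tb[name]} TB ({pct} %)")

print("contacts + sms:", shares["contacts"] + shares["sms"], "%")
```

Two modules holding three quarters of a shared instance is the signal that per-module databases give each module room to grow independently.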

4. Dynamic Expansion via Routing Table

After horizontal sharding, a single database reached 65 % capacity in nine months. To avoid costly full‑data migrations, a routing table was introduced to map each user’s data to a specific database/table. New users are routed to newly added databases, while old users stay in the original database.

Add a user routing table recording the target database and table.

New users’ data go to newly expanded databases, not stressing old ones.

Existing users’ data remain unchanged.

The old databases only need to absorb growth from existing users.
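The routing behaviour described in these points can be sketched as follows. This is an in-memory illustration only: the dictionary stands in for the routing table, and the class name, method names, database names, and the modulo choice among new databases are all assumptions rather than details from the source.

```python
class UserRouter:
    """Toy stand-in for a user routing table: user_id -> (database, table)."""

    def __init__(self, new_dbs, tables_per_db=100):
        self.routes = {}              # would be a real table in production
        self.new_dbs = list(new_dbs)  # databases added in the latest expansion
        self.tables_per_db = tables_per_db

    def register_existing(self, user_id, db, table):
        """Record where an existing user's data already lives."""
        self.routes[user_id] = (db, table)

    def route(self, user_id):
        """Return the user's location, assigning new users to a new database."""
        if user_id not in self.routes:
            db = self.new_dbs[user_id % len(self.new_dbs)]
            table = f"contacts_{user_id % self.tables_per_db:02d}"
            self.routes[user_id] = (db, table)
        return self.routes[user_id]


router = UserRouter(new_dbs=["db_10", "db_11"])
router.register_existing(1, "db_0", "contacts_01")
print(router.route(1))  # ('db_0', 'contacts_01')  existing user stays put
print(router.route(3))  # ('db_11', 'contacts_03') new user goes to a new database
```

Because the mapping is stored rather than computed, adding databases later only changes where future users land; nothing already recorded has to move.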

Compression Scheme Research

Three options were evaluated:

Application‑level compression before storing – flexible, but it requires heavy batch jobs to convert existing data and leaves the stored values unreadable to direct SQL queries.

MySQL InnoDB built‑in compression – a simple DBA‑side change with controllable speed, well suited to read‑heavy workloads.

TokuDB engine compression – powerful, but it requires installing a plugin and the team had no experience with it.

The team chose InnoDB compression for its simplicity and low migration cost.

Operation: DBA changes table file format to Barracuda.

Compression speed: a table of 20 million rows is compressed in 1–2 days.

Low refactor cost: only SQL changes, no application code changes.

Fits cloud service scenario: large string fields, backup/restore workloads.

Compression Implementation

Configuration changes:

SET GLOBAL innodb_file_format=Barracuda;
SET GLOBAL innodb_file_format_max=Barracuda;
SET GLOBAL innodb_file_per_table=1;

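These settings select the Barracuda file format and one tablespace file per table, both of which InnoDB requires before a table can use compression. SET GLOBAL does not survive a server restart, so for production the same values must also be written to the MySQL configuration file.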
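The global settings are only prerequisites; each table still has to be rebuilt in compressed row format. The sketch below generates one `ALTER TABLE ... ROW_FORMAT=COMPRESSED` statement per sharded table so they can be reviewed and scheduled. The table-name pattern is hypothetical, and `KEY_BLOCK_SIZE=8` is an assumed value, since the source does not state which block size was used.

```python
VALID_KEY_BLOCK_SIZES = (1, 2, 4, 8, 16)  # KB values accepted by InnoDB


def compression_statements(base_name: str, table_count: int,
                           key_block_size: int = 8) -> list[str]:
    """Build one ALTER TABLE statement per sharded table.

    base_name and the two-digit suffix are hypothetical naming choices;
    key_block_size=8 is an assumption, not a value from the source.
    """
    if key_block_size not in VALID_KEY_BLOCK_SIZES:
        raise ValueError(f"key_block_size must be one of {VALID_KEY_BLOCK_SIZES}")
    return [
        f"ALTER TABLE `{base_name}_{i:02d}` "
        f"ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE={key_block_size};"
        for i in range(table_count)
    ]


for stmt in compression_statements("contacts", 3):
    print(stmt)
```

Note that `innodb_file_format` and `innodb_file_format_max` exist only on older servers; both variables were removed in MySQL 8.0, where Barracuda is the only file format.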

Tests with a compressed and an uncompressed table (100 k rows each) showed a 50 % space reduction (10 MB vs 20 MB). String columns compress well; binary data does not.

Online Practice

Compression reduced the contacts database's disk usage from 65 % to 33 %, leaving 67 % free and meeting the 60 % free‑space target. Performance testing showed CPU usage rising from 33 % to 43 % during heavy inserts, while TPS remained stable. The DBA verified that read and write latency were unaffected in both offline and online tests.
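As a rough consistency check, the 50 % saving measured in the small test can be applied to the 65 % production usage figure. This is back-of-the-envelope arithmetic, and it assumes production data compresses about as well as the test table did; the function name is chosen for this example.

```python
def usage_after_compression(used_pct: float, space_saving: float) -> float:
    """Project disk usage (percent) after compression.

    space_saving is the fraction of space removed, e.g. 0.5 for 50 %.
    """
    if not 0.0 <= space_saving < 1.0:
        raise ValueError("space_saving must be in [0, 1)")
    return used_pct * (1.0 - space_saving)


after = usage_after_compression(65.0, 0.5)
free = 100.0 - after
print(after, free, free >= 60.0)  # 32.5 67.5 True
```

The projected 32.5 % is close to the 33 % reported from production, which suggests the small test was representative of the contacts data.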

Online rollout steps:

Compress a single table offline to estimate time.

Scale concurrent compression per database, keeping CPU below 55 %.

Calculate total time for all tables and schedule accordingly.
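The planning in these steps can be sketched as a small calculation. The model is deliberately simple and its inputs are assumptions: it treats each concurrent compression job as adding a fixed amount of CPU, and the baseline, per-job cost, and per-table duration in the example are hypothetical numbers to be replaced with values measured offline.

```python
import math


def max_concurrency(baseline_cpu: float, cpu_per_job: float,
                    cpu_cap: float = 55.0) -> int:
    """Largest job count that keeps CPU strictly below cpu_cap (percent).

    Assumes each job adds cpu_per_job percent, which is a simplification.
    """
    if cpu_per_job <= 0:
        raise ValueError("cpu_per_job must be positive")
    jobs = 0
    while baseline_cpu + (jobs + 1) * cpu_per_job < cpu_cap:
        jobs += 1
    return jobs


def total_hours(table_count: int, hours_per_table: float,
                concurrency: int) -> float:
    """Wall-clock estimate when tables are compressed in equal-sized batches."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return math.ceil(table_count / concurrency) * hours_per_table


jobs = max_concurrency(baseline_cpu=33.0, cpu_per_job=10.0)  # hypothetical inputs
print(jobs)                         # 2
print(total_hours(100, 1.5, jobs))  # 75.0
```

Measuring the single-table time first matters because it is the multiplier for everything else: a small error there is amplified across every table and every database.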

Conclusion

The article outlines how Vivo’s cloud service tackled massive data growth through sharding, vertical separation, routing‑table‑based expansion, and InnoDB compression, providing practical guidance for large‑scale MySQL deployments.
