How Vivo Built a Scalable Comment Platform with MongoDB Sharding
This article explains how Vivo designed a company‑wide comment middle‑platform, chose MySQL and MongoDB, deep‑dived into MongoDB cluster architecture, shard key strategies, and practical solutions for scaling, migration, and high availability in a high‑traffic environment.
Business Background
As Vivo’s user base grew, many projects independently built comment features, leading to duplicated effort and data silos. To provide a unified, quickly integrable comment service, a company‑level comment middle‑platform was created, supporting core functions such as comments, replies, nested replies, and likes.
Database Storage Choice
The team evaluated several mainstream databases and settled on a combination of MySQL and MongoDB. The comment domain required dynamic field extension, massive data growth, and high availability, while transactional guarantees were not critical. Consequently, a MongoDB cluster was selected as the primary storage engine.
Deep Dive into MongoDB
1. Cluster Architecture
MongoDB’s cluster consists of three components:
mongos : routing servers that forward client read/write requests to the appropriate shard.
config : configuration servers storing metadata for sharded collections; they must run as a replica set.
shard : replica‑set mongod instances that hold the actual data chunks.
2. Shard Key
Data in a sharded collection is divided into chunks based on a shard key. Two main shard‑key types are supported:
Hash sharding : distributes data evenly using a hash algorithm; works for single‑ or multi‑field keys.
Range sharding : groups similar key values together, ideal for range queries.
Practical Implementation in the Comment Platform
1) Cluster Expansion
Each business client gets its own isolated comment table (e.g., comment_clientA, comment_clientB). This approach initially caused three problems:
Physical data isolation across a single cluster was impossible.
Cluster tuning (e.g., split‑migration timing) could not be customized per business.
Horizontal scaling dispersed a single client’s data across many shards.
To address these, the MongoDB architecture was extended with logical and physical clusters: a logical cluster per client, multiple logical clusters within a physical cluster, and a routing layer built on Spring’s MongoTemplate and connection‑pool management.
2) Shard‑Key Selection
To keep comment list queries confined to a single shard, range sharding on topicId was chosen. However, performance tests revealed two critical issues:
Jumbo chunk : when a chunk exceeds the configured size (1 MiB–1024 MiB), it cannot be split, leading to uneven data distribution.
Unique‑key limitation : a sharded collection’s unique index must include the shard key; otherwise _id uniqueness is only guaranteed per shard.
Example of the jumbo‑chunk problem: with a 1024 MiB chunk size and 5 KB documents, a single chunk holds ~210 k documents; hot topics can exceed 400 k comments, causing the chunk to grow beyond the limit and remain unsplit, which eventually unbalances the cluster.
You cannot specify a unique constraint on a hashed index
For a to-be-sharded collection, you cannot shard the collection if the collection has other unique indexes
For an already-sharded collection, you cannot create unique indexes on other fieldsSolution: delete the original collection, create a compound shard key of {topicId, _id}, and rebuild the collection, thereby breaking the size limit and satisfying uniqueness.
3) Migration and Scaling
When a chunk exceeds its size threshold, MongoDB automatically splits it. The balancer process then migrates chunks to maintain even distribution. Migration impacts cluster load, so it is recommended to run during low‑traffic periods.
To add capacity, provision a new shard replica set and execute the following on a mongos node (command omitted for brevity). During scaling, chunk migration may temporarily reduce availability, so schedule during off‑peak hours.
Conclusion
The MongoDB‑based comment platform has been in production for over a year, supporting more than ten business clients and storing over 100 million comment and reply records with stable performance. While MongoDB clusters provide horizontal scalability, they also introduce constraints such as index and sharding rules; for many workloads, a replica‑set deployment may suffice for terabyte‑scale storage.
All observations are based on MongoDB 4.0.9; newer versions may have slight differences.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
