How QQ Music Scaled Its Comment System for Celebrity Live Events
This article describes the architectural redesign of QQ Music's comment platform: migrating storage to MongoDB, introducing threaded comments, and adding caching and message-queue decoupling, so that the system can absorb massive read and write spikes during celebrity "airdrop" events while remaining highly available and fast.
Background
Since its launch, QQ Music has operated several versions of its comment service. In 2019 the system was completely rebuilt on a tlist store that displayed comments in chronological order. To improve the user experience, the product later shifted to a threaded ("cover-floor") comment model, in which replies stack under a parent comment, and this change prompted a migration of storage to MongoDB.
Challenges
Commenting is a critical social feature, especially during celebrity "airdrop" events that generate sudden traffic surges. The system must sustain high read pressure (comment lists, counts) and write pressure (posting, pinning artist comments) under these spikes.
Design Overview
Serving every read directly from MongoDB would mean sizing the storage tier for peak traffic, at prohibitive cost. Hot keys are therefore served from a conventional cache layer, protected by strict anti-penetration and rate-limiting controls so that a surge of misses cannot cascade into a cache avalanche. Writes are decoupled through a high-speed cache and a message queue: data is persisted to MongoDB asynchronously, and failed writes can be retried, which guarantees eventual consistency for core data.
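The anti-penetration idea can be illustrated with a small in-process sketch. It is not QQ Music's implementation: the class name ReadThroughCache, the loader callback, and the TTL values are all hypothetical. It shows two common techniques, caching "not found" results for a short time and letting only one caller per key reach the backing store while the others wait.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple

_MISSING = object()  # sentinel cached for keys the backing store does not have


class ReadThroughCache:
    """Hypothetical read-through cache with negative caching and per-key coalescing."""

    def __init__(self, loader: Callable[[str], Optional[str]],
                 ttl: float = 5.0, negative_ttl: float = 1.0) -> None:
        self._loader = loader              # stands in for a MongoDB read
        self._ttl = ttl                    # lifetime of a found value, in seconds
        self._negative_ttl = negative_ttl  # shorter lifetime for "not found"
        self._data: Dict[str, Tuple[object, float]] = {}
        self._key_locks: Dict[str, threading.Lock] = {}  # unbounded in this sketch
        self._guard = threading.Lock()

    def _fresh(self, key: str) -> Optional[Tuple[object, float]]:
        entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key: str) -> Optional[str]:
        entry = self._fresh(key)
        if entry is None:
            with self._guard:
                lock = self._key_locks.setdefault(key, threading.Lock())
            with lock:
                # Another caller may have filled the entry while we waited.
                entry = self._fresh(key)
                if entry is None:
                    value = self._loader(key)
                    stored = _MISSING if value is None else value
                    ttl = self._negative_ttl if value is None else self._ttl
                    entry = (stored, time.monotonic() + ttl)
                    self._data[key] = entry
        return None if entry[0] is _MISSING else entry[0]  # type: ignore[return-value]
```

With this shape, repeated requests for a comment list that does not exist hit the loader once per negative_ttl window rather than once per request, and concurrent misses on one hot key trigger a single load.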
Two consistency strategies are employed: strong consistency via transactions for critical paths (lower throughput) and eventual consistency for most write scenarios to maximize throughput.
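The eventually consistent write path can be sketched as follows. This is a minimal illustration under stated assumptions, not the production design: an in-memory queue.Queue stands in for the real message queue, the persist callback stands in for a MongoDB write, and the names submit, drain, and max_attempts are hypothetical. The persist step is assumed to be idempotent (keyed by comment ID) so that a retry cannot create a duplicate.

```python
import queue
from typing import Callable, List


def submit(q: "queue.Queue[dict]", comment: dict) -> None:
    """Accept a write quickly: enqueue it and return without touching storage."""
    q.put({"comment": comment, "attempts": 0})


def drain(q: "queue.Queue[dict]", persist: Callable[[dict], None],
          max_attempts: int = 3) -> List[dict]:
    """Persist queued comments, requeueing failures until max_attempts is reached.

    Returns the messages that exhausted their retries (a dead-letter list).
    """
    dead: List[dict] = []
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return dead
        try:
            persist(msg["comment"])  # assumed idempotent, e.g. an upsert by comment ID
        except Exception:
            msg["attempts"] += 1
            if msg["attempts"] < max_attempts:
                q.put(msg)           # retry later, behind other pending writes
            else:
                dead.append(msg)     # give up and surface for manual handling
```

The caller of submit sees low latency because storage is never on its path; consistency is reached once drain has successfully persisted every message.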
Read‑Side Optimizations
Increase MongoDB CPU cores and consumer concurrency.
Parallelize read services and shard cache keys (e.g., using uin%10) to prevent hot‑key concentration.
Separate read and write deployments to avoid interference.
Throttle non-critical requests to protect core paths.
Split global and hot‑comment message queues so that hot‑queue backlogs only affect a limited subset of traffic.
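The uin % 10 key sharding mentioned above can be sketched like this. The key format and the function names are hypothetical; only the modulo-by-user-ID idea comes from the text. The same hot value is stored under several cache keys, each reader is routed to one copy by user ID, and a writer refreshes every copy.

```python
from typing import List

SHARDS = 10  # matches the uin % 10 example; the real shard count is an assumption


def read_key(base_key: str, uin: int, shards: int = SHARDS) -> str:
    """Pick the cache copy a given user reads from."""
    return f"{base_key}#{uin % shards}"


def all_keys(base_key: str, shards: int = SHARDS) -> List[str]:
    """List every copy, for the writer that refreshes the hot value."""
    return [f"{base_key}#{i}" for i in range(shards)]
```

For example, read_key("comments:song:1001", 123456789) selects the copy with suffix 9, so traffic for one hot song is spread across ten keys instead of concentrating on one.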
Write‑Side Optimizations
Prioritize core artist‑experience writes with a priority queue.
Break down write logic to focus on fast MongoDB ingestion and reduce message backlog.
Build operational tools for real‑time monitoring and rapid response to business or operational requests.
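A priority queue for writes might look like the sketch below. It is an illustration only: the priority levels ARTIST and NORMAL, the WriteQueue class, and the in-memory heap are assumptions, and a production system would more likely use separate queues or topics in its message broker.

```python
import heapq
import itertools
from typing import Any, List, Tuple

ARTIST, NORMAL = 0, 1  # lower number is served first (hypothetical levels)


class WriteQueue:
    """In-memory priority queue: artist writes first, FIFO within a level."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, write: Any, priority: int = NORMAL) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), write))

    def pop(self) -> Any:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Even when a backlog of fan comments has built up, an artist's comment pushed with priority ARTIST is the next item a consumer pops.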
Performance Testing and Recent Issues
Regular read/write load tests identify bottlenecks, enabling rapid horizontal scaling when needed. During a major artist event, the system remained stable but encountered a consumption bottleneck and a storage‑cool‑down issue with a TSSD tier, which caused platform‑wide alerts despite emergency rate‑limiting.
Latest Optimizations
Read side:
Migrate comment and like counts from CKV to CKV+ without cooling to improve availability.
Add local caches with versioning for comment counts to sustain high throughput.
Implement front‑end safeguards that suppress error dialogs when comment data fails to load, reducing perceived failures.
Enhance front‑end page load speed to improve overall user experience.
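The text does not say how versions are used in the local comment-count cache. One plausible reading, sketched here as an assumption, is that every count carries a monotonically increasing version and the local cache rejects any update older than what it already holds, so a delayed refresh cannot overwrite a newer count. The class and method names are hypothetical.

```python
import threading
from typing import Dict, Optional, Tuple


class VersionedCountCache:
    """Process-local cache of comment counts guarded by a version number."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, int]] = {}  # key -> (version, count)
        self._lock = threading.Lock()

    def put(self, key: str, version: int, count: int) -> bool:
        """Store the count unless an equal or newer version is already cached."""
        with self._lock:
            current = self._entries.get(key)
            if current is not None and current[0] >= version:
                return False  # stale update, ignored
            self._entries[key] = (version, count)
            return True

    def get(self, key: str) -> Optional[int]:
        with self._lock:
            entry = self._entries.get(key)
            return None if entry is None else entry[1]
```

Reads are served from process memory, which is what lets this layer sustain high throughput; the version check keeps out-of-order refreshes from moving a count backwards.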
Write side:
Separate write logic to keep the critical path fast and reduce message accumulation.
Introduce a priority queue to guarantee uninterrupted artist‑centric interactions.
Develop tooling for quick data inspection and operational adjustments.
Conclusion
Through a series of architectural refinements—storage migration, cache sharding, message‑queue decoupling, and targeted read/write optimizations—the QQ Music comment system now handles large‑scale celebrity events with improved stability and user experience, though occasional bottlenecks still require ongoing monitoring and tuning.
