How to Build a Million‑User Real‑Time High‑Availability Comment System
This article explains how to design a highly available comment system that can handle millions of concurrent users by analyzing comment fundamentals, traffic patterns, storage choices, caching layers, sharding strategies, architectural evolution from monolith to distributed micro‑services, and fault‑tolerance mechanisms.
Author 牛哥 introduces the challenge of building a comment system that can sustain real‑time interaction for millions of users.
Understanding Comments
Three core questions guide the design: why users comment, what product problems comments solve, and which technical pressures the system must endure.
For users, comments are an emotional outlet and decision‑making tool; for the product they act as a content moat; for technology they become a traffic amplifier.
Traffic Breakdown
A typical read‑to‑write ratio is about 20:1, so read performance is the primary bottleneck.
Storage Tool Selection
A comparison of three storage schemes is shown in the table below.
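| Scheme | Best suited for | Advantages | Disadvantages |
| --- | --- | --- | --- |
| MySQL only (relational) | Transactions required; comment nesting depth ≤ 3 | ACID support and reliable data; mature indexing and familiar operations | Pagination slows once a single table exceeds ten million rows; database and table sharding logic is complex |
| MongoDB only (document) | Lots of unstructured content; deeply nested comments | Stores nested JSON natively with no table splitting; writes about 30% faster than MySQL | Deeply nested queries are slow; weak transaction support |
| MySQL + Redis (hybrid) | Most comment workloads | MySQL stores the full data set for reliability; Redis stores hot data to absorb read pressure | Dual writes easily become inconsistent; higher system complexity |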
Why MySQL+Redis Hybrid
The hybrid approach combines MySQL’s reliability with Redis’s high‑read performance.
MySQL for full data – provides ACID guarantees and stable storage.
Redis for hot data – ZSet stores comment lists, Hash stores comment details, String stores like counts.
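The mapping of comment data onto the three Redis structures can be sketched as follows. This is a minimal illustration, not the author's implementation: the key names (`comment:list:…`, `comment:detail:…`, `comment:likes:…`), the comment fields, and the choice of the creation timestamp as the ZSet score are all assumptions. The function only builds the command tuples (`ZADD`, `HSET`, `SET` are real Redis commands) so it runs without a Redis server.

```python
from typing import Any


def comment_cache_commands(content_id: int, comment: dict[str, Any]) -> list[tuple]:
    """Build the Redis commands that would cache one comment.

    Key names and comment fields are hypothetical, chosen for illustration.
    """
    cid = comment["id"]
    list_key = f"comment:list:{content_id}"   # ZSet: comment IDs ordered by score
    detail_key = f"comment:detail:{cid}"      # Hash: the fields of one comment
    likes_key = f"comment:likes:{cid}"        # String: like counter

    return [
        # Score = creation timestamp (assumption), so range queries page by time.
        ("ZADD", list_key, comment["created_at"], cid),
        ("HSET", detail_key, "user_id", comment["user_id"], "text", comment["text"]),
        ("SET", likes_key, comment.get("likes", 0)),
    ]


if __name__ == "__main__":
    sample = {"id": 7, "user_id": 1, "text": "hi", "created_at": 1700000000}
    for command in comment_cache_commands(42, sample):
        print(command)
```

Keeping the list (IDs only) separate from the details means a page read is one range query on the ZSet followed by lookups of the individual hashes, and a like only touches the small counter key.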
Caching Strategy
A local Caffeine cache serves ultra‑hot data (5 minutes, ≥1000 reads) with microsecond latency; a Redis cluster serves hot data with millisecond latency. A randomized TTL (±10%) staggers expirations so that keys do not all expire at the same moment, which prevents a cache avalanche.
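The two cache tiers and the randomized TTL can be sketched as below. This is a simplified model under stated assumptions: plain dictionaries stand in for both Caffeine and Redis, the 300-second base TTL assumes that "5 minutes" refers to the expiry time, and `TwoLevelCache`, `jittered_ttl`, and `load_from_db` are hypothetical names.

```python
import random
import time
from typing import Any, Callable, Optional

_MISS = object()  # sentinel so that a cached None is still a hit


def jittered_ttl(base_seconds: float, spread: float = 0.10,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_seconds scaled by a random factor in [1 - spread, 1 + spread]."""
    source = rng or random
    return base_seconds * (1.0 + source.uniform(-spread, spread))


class TwoLevelCache:
    """Local tier first, shared tier second, database loader last."""

    def __init__(self, load_from_db: Callable[[str], Any],
                 base_ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.local: dict[str, tuple[Any, float]] = {}   # stand-in for Caffeine
        self.shared: dict[str, tuple[Any, float]] = {}  # stand-in for Redis
        self._load = load_from_db
        self._base_ttl = base_ttl
        self._clock = clock

    @staticmethod
    def _fresh(tier: dict, key: str, now: float) -> Any:
        entry = tier.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        return _MISS

    def get(self, key: str) -> Any:
        now = self._clock()
        value = self._fresh(self.local, key, now)
        if value is not _MISS:
            return value
        value = self._fresh(self.shared, key, now)
        if value is _MISS:
            value = self._load(key)  # fall through to the database
            self.shared[key] = (value, now + jittered_ttl(self._base_ttl))
        # Promote to the local tier with its own jittered expiry.
        self.local[key] = (value, now + jittered_ttl(self._base_ttl))
        return value
```

Because each entry gets its own jittered expiry, entries written in the same burst expire spread over roughly a one-minute window (270 to 330 seconds) instead of at a single instant.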
Sharding Strategies
Three common sharding methods are presented:
Content‑ID hash – tableIndex = contentId % 32 (32 tables).
Time + ID composite – tableIndex = (contentId % 8) * 4 + (month % 4) (8 × 4 = 32 tables).
Hotspot separate tables – monitor QPS, migrate hot content to dedicated tables.
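The three routing rules can be sketched as follows. The composite index is computed here as `(content_id % 8) * 4 + (month % 4)` so that the 8 × 4 combinations map onto 32 distinct tables; a plain sum of the two remainders would only span 0 to 10 and would make different combinations collide. The table names and the `hot_tables` override map are hypothetical, and how hot content is detected and migrated is left out.

```python
def hash_shard(content_id: int, tables: int = 32) -> int:
    """Content-ID hash: spread content evenly over a fixed number of tables."""
    return content_id % tables


def time_id_shard(content_id: int, month: int) -> int:
    """Time + ID composite: 8 ID buckets x 4 month buckets = 32 tables."""
    return (content_id % 8) * 4 + (month % 4)


def route(content_id: int, hot_tables: dict[int, str]) -> str:
    """Pick a table name, letting migrated hot content override the hash rule.

    hot_tables maps a content ID to its dedicated table (hypothetical names).
    """
    if content_id in hot_tables:
        return hot_tables[content_id]
    return f"comment_{hash_shard(content_id):02d}"


if __name__ == "__main__":
    print(route(100, {}))                        # hash rule
    print(route(100, {100: "comment_hot_100"}))  # dedicated hot table
    print(time_id_shard(10, 7))
```

Checking the override map before the hash keeps the hot-table path a cheap lookup and leaves the default routing unchanged for everything else.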
Architecture Evolution
Bronze (single monolith)
Single application + MySQL + Caffeine; suitable for early stage (<100k DAU) but limited by CPU, pagination, and single‑point failures.
Silver (vertical split)
Separate comment publishing service and interaction service; MySQL read‑write separation; Redis cache for likes.
Gold (distributed)
Layered defense: LVS + Nginx load balancing, custom Spring Cloud Gateway (auth, rate‑limit, monitoring), microservices for publishing, querying, interaction, and content‑review, Kafka for async pipelines, Elasticsearch for full‑text search.
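As an illustration of the rate-limit step at the gateway, here is a minimal token-bucket sketch. The source does not say which algorithm the gateway uses; token bucket is swapped in as one common choice, and the class name, rate, and capacity are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity   # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self._clock()
        # Refill in proportion to the elapsed time, capped at capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A gateway would typically keep one bucket per user or per API route and reject or queue requests for which `allow()` returns `False`.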
Fault Recovery
Cache avalanche – protected by the local cache and random TTL.
MySQL – 1‑master‑2‑slave MGR cluster with automatic failover.
Redis – 3‑master‑3‑slave cluster with Sentinel.
Disaster recovery – cross‑region active‑active replication (Beijing & Shanghai).
Multi‑level degradation – rate‑limit, disable image upload, limit comment list, to keep core functionality alive.
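The multi-level degradation can be sketched as a mapping from system load to the set of features still enabled. The three steps (rate-limit, disable image upload, limit the comment list) come from the text; the load thresholds, page sizes, and the `Degradation` type are hypothetical values chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Degradation:
    rate_limit: bool = False
    image_upload_enabled: bool = True
    max_comments_per_page: int = 50


def degradation_for(load_ratio: float) -> Degradation:
    """Map current load (0.0 to 1.0 of capacity) to a degradation level.

    Thresholds are illustrative assumptions, not values from the article.
    """
    if load_ratio >= 0.95:
        # Level 3: also shrink the comment list to protect reads.
        return Degradation(rate_limit=True, image_upload_enabled=False,
                           max_comments_per_page=10)
    if load_ratio >= 0.85:
        # Level 2: drop the expensive image upload path.
        return Degradation(rate_limit=True, image_upload_enabled=False)
    if load_ratio >= 0.70:
        # Level 1: rate-limit only.
        return Degradation(rate_limit=True)
    return Degradation()
```

Each level keeps everything the previous level disabled, so posting and reading plain-text comments stay available until the very last step.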
Conclusion
Key takeaways: understand business needs before choosing technology, accept imperfect but reliable solutions, and prepare thorough pre‑plans to handle traffic spikes without user impact.
NiuNiu MaTe
Joined Tencent (nicknamed "Goose Factory") through campus recruitment at a second‑tier university. Career path: Tencent → foreign firm → ByteDance → Tencent. Started as an interviewer at the foreign firm and hopes to help others.