How Sina’s News Comment System Scaled to Millions of Users: Lessons from 3.0 to 5.0
This article chronicles the evolution of Sina's news comment platform from its early Perl‑based prototype through versions 3.0, 4.0, and 5.0, detailing architectural choices, caching strategies, database sharding, asynchronous processing, and the eventual migration to cloud‑native Python services to handle massive traffic spikes.
Origin of the News Comment System
Sina introduced comment functionality as early as April 7, 2000, using a simple Perl script; the feature later became a standard component of Chinese news portals.
Comment System 3.0
Around 2003 the system ran on a single Solaris server (Dell 6300) with MySQL, Apache, and C++ CGI programs.
Page‑level caching stored two pages per file with one overlapping page, allowing a single append operation for new comments.
The system used a single MyISAM table per channel. Because MyISAM locks at the table level, reads and writes contended severely, and the system crashed frequently during traffic spikes on hot news.
Temporary mitigation involved adding a low‑spec FreeBSD server for database off‑loading and nightly data pruning.
Comment System 4.0 Launch
In 2004 the goal was “no downtime”. The design kept the existing database schema, introduced a file‑system index layer, and used a custom file‑based message queue for asynchronous processing.
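The source does not describe how the custom file-based message queue was built. As a minimal sketch of the general idea, the following assumes an append-only log of JSON lines plus a separate consumer offset file; the class name `FileQueue`, the file names, and the record format are all hypothetical.

```python
import json
import os


class FileQueue:
    """Append-only, line-oriented queue backed by a log file (illustrative only)."""

    def __init__(self, directory):
        os.makedirs(directory, exist_ok=True)
        self.log_path = os.path.join(directory, "queue.log")
        self.offset_path = os.path.join(directory, "queue.offset")
        # Make sure the log exists so the consumer can open it for reading.
        open(self.log_path, "ab").close()

    def put(self, message):
        # One JSON document per line; appending keeps the producer path cheap.
        line = json.dumps(message, ensure_ascii=False).encode("utf-8") + b"\n"
        with open(self.log_path, "ab") as log:
            log.write(line)
            log.flush()
            os.fsync(log.fileno())

    def _read_offset(self):
        try:
            with open(self.offset_path, "r") as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def _write_offset(self, offset):
        # Write to a temp file, then rename, so the offset is never half-written.
        tmp = self.offset_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(offset))
        os.replace(tmp, self.offset_path)

    def drain(self, handler, max_messages=100):
        """Pass up to max_messages to handler; return how many were handled."""
        handled = 0
        with open(self.log_path, "rb") as log:
            log.seek(self._read_offset())
            while handled < max_messages:
                line = log.readline()
                if not line.endswith(b"\n"):
                    break  # end of file, or a record that is still being written
                handler(json.loads(line.decode("utf-8")))
                self._write_offset(log.tell())
                handled += 1
        return handled
```

In this sketch the user-facing code only calls `put`, while a background worker calls `drain` at its own pace, so a slow database delays persistence rather than the user's request. Because the offset is saved after each message is handled, a crash can cause the last message to be delivered again; handlers would need to tolerate that.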
Phase 1: File System Replaces Database, ICE‑Based Distributed System
Each news article’s comments were stored in separate index and data files, isolating user‑facing operations from MySQL.
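To illustrate what separate index and data files per article might look like, here is a small sketch. The on-disk layout (fixed-width index records holding an offset and a length), the `.idx`/`.dat` file names, and the class `ArticleCommentStore` are assumptions made for illustration, not the format Sina used.

```python
import os
import struct

# Hypothetical index record: data offset (8 bytes) and payload length (4 bytes).
INDEX_RECORD = struct.Struct("<QI")


class ArticleCommentStore:
    """Comments for one article, kept in an index file and a data file."""

    def __init__(self, directory, article_id):
        os.makedirs(directory, exist_ok=True)
        self.data_path = os.path.join(directory, f"{article_id}.dat")
        self.index_path = os.path.join(directory, f"{article_id}.idx")
        for path in (self.data_path, self.index_path):
            open(path, "ab").close()

    def append(self, text):
        payload = text.encode("utf-8")
        with open(self.data_path, "ab") as data:
            offset = data.seek(0, os.SEEK_END)
            data.write(payload)
        # The index grows by one fixed-width record per comment.
        with open(self.index_path, "ab") as index:
            index.write(INDEX_RECORD.pack(offset, len(payload)))

    def count(self):
        return os.path.getsize(self.index_path) // INDEX_RECORD.size

    def read_page(self, page, page_size=20):
        """Return the comments on a 0-based page, newest first."""
        end = self.count() - page * page_size
        if end <= 0:
            return []
        start = max(end - page_size, 0)
        with open(self.index_path, "rb") as index:
            index.seek(start * INDEX_RECORD.size)
            raw = index.read((end - start) * INDEX_RECORD.size)
        comments = []
        with open(self.data_path, "rb") as data:
            for pos in range(0, len(raw), INDEX_RECORD.size):
                offset, length = INDEX_RECORD.unpack_from(raw, pos)
                data.seek(offset)
                comments.append(data.read(length).decode("utf-8"))
        comments.reverse()
        return comments
```

Because every index record has the same size, locating a page is a single seek into the index file followed by a short read, and no database query sits on the user-facing path.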
ICE was chosen as the RPC component for inter‑module communication.
Phase 2: Full System Asynchrony and Index Pagination Optimization
The front‑end switched from Apache + CGI to static HTML with AJAX‑loaded XML, enabling passive cache updates without blocking user requests.
For massive comment volumes (tens of millions), the original full‑sort indexing became impractical; a hybrid precise‑plus‑fuzzy pagination algorithm was adopted.
Phase 3: Simplified Cache Strategy and Further I/O Reduction
Server count grew to double digits, Memcached replaced file‑based comment storage, and only recent weeks of data remained on disk.
High‑Traffic UGC System Design Summary
The core principles are to relax real‑time consistency of secondary paths and boost performance through three pillars: Queue, Cache, and Sharding.
Queue: buffers write spikes and enables asynchronous processing.
Cache: multi‑level caching from file system to memory reduces latency.
Sharding: partitions data to keep hot sets small.
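The three pillars can be sketched together in a few lines. This is an in-memory toy under stated assumptions: plain dictionaries stand in for database shards, a time-to-live dictionary stands in for a cache such as Memcached, and the names `CommentService` and `shard_for` are hypothetical.

```python
import queue
import time
import zlib


def shard_for(article_id, shard_count):
    # crc32 is stable across processes, unlike the built-in hash() for strings.
    return zlib.crc32(article_id.encode("utf-8")) % shard_count


class CommentService:
    def __init__(self, shard_count=4, cache_ttl=5.0, clock=time.monotonic):
        self.shards = [dict() for _ in range(shard_count)]  # stand-ins for databases
        self.pending = queue.Queue()  # Queue: buffers writes during spikes
        self.cache = {}               # Cache: article_id -> (expires_at, comments)
        self.cache_ttl = cache_ttl
        self.clock = clock

    def post(self, article_id, text):
        # The user-facing path only enqueues; storage happens later.
        self.pending.put((article_id, text))

    def flush(self):
        """Apply queued writes to their shards; return the number applied."""
        applied = 0
        while True:
            try:
                article_id, text = self.pending.get_nowait()
            except queue.Empty:
                return applied
            # Sharding: each article's comments live on exactly one shard.
            shard = self.shards[shard_for(article_id, len(self.shards))]
            shard.setdefault(article_id, []).append(text)
            applied += 1

    def list_comments(self, article_id):
        now = self.clock()
        hit = self.cache.get(article_id)
        if hit and hit[0] > now:
            return hit[1]  # may be slightly stale, by design
        shard = self.shards[shard_for(article_id, len(self.shards))]
        comments = list(shard.get(article_id, []))
        self.cache[article_id] = (now + self.cache_ttl, comments)
        return comments
```

A reader may not see a new comment until the queue has been flushed and the cache entry has expired. That delay is the relaxed real-time consistency the summary refers to, traded for a write path that never blocks on storage.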
Database design should separate hot (recent) from cold (historical) data, and keep index and entity data distinct.
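One way to picture both separations is a narrow index table used for listing, with comment bodies held in separate hot and cold tables. The sketch below uses SQLite from the Python standard library purely as a stand-in for MySQL; the table names, columns, and helper functions are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE comment_index (
    comment_id INTEGER PRIMARY KEY,
    article_id TEXT NOT NULL,
    created_at INTEGER NOT NULL
);
CREATE INDEX idx_article_time ON comment_index (article_id, created_at);
CREATE TABLE comment_body_hot (
    comment_id INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);
CREATE TABLE comment_body_cold (
    comment_id INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);
"""


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_comment(conn, article_id, body, created_at):
    # New comments get a small index row plus a body row in the hot table.
    with conn:
        cur = conn.execute(
            "INSERT INTO comment_index (article_id, created_at) VALUES (?, ?)",
            (article_id, created_at),
        )
        conn.execute(
            "INSERT INTO comment_body_hot VALUES (?, ?)", (cur.lastrowid, body)
        )
    return cur.lastrowid


def archive_older_than(conn, cutoff):
    """Move bodies created before cutoff from hot to cold; return rows moved."""
    with conn:
        conn.execute(
            "INSERT INTO comment_body_cold "
            "SELECT h.comment_id, h.body FROM comment_body_hot h "
            "JOIN comment_index i ON i.comment_id = h.comment_id "
            "WHERE i.created_at < ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM comment_body_hot WHERE comment_id IN "
            "(SELECT comment_id FROM comment_index WHERE created_at < ?)",
            (cutoff,),
        )
    return cur.rowcount


def get_body(conn, comment_id):
    # Check the small hot table first, then fall back to the cold table.
    for table in ("comment_body_hot", "comment_body_cold"):
        row = conn.execute(
            f"SELECT body FROM {table} WHERE comment_id = ?", (comment_id,)
        ).fetchone()
        if row:
            return row[0]
    return None
```

Listing pages scan only the small index rows, and the hot body table stays small because a periodic job moves older bodies to the cold table.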
Most large-scale web services still rely on MySQL replication, in which replicas have traditionally applied replicated writes on a single thread, so write throughput remains a bottleneck.
Source: http://www.csdn.net/article/2014-12-17/2823183
21CTO
