How We Evolved a News App Comment System: From Threaded Views to AI‑Driven Ranking
This article details the evolution of a news‑app comment backend, covering early thread‑based displays, the transition to sharded databases and mixed adjacency‑path models, current hot‑comment ranking strategies, an in‑house experiment platform, topic aggregation via Kafka, and future AI‑driven architectural enhancements.
Yesterday: Early Threaded Comment Display
The initial comment flow was a simple publish‑then‑display pipeline, with comments shown in a hierarchical "floor" layout where each reply referenced its parent comment.
To implement this threading, several data‑model options were evaluated:
Adjacency List : each comment stores its parent ID – finding a comment's parent is a single lookup, but loading a deeply nested thread requires recursive queries, which can degrade performance.
Path Enumeration : each comment stores the full path from the root (e.g., 1‑2‑3) – ancestor queries become simple string operations, but paths grow with nesting depth, so performance can suffer on very deep threads and a cap on the number of floors is often needed.
Closure Table : an extra table records every ancestor‑descendant pair – offers flexible relationship queries at the cost of extra storage that grows with nesting depth.
Document‑oriented DB : child comments are embedded as arrays inside a document – works well for read‑heavy, write‑light scenarios but limits nesting depth and document size.
Hybrid Mode : combines the strengths of the above approaches.
We adopted a hybrid of the adjacency list and path enumeration to support both threaded and flat list displays while keeping the design extensible.
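A minimal sketch of what such a hybrid row might look like, assuming an in-memory store in place of the real tables. The names `Comment`, `CommentStore`, `parent_id`, and `path` are illustrative and do not come from the production schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Comment:
    id: int
    parent_id: Optional[int]  # adjacency-list column: direct parent only
    path: str                 # path-enumeration column, e.g. "1-2-3"
    text: str


class CommentStore:
    """In-memory stand-in for a comment table (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[int, Comment] = {}
        self._next_id = 1

    def add(self, text: str, parent_id: Optional[int] = None) -> Comment:
        cid = self._next_id
        self._next_id += 1
        # A root comment's path is its own ID; a reply extends its parent's path.
        if parent_id is None:
            path = str(cid)
        else:
            path = f"{self._rows[parent_id].path}-{cid}"
        row = Comment(cid, parent_id, path, text)
        self._rows[cid] = row
        return row

    def ancestors(self, cid: int) -> List[int]:
        # Upward query answered from the stored path, with no recursion.
        return [int(p) for p in self._rows[cid].path.split("-")[:-1]]

    def direct_replies(self, cid: int) -> List[int]:
        # Adjacency-list lookup: one level down.
        return [c.id for c in self._rows.values() if c.parent_id == cid]

    def thread(self, root_id: int) -> List[int]:
        # Flat list of a whole thread via a path-prefix match, in ID order.
        prefix = self._rows[root_id].path
        return sorted(
            c.id
            for c in self._rows.values()
            if c.path == prefix or c.path.startswith(prefix + "-")
        )
```

In this sketch the `parent_id` column serves the threaded "floor" view, while the `path` prefix match serves the flat list without recursive queries.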
Today: Current Backend Architecture
After several releases, a single Comment table and a single Reply table could no longer meet performance requirements, prompting a move to sharding.
Sharding Process :
The first split used dual writes to both the old table and the new shards so that data consistency could be verified.
Read traffic was then shifted gradually to the new shards.
Eventually all reads were served from the new shards, and the old table received writes only.
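The rollout steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production migration code: `MigratingCommentRepo`, `read_new_ratio`, and the two dictionaries standing in for the old table and the new shards are all hypothetical.

```python
import random
from typing import Dict, List, Optional


class MigratingCommentRepo:
    """Dual-write repository with a tunable share of reads on the new shards."""

    def __init__(self, read_new_ratio: float = 0.0, seed: Optional[int] = None) -> None:
        self.old: Dict[int, str] = {}   # stand-in for the original table
        self.new: Dict[int, str] = {}   # stand-in for the new shards
        self.read_new_ratio = read_new_ratio
        self._rng = random.Random(seed)

    def write(self, comment_id: int, body: str) -> None:
        # Dual write: both stores receive every write during the migration.
        self.old[comment_id] = body
        self.new[comment_id] = body

    def read(self, comment_id: int) -> Optional[str]:
        # Route a fraction of reads to the new shards; 1.0 means all reads.
        if self._rng.random() < self.read_new_ratio:
            return self.new.get(comment_id)
        return self.old.get(comment_id)

    def mismatches(self) -> List[int]:
        # Consistency check: IDs whose content differs between the two stores.
        keys = set(self.old) | set(self.new)
        return sorted(k for k in keys if self.old.get(k) != self.new.get(k))
```

Raising `read_new_ratio` step by step from 0.0 to 1.0 mirrors the gradual shift of read traffic, and `mismatches()` plays the role of the consistency verification during the dual-write phase.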
When the capacity of the single sharded database approached its limit in 2024, we added a database‑level split. Instead of adopting a complex sharding framework, we estimated in advance the maximum article ID that would exist on the release day; comments for articles with IDs above that threshold were written to a new database, while comments for articles with lower IDs remained in the original database.
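The threshold rule can be expressed in a few lines. In this sketch the threshold value and the database names are placeholders chosen for illustration, not the real ones.

```python
# Hypothetical cut-over value: the estimated maximum article ID on release day.
ARTICLE_ID_THRESHOLD = 10_000_000


def database_for_article(article_id: int, threshold: int = ARTICLE_ID_THRESHOLD) -> str:
    """Pick the database that holds comments for the given article."""
    # Articles created after the cut-over have IDs above the threshold.
    return "comment_db_new" if article_id > threshold else "comment_db_old"
```

Because the rule depends only on the article ID, every comment of a given article lands in the same database, and no per-article lookup table is needed.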
Because the existing sorting logic relied on comment IDs being strictly increasing distributed IDs, we also built a new ID service that preserves ordering across shards, so that logic continues to work unchanged.
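The article does not describe how the ID service works internally. One common way to obtain increasing IDs that do not depend on the target shard is to combine a millisecond timestamp with a per-millisecond sequence; the sketch below shows that idea for a single generator instance. `OrderedIdGenerator` and its bit layout are assumptions made for illustration.

```python
import threading
import time
from typing import Callable


def _system_clock_ms() -> int:
    return int(time.time() * 1000)


class OrderedIdGenerator:
    """Strictly increasing IDs built as (timestamp_ms << SEQ_BITS) | sequence."""

    SEQ_BITS = 12  # up to 4096 IDs per millisecond in this sketch

    def __init__(self, clock_ms: Callable[[], int] = _system_clock_ms) -> None:
        self._clock_ms = clock_ms
        self._lock = threading.Lock()
        self._last_ms = 0
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            # Never move backwards, even if the wall clock does.
            now = max(self._clock_ms(), self._last_ms)
            if now == self._last_ms:
                self._seq += 1
                if self._seq >= (1 << self.SEQ_BITS):
                    # Sequence exhausted: borrow the next millisecond.
                    now += 1
                    self._seq = 0
            else:
                self._seq = 0
            self._last_ms = now
            return (now << self.SEQ_BITS) | self._seq
```

Since the ID is assigned before the comment is routed to a database, sorting by ID gives the same order regardless of where each row is stored. A multi-node deployment would need extra coordination that this sketch leaves out.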
Hot‑Comment Strategies (for the new flat‑list UI with "Hot" and "Latest" tabs):
Strategy 1 – Pure Like Count : sort by like count descending, breaking ties with publish time.
Strategy 2 – Recommendation‑Style : retrieve comments from six dimensions (likes, replies, user weight, etc.), cache each dimension, filter already‑seen items, and send the remaining IDs to an external ranking service. If the result set is too small, a second‑stage cache fills the gap.
Strategy 3 – Algorithmic Score : a comment's score is recomputed when it is published, when it receives an interaction, when a weight is set on it, or when it is marked as a "highlighted" comment. Each of these events enqueues a scoring request; the ranking service processes the request and returns the score promptly, so the ranking logic can evolve without redeploying the comment service.
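To make the first two strategies concrete, here is a rough sketch. It assumes newer comments win ties in Strategy 1 (the text does not say which direction the publish-time tie-break goes) and models the per-dimension caches of Strategy 2 as plain lists of comment IDs. All names are illustrative, and the external ranking service is out of scope.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class HotComment:
    id: int
    likes: int
    published_at: int  # Unix seconds


def rank_by_likes(comments: Iterable[HotComment]) -> List[HotComment]:
    # Strategy 1: likes descending; ties broken by publish time (newest first, assumed).
    return sorted(comments, key=lambda c: (-c.likes, -c.published_at))


def recall_candidates(
    dimension_caches: Dict[str, List[int]],
    seen: Set[int],
    fallback: List[int],
    min_size: int,
) -> List[int]:
    # Strategy 2 recall: merge every dimension, drop seen and duplicate IDs.
    picked = set(seen)
    candidates: List[int] = []
    for ids in dimension_caches.values():
        for cid in ids:
            if cid not in picked:
                picked.add(cid)
                candidates.append(cid)
    # Second-stage cache fills the gap when too few candidates remain.
    for cid in fallback:
        if len(candidates) >= min_size:
            break
        if cid not in picked:
            picked.add(cid)
            candidates.append(cid)
    return candidates
```

The list returned by `recall_candidates` corresponds to the IDs that would be handed to the ranking service for final ordering.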
We built an internal SNS comment experiment platform to evaluate these strategies. The platform supports multi‑bucket experiments, guarantees performance, and is extensible for future tests.
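The article does not say how the platform assigns users to buckets. A common approach, sketched here as an assumption, is to hash the experiment name together with the user ID so that assignment is stable and needs no stored state. The function names, the experiment name, and the bucket-to-strategy mapping are hypothetical.

```python
import hashlib

# Hypothetical mapping from bucket index to hot-comment strategy.
STRATEGY_BY_BUCKET = {
    0: "pure_like_count",
    1: "recommendation_style",
    2: "algorithmic_score",
}


def assign_bucket(experiment: str, user_id: str, buckets: int) -> int:
    """Deterministically map a user to one of `buckets` buckets."""
    # Salting with the experiment name keeps different experiments independent.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def strategy_for(user_id: str, experiment: str = "hot_comment_ranking") -> str:
    bucket = assign_bucket(experiment, user_id, len(STRATEGY_BY_BUCKET))
    return STRATEGY_BY_BUCKET[bucket]
```

Because the bucket is a pure function of its inputs, the same user sees the same strategy on every request, which keeps experiment metrics comparable across sessions.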
Topic Aggregation :
Comments from related news items are aggregated into a unified discussion. Updates to a source comment are broadcast via a Kafka message to all topic‑specific comment stores. A relationship table maps source comment IDs to the generated "pseudo‑comment" IDs in each topic, enabling consistent updates across all aggregations.
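The fan-out through the relationship table can be sketched as below. The Kafka consumer is replaced by a plain method call, and the message fields (`source_comment_id`, `text`) and class name are assumptions for illustration rather than the actual message schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TopicAggregator:
    """In-memory stand-in for topic comment stores plus the relationship table."""

    def __init__(self) -> None:
        # topic -> pseudo-comment ID -> text
        self.topic_stores: Dict[str, Dict[int, str]] = defaultdict(dict)
        # source comment ID -> [(topic, pseudo-comment ID), ...]
        self.relations: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
        self._next_pseudo_id = 1

    def link(self, source_id: int, topic: str, text: str) -> int:
        # Copy a source comment into a topic and record the mapping.
        pseudo_id = self._next_pseudo_id
        self._next_pseudo_id += 1
        self.topic_stores[topic][pseudo_id] = text
        self.relations[source_id].append((topic, pseudo_id))
        return pseudo_id

    def on_update_message(self, message: dict) -> int:
        # Apply one update event to every pseudo-comment derived from the source.
        targets = self.relations.get(message["source_comment_id"], [])
        for topic, pseudo_id in targets:
            self.topic_stores[topic][pseudo_id] = message["text"]
        return len(targets)
```

In a real deployment `on_update_message` would be called from the consumer loop for each message, and the return value shows how many topic copies were touched.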
Tomorrow: Future AI‑Driven Enhancements
We envision a comment backend that is data‑driven and dynamically adaptive, leveraging AI in several dimensions:
Compliance Automation : multimodal AI models that fuse image, audio, and text signals to detect malicious content.
Sentiment‑Based Ranking : AI evaluates comment sentiment and discussion quality, promoting high‑value, constructive comments.
Autonomous DB Operations : AI‑guided automatic slow‑query optimization, index recommendation, data rebalancing, and connection‑pool tuning.
Real‑Time Insight & Decision Support : AI generates sentiment dashboards, topic heatmaps, controversy detection, predictive metrics, and automated reporting.
AI‑Assisted Development : code generation for comment‑related SQL, AI‑driven query caching pre‑heat, and dynamic sharding decisions.
These capabilities aim to transform the comment service from a static storage‑and‑query component into an intelligent organism that understands content semantics, predicts user behavior, optimizes resources proactively, and continuously learns from interaction data.
Key architectural diagrams illustrating the evolution are shown below:
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
