Designing a Scalable Comment Threading System: Efficient Queries & Redis Caching
This article explains multiple database schema designs for hierarchical comment threads, compares their query performance, and shows how Redis can be used to sort hot comments and handle high‑concurrency likes, providing practical solutions for large‑scale comment systems.
ParentId based storage design
Adding a parentId column to the comment table creates a direct link from each comment to its immediate parent. A comment with id=2 and parentId=1 is therefore a reply to the comment id=1. To obtain all replies under a given comment the application must walk the tree recursively, issuing one SQL query per level. In deep threads (e.g., 1,000 levels) this results in 1,000 separate queries, quickly exhausting the database connection pool under high concurrency.
Path based storage design
To avoid recursive queries, a path column stores the full ancestry chain of a comment, e.g. 1/2/3. With this denormalized path a single SELECT can retrieve every descendant of a root comment. The trade‑off is that deleting a middle‑level comment requires updating the path of all its descendants, causing massive write amplification in high‑traffic scenarios.
Redundant rootId design
Many large platforms adopt a denormalized schema that stores the top‑level comment identifier ( rootId) together with each row. The table remains flat, enabling a single query filtered by rootId to fetch the entire thread. The application then assembles the hierarchical view in memory.
Example records: id=1, rootId=0 (root comment) id=2, rootId=1, replyId=1 (reply to comment 1) id=3, rootId=1, replyId=2 (reply to comment 2) id=4, rootId=1, replyId=2 (another reply to comment 2)
Fetching all comments belonging to the thread rooted at id=1 is a single SELECT on rootId=1. The result set is flat; the client code builds the tree structure before rendering.
Sorting hot comments with Redis
When the comment volume is large, ordering by a hotness metric directly in SQL becomes a performance bottleneck. The common pattern is to maintain a Redis ZSET that contains the top‑N (e.g., 100) hot comments, with the score representing the hotness value. Clients read the sorted set from Redis, achieving sub‑millisecond latency and shielding the relational database from heavy sorting workloads.
High‑concurrency like counter with Redis
For viral content that may receive millions of likes per second, updating the relational database on each click creates a severe write bottleneck. The recommended approach is:
Increment the like count in a Redis key (or hash) instantly when a user clicks “like”.
Periodically (e.g., every 5 seconds) flush the aggregated counts back to the persistent database using a batch UPDATE.
This provides immediate feedback to users while tolerating brief data loss if Redis fails—acceptable for non‑financial counters.
Conclusion
Storing a redundant rootId enables single‑query retrieval of an entire comment thread, simplifying read paths.
Redis is essential for two high‑load scenarios: (a) caching and sorting hot comments with a ZSET, and (b) handling massive like spikes via in‑memory counters with periodic persistence.
Lobster Programming
Sharing insights on technical analysis and exchange, making life better through technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
