Designing a Content Hotness Scoring Algorithm for Community Platforms
This article describes how a community’s big‑data team designed a content hotness algorithm by defining time, interaction, content, and user dimensions, assigning business meanings, applying weighted formulas and a Newton‑cooling decay function, and integrating user interest vectors to compute dynamic scores.
During a community redesign, the big‑data team drew on business details, research, discussion, and trial and error to design a basic content hotness scoring algorithm.
Reference: A few foreign companies (Hacker News, Reddit, Stack Overflow, StumbleUpon) have publicly described their hotness algorithms; links are provided for further reading.
Data dimensions considered include:
Time dimensions: post_time, last_reply_time, last_op_time.
Interaction dimensions: view_num, reply_num, favor_num, like_num, reply_like_num, share_num.
Content dimensions: content_length, reply_avg_length, picture_num.
User dimensions: user interest, activity, reputation, etc.
Each dimension carries a business meaning: the time dimensions control decay, the interaction metrics are weighted according to their impact, the content metrics serve as auxiliary quality signals, and the user metrics enable personalized recommendation.
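As a rough illustration, the time, interaction, and content dimensions listed above could be carried in a single per-post record. This is only a sketch: the class name ContentStats, the field types, the default values, and the assumption that times are Unix timestamps in seconds are not from the original design. User dimensions are left out because they belong to the reader rather than the post.

```python
from dataclasses import dataclass


@dataclass
class ContentStats:
    """Hypothetical per-post record; field names follow the listed dimensions."""

    # Time dimensions (assumed here to be Unix timestamps in seconds)
    post_time: float
    last_reply_time: float
    last_op_time: float
    # Interaction dimensions
    view_num: int = 0
    reply_num: int = 0
    favor_num: int = 0
    like_num: int = 0
    reply_like_num: int = 0
    share_num: int = 0
    # Content dimensions
    content_length: int = 0
    reply_avg_length: float = 0.0
    picture_num: int = 0


# Example record with made-up values
post = ContentStats(
    post_time=1_700_000_000,
    last_reply_time=1_700_003_600,
    last_op_time=1_700_003_600,
    view_num=120,
    reply_num=4,
)
```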
The algorithm uses last_reply_time as its time dimension and models decay with Newton’s law of cooling:
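H(t) = H_a * exp[-γ * (t - t_last) / 86400]
Here t is the current time, t_last is last_reply_time, and γ is the cooling coefficient; with times measured in seconds, dividing by 86400 expresses the elapsed time in days.
The raw hotness value H_a is calculated from interaction metrics: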
H_a = ln(1 + N_view) + 1.0 * N_reply + 1.75 * N_like + 3.2 * N_favor
The view count is passed through a natural logarithm so that its contribution diminishes as it grows, and content length contributes via lg(N_length), the base‑10 logarithm of the length.
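The raw hotness and cooling formulas above can be sketched in a few lines of Python. The function names, the GAMMA value, and the clamping of negative elapsed time are assumptions made for illustration; the article does not state which γ was chosen.

```python
import math

SECONDS_PER_DAY = 86400
GAMMA = 0.2  # assumed decay coefficient per day; the article gives no value


def raw_hotness(n_view: int, n_reply: int, n_like: int, n_favor: int) -> float:
    """H_a: log-damped views plus weighted replies, likes, and favorites."""
    return math.log(1 + n_view) + 1.0 * n_reply + 1.75 * n_like + 3.2 * n_favor


def decayed_hotness(h_a: float, now: float, t_last: float,
                    gamma: float = GAMMA) -> float:
    """H(t) = H_a * exp(-gamma * elapsed_days), with times in seconds."""
    # Clamping at zero guards against clock skew; this is an added assumption.
    elapsed_days = max(now - t_last, 0.0) / SECONDS_PER_DAY
    return h_a * math.exp(-gamma * elapsed_days)
```

With gamma set to ln 2, for instance, the score halves for every day that passes since the last reply.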
The final hotness formula combines all dimensions:
H = [lg(N_length) + ln(1+N_view) + 1.0*N_reply + 1.75*N_like + 3.2*N_favor] * exp[-γ*(t - t_last)/86400]
User interest vectors are built from recent behavior, normalized to [0.1, 1.0], and combined with content tag vectors to give an interest coefficient C_interest; the total score may be further adjusted by a random factor to diversify ordering:
rand(0.75,1.25) * C_interest * H
Conclusion: The algorithm balances time decay, weighted interaction, content characteristics, and user interest to produce a dynamic hotness score that reflects both content quality and personalized relevance.
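To make the final formula and the personalized score more concrete, here is a minimal Python sketch. Several parts are assumptions rather than the team's actual method: the article does not say how interest is normalized or how it is combined with tag vectors, so min-max scaling and a simple average over the post's tags are used here, and all function names are hypothetical. The guard against a zero content length is also an addition.

```python
import math
import random

SECONDS_PER_DAY = 86400


def normalize_interest(raw_counts: dict[str, float]) -> dict[str, float]:
    """Min-max scale per-tag behavior counts into [0.1, 1.0] (assumed scheme)."""
    if not raw_counts:
        return {}
    lo, hi = min(raw_counts.values()), max(raw_counts.values())
    if hi == lo:
        return {tag: 1.0 for tag in raw_counts}
    return {tag: 0.1 + 0.9 * (v - lo) / (hi - lo) for tag, v in raw_counts.items()}


def interest_coefficient(interest: dict[str, float], tags: list[str]) -> float:
    """C_interest: mean user weight over the post's tags (assumed combination).

    Tags the user has never interacted with fall back to the 0.1 floor.
    """
    if not tags:
        return 0.1
    return sum(interest.get(tag, 0.1) for tag in tags) / len(tags)


def hotness(n_length: int, n_view: int, n_reply: int, n_like: int,
            n_favor: int, now: float, t_last: float, gamma: float) -> float:
    """H: content and interaction terms multiplied by the cooling factor."""
    # max(..., 1) avoids log10(0) for empty posts; this guard is an assumption.
    base = (math.log10(max(n_length, 1)) + math.log(1 + n_view)
            + 1.0 * n_reply + 1.75 * n_like + 3.2 * n_favor)
    return base * math.exp(-gamma * (now - t_last) / SECONDS_PER_DAY)


def personalized_score(h: float, c_interest: float, rng=random) -> float:
    """rand(0.75, 1.25) * C_interest * H."""
    return rng.uniform(0.75, 1.25) * c_interest * h
```

Passing a seeded random.Random instance as rng keeps the jitter reproducible in tests, while the default uses the module-level generator.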
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.