Modeling Web Novel Popularity with Predictive Ranking and Statistical Fusion
This article explains how a binary‑classification model that combines estimated future behavior with statistical data is used to compute a unified popularity score for web novels. The score improves both recall and ranking in search and library scenarios while addressing the cold‑start and long‑tail problems.
Background
Popularity is widely used as a metric in search to balance relevance and quality, serving as a feature for learning‑to‑rank (LTR). In the web‑novel domain, popularity includes explicit signals (ratings, comments) and implicit signals (reads, favorites, subscriptions, tickets), making a unified calculation challenging.
Data Analysis
Correlation analysis shows that reads, favorites, and subscriptions are strongly linked to a novel’s hotness. A novel’s lifecycle typically rises to a peak and then declines, and its interaction metrics follow the same curve.
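The article does not say which correlation measure was used. As a minimal sketch, assuming a plain Pearson coefficient between a per-novel interaction metric and a hotness value, the check could look like the following. The function name and the toy numbers are illustrative, not taken from the original analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): daily reads and a hotness value for five novels.
reads = [120, 450, 900, 3000, 8000]
hotness = [0.10, 0.22, 0.35, 0.61, 0.90]
r = pearson(reads, hotness)
```

A coefficient close to 1 would support using that metric as a popularity signal; a rank-based measure such as Spearman may be a better fit when the metric is heavily skewed.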
Overall Architecture
The model adopts a binary‑classification approach: it predicts the probability of user actions after exposure and combines this estimate with statistical values to produce the final popularity score.
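The article does not give the exact fusion formula. One plausible way to combine the two signals, shown here purely as an assumption, is an exposure-weighted blend: with little exposure the score leans on the model's predicted probability, and with more exposure it leans on the observed statistic. The names `fuse_popularity` and `prior_strength` are hypothetical.

```python
def fuse_popularity(predicted_prob, stat_score, exposures, prior_strength=100.0):
    """Blend a predicted action probability with an observed statistic.

    predicted_prob: model estimate of P(action | exposure), in [0, 1].
    stat_score:     observed statistic normalized to [0, 1].
    exposures:      number of exposures behind stat_score.
    prior_strength: assumed pseudo-count; higher values trust the model longer.
    """
    if not 0.0 <= predicted_prob <= 1.0:
        raise ValueError("predicted_prob must be in [0, 1]")
    if not 0.0 <= stat_score <= 1.0:
        raise ValueError("stat_score must be in [0, 1]")
    if exposures < 0 or prior_strength <= 0:
        raise ValueError("exposures must be >= 0 and prior_strength > 0")
    # Weight on the statistic grows from 0 toward 1 as exposure accumulates.
    weight = exposures / (exposures + prior_strength)
    return weight * stat_score + (1.0 - weight) * predicted_prob


# A new book with no exposure falls back entirely to the model estimate.
cold_start_score = fuse_popularity(0.3, 0.9, exposures=0)
# A heavily exposed book is scored almost entirely by its statistic.
mature_score = fuse_popularity(0.3, 0.9, exposures=1_000_000)
```

This kind of blend is one way to give cold-start and long-tail novels a usable score without letting sparse statistics dominate.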
Online, a query is first analyzed; recall then combines relevance and popularity, and the recalled results are fed to a LambdaMART ranker that considers relevance, popularity, and basic novel information. Offline, user behavior is aggregated in TDW and processed on a Spark cluster for model training, and the results are written to an Elasticsearch cluster. Because novel popularity changes more slowly than video or live‑stream metrics, daily model updates are sufficient.
Model and Objectives
Any binary classification model can be used; LightGBM was chosen for interpretability and efficiency.
Features
The feature set consists of four groups:
Author‑related features (basic author attributes to address cold‑start).
Work‑related features (static attributes of the novel).
Fan‑related features (subscriptions, evaluations, retention).
Ranking‑list features.
Sample Construction and Sampling
Samples are built at user‑novel granularity, which produces a very large sample volume. To keep training tractable, novels are stratified by read count into hot, middle, tail, and no‑exposure groups, and samples are drawn from each group to ensure coverage and balance.
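The stratified sampling step can be sketched as follows. The read-count thresholds, the per-group quota, and the `(user_id, novel_id, label)` tuple layout are all assumptions made for illustration; the article does not state the actual cut-offs or sampling ratios.

```python
import random
from collections import defaultdict

STRATA = ("hot", "middle", "tail", "no_exposure")


def assign_stratum(read_count, hot_min=10_000, middle_min=100):
    """Map a novel's read count to a group (thresholds are hypothetical)."""
    if read_count <= 0:
        return "no_exposure"
    if read_count >= hot_min:
        return "hot"
    if read_count >= middle_min:
        return "middle"
    return "tail"


def stratified_sample(samples, read_counts, per_stratum, seed=0):
    """Draw up to per_stratum samples from each read-count group.

    samples:     iterable of (user_id, novel_id, label) tuples.
    read_counts: dict mapping novel_id to its read count; missing means 0.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample in samples:
        novel_id = sample[1]
        groups[assign_stratum(read_counts.get(novel_id, 0))].append(sample)

    picked = []
    for name in STRATA:
        members = groups.get(name, [])
        # Take the whole group when it is smaller than the quota.
        picked.extend(rng.sample(members, min(per_stratum, len(members))))
    return picked


samples = (
    [(f"u{i}", "n_hot", 1) for i in range(5)]
    + [(f"u{i}", "n_mid", 0) for i in range(5)]
    + [("u0", "n_tail", 0), ("u0", "n_new", 0)]
)
read_counts = {"n_hot": 50_000, "n_mid": 500, "n_tail": 3}
picked = stratified_sample(samples, read_counts, per_stratum=2)
```

Using a fixed quota per group keeps hot novels from dominating the training set; a proportional or weighted quota would be an equally reasonable variant.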
Evaluation Metrics and Results
Performance is evaluated using probability density distributions of popularity scores and conversion/reading rate metrics. Compared with a baseline that uses only statistical weighting, the new model yields smoother, more discriminative distributions and noticeable improvements in reads, subscriptions, and favorites after deployment.
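To make the distribution comparison concrete, the sketch below estimates the probability density of popularity scores with a simple equal-width histogram over [0, 1]. The bin count and the assumption that scores are normalized to [0, 1] are illustrative choices, not details from the article.

```python
def score_density(scores, bins=10):
    """Histogram-based density estimate for scores in [0, 1].

    Returns one density value per bin; the values times the bin width
    sum to 1 when scores is non-empty.
    """
    if bins <= 0:
        raise ValueError("bins must be positive")
    if not scores:
        return [0.0] * bins
    counts = [0] * bins
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be in [0, 1]")
        # A score of exactly 1.0 is placed in the last bin.
        counts[min(int(score * bins), bins - 1)] += 1
    width = 1.0 / bins
    return [count / (len(scores) * width) for count in counts]


# Toy scores (hypothetical) for four novels.
density = score_density([0.05, 0.15, 0.95, 1.0], bins=10)
```

A baseline whose scores pile up in one or two bins discriminates poorly between novels, whereas a smoother spread across bins suggests the score separates items more usefully.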
Conclusion and Outlook
The proposed popularity modeling scheme significantly improves ranking quality for web novels. Future work includes enhancing discrimination for tail and newly released books, consolidating multiple predictive models into a single unified model, and incorporating long‑term reading depth into the popularity calculation.
Yuewen Technology
The Yuewen Group tech team supports and powers services like QQ Reading, Qidian Books, and Hongxiu Reading. This account targets internet developers, sharing high‑quality original technical content. Follow us for the latest Yuewen tech updates.