
Modeling Web Novel Popularity with Predictive Ranking and Statistical Fusion

This article explains how a binary‑classification model combining estimated future behavior and statistical data is used to compute a unified popularity score for web novels, improving both recall and ranking in search and library scenarios while addressing challenges of cold‑start and long‑tail items.

Yuewen Technology
Background

In search, popularity is widely used to balance relevance against quality, and it typically serves as a feature for learning‑to‑rank (LTR). For web novels, popularity draws on both explicit signals (ratings, comments) and implicit signals (reads, favorites, subscriptions, tickets), which makes computing a single unified score challenging.

Data Analysis

Correlation analysis shows that reads, favorites, and subscriptions are strongly correlated with a novel’s hotness. Over its lifecycle, a novel’s popularity typically rises to a peak and then declines, and its interaction metrics follow the same curve.
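As a rough illustration of this kind of analysis, the sketch below computes a Pearson correlation between each interaction signal and a reference hotness value. The article does not say which correlation measure was used, so Pearson is an assumption here, and all counts and the `hotness` values are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Made-up per-novel counts, for illustration only (one entry per novel).
signals = {
    "reads": [120, 450, 900, 1500, 3000],
    "favorites": [10, 40, 95, 140, 310],
    "comments": [5, 2, 9, 3, 6],
}
# Hypothetical reference hotness value for the same five novels.
hotness = [1.0, 2.5, 4.0, 6.5, 9.0]

correlations = {name: pearson(values, hotness) for name, values in signals.items()}
```

Signals whose coefficient is close to 1 would be natural candidates for the statistical part of the popularity score; weakly correlated ones would contribute little.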

Overall Architecture

The model adopts a binary‑classification approach: it predicts the probability of user actions after exposure and combines this estimate with statistical values to produce the final popularity score.
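The article does not give the formula used to combine the two parts. One plausible way to do it, shown here purely as an assumption, is to weight the observed statistic by how much exposure a novel has received, so that new or rarely shown novels lean on the model's prediction. The function names, the log scaling, and the smoothing constant `k` are all hypothetical.

```python
from math import log1p

def normalized_stat(count, max_count):
    """Log-scale an interaction count into [0, 1] (assumed normalization)."""
    if max_count <= 0:
        return 0.0
    return min(log1p(max(count, 0)) / log1p(max_count), 1.0)

def fuse_popularity(predicted_prob, stat_count, max_count, exposures, k=1000.0):
    """Blend a predicted action probability with an observed statistic.

    The weight on the statistic is exposures / (exposures + k): it is 0 for
    a novel with no exposure and approaches 1 as exposure grows. Both the
    weighting scheme and k are illustrative assumptions.
    """
    if not 0.0 <= predicted_prob <= 1.0:
        raise ValueError("predicted_prob must be in [0, 1]")
    if exposures < 0:
        raise ValueError("exposures must be non-negative")
    stat_weight = exposures / (exposures + k)
    stat = normalized_stat(stat_count, max_count)
    return stat_weight * stat + (1.0 - stat_weight) * predicted_prob

# A brand-new novel: no statistics yet, so the score equals the prediction.
new_novel_score = fuse_popularity(0.6, stat_count=0, max_count=1_000_000, exposures=0)

# A heavily exposed novel: the score is dominated by its observed statistic.
veteran_score = fuse_popularity(0.2, stat_count=1_000_000, max_count=1_000_000,
                                exposures=10_000_000)
```

With this kind of weighting, a cold-start novel is not penalized for missing statistics, while an established novel cannot ride on an optimistic prediction alone.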

Online, the query is analyzed first. The recall stage then retrieves candidates using relevance and popularity, and the candidates are passed to a LambdaMART ranker that considers relevance, popularity, and basic novel information. Offline, user behavior is aggregated in TDW and processed on a Spark cluster for model training, and the resulting scores are written to an Elasticsearch cluster. Because novel popularity changes more slowly than video or live‑stream metrics, daily model updates are sufficient.

Model and Objectives

Any binary classification model can be used; LightGBM was chosen for interpretability and efficiency.
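The article does not publish its LightGBM configuration, labels, or feature encoding. Since any binary classifier fits the scheme, the sketch below uses a tiny dependency-free logistic regression as a stand-in (it is not LightGBM) to show the shape of the task: each row is one exposure of a novel to a user, the label says whether the user acted afterwards, and the output is a probability. The two feature names and all values are hypothetical.

```python
from math import exp

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, x):
    """Probability that the user acts after exposure, for one feature row."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def train_logistic(rows, labels, epochs=500, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

# Hypothetical features per exposure: [author_fan_ratio, on_ranking_list].
rows = [[0.9, 1.0], [0.8, 1.0], [0.7, 0.0], [0.2, 0.0], [0.1, 0.0], [0.3, 0.0]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = the user acted (e.g. read) after exposure

weights, bias = train_logistic(rows, labels)
p_strong = predict_proba(weights, bias, [0.9, 1.0])
p_weak = predict_proba(weights, bias, [0.1, 0.0])
```

A gradient-boosted tree model such as LightGBM would replace `train_logistic` and `predict_proba` here while keeping the same inputs and outputs, and would additionally expose feature importances, which is the interpretability benefit the article mentions.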

Features

The feature set consists of four groups:

Author‑related features (basic author attributes to address cold‑start).

Work‑related features (static attributes of the novel).

Fan‑related features (subscriptions, evaluations, retention).

Ranking‑list features.

Sample Construction and Sampling

Samples are built at the user‑novel granularity, which produces a very large number of samples. To keep the training set manageable, novels are divided by read count into hot, middle, tail, and no‑exposure groups, and samples are then drawn from each group to ensure coverage and balance.
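The grouping and per-group draw can be sketched as follows. The read-count thresholds, the per-group quota, and the record fields (`read_count`, `exposed`) are illustrative assumptions; the article does not state the actual cut-offs or sampling ratios.

```python
import random
from collections import defaultdict

def stratum_for(read_count, exposed, hot_min=10_000, middle_min=500):
    """Assign a novel to a group. The thresholds are illustrative assumptions."""
    if not exposed:
        return "no_exposure"
    if read_count >= hot_min:
        return "hot"
    if read_count >= middle_min:
        return "middle"
    return "tail"

def stratified_sample(samples, per_stratum, seed=0):
    """Draw up to `per_stratum` user-novel samples from each group.

    `samples` is an iterable of dicts carrying the novel's 'read_count'
    and an 'exposed' flag; small groups are kept in full.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    groups = defaultdict(list)
    for sample in samples:
        groups[stratum_for(sample["read_count"], sample["exposed"])].append(sample)
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in groups.items()
    }

# Made-up user-novel samples, heavily skewed toward hot novels.
samples = (
    [{"user": u, "novel": "hot-1", "read_count": 50_000, "exposed": True} for u in range(1000)]
    + [{"user": u, "novel": "mid-1", "read_count": 2_000, "exposed": True} for u in range(50)]
    + [{"user": u, "novel": "tail-1", "read_count": 30, "exposed": True} for u in range(5)]
    + [{"user": u, "novel": "new-1", "read_count": 0, "exposed": False} for u in range(3)]
)
picked = stratified_sample(samples, per_stratum=10)
```

Capping each group at the same quota keeps hot novels from dominating the training set while tail and unexposed novels remain represented.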

Evaluation Metrics and Results

Performance is evaluated using probability density distributions of popularity scores and conversion/reading rate metrics. Compared with a baseline that uses only statistical weighting, the new model yields smoother, more discriminative distributions and noticeable improvements in reads, subscriptions, and favorites after deployment.
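A minimal sketch of the two kinds of measurement, assuming popularity scores lie in [0, 1]: a histogram-based density estimate of the scores, and a simple conversion rate. The bin count and the sample scores are made up, and the article does not describe how its density curves were actually computed.

```python
def score_density(scores, bins=10):
    """Histogram-based density estimate of scores in [0, 1].

    Returns one density value per bin; the values times the bin width
    sum to 1, so two models' curves can be compared directly.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    counts = [0] * bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be in [0, 1]")
        counts[min(int(s * bins), bins - 1)] += 1  # s == 1.0 goes in the last bin
    width = 1.0 / bins
    return [c / (len(scores) * width) for c in counts]

def conversion_rate(actions, exposures):
    """Share of exposures followed by an action (read, subscribe, favorite)."""
    return actions / exposures if exposures else 0.0

# Made-up popularity scores for four novels.
density = score_density([0.05, 0.15, 0.15, 0.95], bins=10)
read_rate = conversion_rate(actions=30, exposures=1000)
```

A score distribution piled into one or two bins separates novels poorly, whereas one spread across the range gives the ranker more to work with; this is one way to read the "smoother, more discriminative" comparison against the baseline.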

Conclusion and Outlook

The proposed popularity modeling scheme significantly improves ranking quality for web novels. Future work includes enhancing discrimination for tail and newly released books, consolidating multiple predictive models into a single unified model, and incorporating long‑term reading depth into the popularity calculation.

data analysis, recommendation systems, LightGBM, learning to rank, LambdaMART, popularity modeling, web novels