Deep Learning Ranking System and Model for NetEase News Feed Personalization
This article presents the design, implementation, and optimization of a deep‑learning‑based ranking system for NetEase News, covering pipeline architecture, feature‑processing enhancements, custom TensorFlow operators, and modular model frameworks such as DCN and DIEN to improve recommendation performance.
Hello everyone. Today I will share the deep learning ranking system and models used for personalized recommendation in the NetEase News client, summarizing our team's R&D experience.
In the information‑flow scenario, personalized recommendation for both the headline and short‑video channels involves recall, ranking, and re‑ranking stages. Candidates are sorted using article, video, and image features, user profiles, scene context, and relevance signals. Effectiveness is measured by online metrics such as CTR, dwell time, retention, and refresh count, of which CTR is the most directly modelable.
When facing a concrete recommendation scenario, an efficient ranking system and model need to be built.
The ranking system consists of three parts: pipeline, ranking model, and model serving.
In the offline stage, the pipeline aggregates real‑time feedback logs by session, matches exposure and click labels, fills features to generate raw samples, and after preprocessing produces training samples. The online pipeline is similar but draws data from real‑time client feedback and recall requests.
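The offline sample-generation step (group by session, match exposure and click labels, fill features) can be sketched in plain Python. This is a minimal illustration, not the team's implementation: the `FeedbackEvent` record, its `expose`/`click` action values, and the `build_raw_samples` helper are hypothetical, and real feedback logs carry far more fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FeedbackEvent:
    # Hypothetical log record: one client feedback event.
    session_id: str
    doc_id: str
    action: str  # assumed values: "expose" or "click"


def build_raw_samples(events, doc_features):
    """Group events by session, match exposure/click labels, fill features."""
    sessions = defaultdict(lambda: {"exposed": [], "clicked": set()})
    for ev in events:
        sess = sessions[ev.session_id]
        if ev.action == "expose" and ev.doc_id not in sess["exposed"]:
            sess["exposed"].append(ev.doc_id)  # keep first-exposure order
        elif ev.action == "click":
            sess["clicked"].add(ev.doc_id)

    samples = []
    for session_id, sess in sessions.items():
        for doc_id in sess["exposed"]:
            # An exposure is a positive sample if the same session clicked it.
            label = 1 if doc_id in sess["clicked"] else 0
            sample = {"session_id": session_id, "doc_id": doc_id, "label": label}
            sample.update(doc_features.get(doc_id, {}))  # feature filling
            samples.append(sample)
    return samples
```

Clicks with no matching exposure in the same session are simply dropped in this sketch; the talk does not say how such edge cases were handled in production.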
The basic flow of the ranking system is shown below. In practice, we needed to modify the basic system for platform consistency, component reusability, and training performance, focusing on feature preprocessing and the model.
To ensure pipeline consistency between offline and online, we store online prediction features and reuse them offline, and we replace native tf.data and feature_column with custom operators to improve performance on large‑scale data.
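One way to picture the "store online features, reuse them offline" idea is a feature log written at serving time and joined with labels later. The sketch below assumes JSON-lines records keyed by a hypothetical `(request_id, doc_id)` pair; the function names are illustrative only.

```python
import json


def log_served_features(log, request_id, doc_id, features):
    """Online side (hypothetical): record the exact features used for scoring."""
    record = {"request_id": request_id, "doc_id": doc_id, "features": features}
    log.append(json.dumps(record, sort_keys=True))


def join_logged_features(log_lines, labels):
    """Offline side: reuse the logged features and attach labels.

    `labels` maps (request_id, doc_id) -> 0/1; unlabeled impressions are skipped.
    """
    samples = []
    for line in log_lines:
        record = json.loads(line)
        key = (record["request_id"], record["doc_id"])
        if key in labels:
            samples.append({"features": record["features"], "label": labels[key]})
    return samples
```

Because the training sample reads back the exact values the online model saw, offline and online features cannot drift apart through two separate feature implementations.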
We implemented custom ops for sample reading and data processing, handling multi‑value weighted features and operations not supported by native TensorFlow.
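To illustrate what a multi-value weighted feature op has to do, the sketch below parses a string such as `sports:0.8,tech:0.2` and returns the weighted mean of the matching embeddings. The string format and helper names are assumptions. The averaging follows the same idea as the `"mean"` combiner of TensorFlow's `tf.nn.embedding_lookup_sparse` with weights supplied, but this pure-Python version is not a TensorFlow op.

```python
def parse_weighted(raw, item_sep=",", kv_sep=":"):
    """Parse a multi-value weighted feature like 'sports:0.8,tech:0.2'."""
    pairs = []
    for item in raw.split(item_sep):
        if not item:
            continue
        key, _, weight = item.partition(kv_sep)
        pairs.append((key, float(weight) if weight else 1.0))  # missing weight -> 1.0
    return pairs


def weighted_mean_embedding(pairs, table, dim):
    """Weighted average of embeddings; keys missing from the table are skipped."""
    acc = [0.0] * dim
    total = 0.0
    for key, weight in pairs:
        vec = table.get(key)
        if vec is None:
            continue
        for k in range(dim):
            acc[k] += weight * vec[k]
        total += weight
    return [v / total for v in acc] if total > 0 else acc
```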
After the pipeline overhaul, a configuration file describes each feature’s processing flow as a DAG of operators (e.g., age correction → CDF/Bucket; Doc_POI and User_POI → StrToVec → similarity).
These processed features become inputs to the ranking model; adding new features only requires implementing corresponding operators and updating the config.
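A config-driven operator DAG of this kind can be sketched with a small operator registry and Python's `graphlib.TopologicalSorter` (Python 3.9+). The operator names (`clip`, `bucketize`, `str_to_vec`, `cosine`), the config layout, and the bucket boundaries are hypothetical stand-ins for the age-correction, Bucket, StrToVec, and similarity steps in the example DAG.

```python
import bisect
import math
from graphlib import TopologicalSorter  # Python 3.9+

OPS = {}  # operator name used in the config -> Python function


def register(name):
    """Register an operator under the name used in the config."""
    def deco(fn):
        OPS[name] = fn
        return fn
    return deco


@register("clip")
def clip(x, lo, hi):  # hypothetical "age correction"
    return min(max(x, lo), hi)


@register("bucketize")
def bucketize(x, boundaries):
    return bisect.bisect_right(boundaries, x)


@register("str_to_vec")
def str_to_vec(s):
    return [float(t) for t in s.split(",") if t]


@register("cosine")
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical config: each derived feature names its operator, inputs, and args.
CONFIG = {
    "age_fixed": {"op": "clip", "inputs": ["age"], "args": {"lo": 0, "hi": 100}},
    "age_bucket": {"op": "bucketize", "inputs": ["age_fixed"],
                   "args": {"boundaries": [18, 25, 35, 50]}},
    "doc_poi_vec": {"op": "str_to_vec", "inputs": ["doc_poi"]},
    "user_poi_vec": {"op": "str_to_vec", "inputs": ["user_poi"]},
    "poi_sim": {"op": "cosine", "inputs": ["doc_poi_vec", "user_poi_vec"]},
}


def run_dag(config, raw):
    """Evaluate every configured node in dependency order."""
    graph = {name: set(node["inputs"]) for name, node in config.items()}
    values = dict(raw)
    for name in TopologicalSorter(graph).static_order():
        if name in values:  # raw input, already available
            continue
        node = config[name]
        args = [values[i] for i in node["inputs"]]
        values[name] = OPS[node["op"]](*args, **node.get("args", {}))
    return values
```

Adding a feature then means registering one new function and adding one config entry, which is the extensibility property the talk describes.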
Optimizing the pipeline and model framework resolves consistency, performance, and extensibility issues, enabling the system to run feature processing on Hadoop Streaming offline and TensorFlow online.
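To show how shared processing logic can run offline under Hadoop Streaming, here is a minimal mapper that reads `key<TAB>json` lines and applies a transform. The record layout and the `transform` function are hypothetical; in the system described, the online side would run the equivalent logic as TensorFlow custom ops rather than calling this Python function.

```python
import json


def transform(features):
    """Hypothetical shared transform: clip age, then bucket it."""
    age = min(max(features.get("age", 0), 0), 100)
    out = dict(features)
    # Count boundaries <= age; same result as bisect_right on the sorted list.
    out["age_bucket"] = sum(age >= b for b in (18, 25, 35, 50))
    return out


def mapper(stdin, stdout):
    """Hadoop Streaming style mapper: 'key<TAB>json' lines in, transformed lines out."""
    for line in stdin:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, payload = line.partition("\t")
        result = transform(json.loads(payload))
        stdout.write(key + "\t" + json.dumps(result, sort_keys=True) + "\n")


# In a real streaming job: import sys; mapper(sys.stdin, sys.stdout)
```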
For deep‑learning ranking models, we need a generic, extensible recommendation algorithm library that supports rapid iteration, customization, and highly configurable model construction.
Classic deep ranking architectures include DNN, FNN, PNN, and Wide&Deep, which share common linear, cross, and deep modules built from feature‑representation (embedding) layers, cross layers, and fully connected layers.
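How these modules combine into a single prediction can be sketched in plain Python: the logits of a linear part, a cross part, and a deep part are summed and passed through a sigmoid. A DNN keeps only the deep part and Wide&Deep pairs the linear and deep parts. The factorization-machine pairwise term used for the cross part here is one common choice (as in DeepFM) and is an assumption, not necessarily the team's design; all parameter names are hypothetical.

```python
import math


def linear_part(x, w, b):
    """Linear (wide) module: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fm_cross_part(x, v):
    """Second-order cross module, FM style: sum_{i<j} <v_i, v_j> x_i x_j.

    Uses the identity 0.5 * sum_f[(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    total = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - sq
    return 0.5 * total


def deep_part(x, layers):
    """Deep module: ReLU hidden layers, then a linear output unit.

    `layers` is a list of (W, b); W is a list of rows, one per output unit.
    """
    h = list(x)
    for W, b in layers[:-1]:
        h = [max(0.0, sum(wi * hi for wi, hi in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
    w_out, b_out = layers[-1]
    return sum(wi * hi for wi, hi in zip(w_out, h)) + b_out


def predict_ctr(x, params):
    """Sum the three module logits and squash to a click probability."""
    logit = (linear_part(x, params["w"], params["b"])
             + fm_cross_part(x, params["v"])
             + deep_part(x, params["layers"]))
    return 1.0 / (1.0 + math.exp(-logit))
```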
We validated two more complex models, DCN (Deep & Cross Network) and DIEN (Deep Interest Evolution Network), which let the network learn higher‑order feature interactions.
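The core of DCN is its cross layer, x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l, where x_0 is the input vector and w_l, b_l are per-layer vectors. Each layer raises the polynomial degree of the interactions by one, and the residual term keeps x_l. A minimal pure-Python sketch of that original DCN formula follows; DIEN's GRU-based interest-evolution layers are not sketched.

```python
def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x0 * (xl . w) + b + xl."""
    s = sum(a * c for a, c in zip(xl, w))  # scalar x_l^T w_l
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_network(x0, weights, biases):
    """Stack cross layers; every layer re-uses the original input x0."""
    x = list(x0)
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x
```

In the full model, the cross network's output is concatenated with a parallel deep network's output before the final logit.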
Generalizing these insights, we designed a modular model framework where input features (e.g., click history, target docid) are embedded, attended, pooled, and fed into a fully‑connected network, with each sub‑module configurable via a config file.
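The embed → attend → pool → fully connected flow can be sketched as follows, with the pooling strategy chosen by a config entry. The config keys (`dim`, `pooling`), the embedding table, and the helper names are hypothetical. The attention here is a plain softmax over dot products between each clicked item and the target doc, a simplification of the MLP-based attention unit used in DIN-style models.

```python
import math


def embed(ids, table, dim):
    """Look up embeddings; unknown ids map to a zero vector."""
    return [table.get(i, [0.0] * dim) for i in ids]


def mean_pool(history, target):
    if not history:
        return [0.0] * len(target)
    return [sum(h[k] for h in history) / len(history) for k in range(len(target))]


def attention_pool(history, target):
    """Weighted sum of history vectors, weights = softmax(dot(h, target))."""
    if not history:
        return [0.0] * len(target)
    scores = [sum(a * b for a, b in zip(h, target)) for h in history]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * h[k] for w, h in zip(weights, history)) for k in range(len(target))]


POOLERS = {"mean": mean_pool, "attention": attention_pool}


def build_tower_input(config, click_history, target_doc, table):
    """Embed -> pool (chosen by config) -> concatenate with target embedding."""
    dim = config["dim"]
    history = embed(click_history, table, dim)
    target = embed([target_doc], table, dim)[0]
    pooled = POOLERS[config["pooling"]](history, target)
    return pooled + target  # this vector would feed the fully connected layers
```

Switching `"pooling"` from `"attention"` to `"mean"` changes the model without touching code, which is the kind of configurability the framework aims for.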
After these optimizations, the ranking system resolves the issues of pipeline consistency, overall performance, and model‑framework generality and extensibility, and the resulting models deliver significant gains over the baselines across various recommendation scenarios.
Thank you for listening.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
