Evolution of Autohome's Recommendation System Ranking Algorithms
The article details the five‑year evolution of Autohome's recommendation system, covering its overall architecture, the progression of ranking models from LR to DeepFM and online learning, feature engineering pipelines, ranking service APIs, AB testing practices, and future optimization directions.
The Autohome recommendation system, launched nearly five years ago, shifted the app's content distribution from a taxonomy‑based approach to personalized recommendation, serving billions of resources such as articles, videos, and car items.
Three core goals drive the system: understanding users, characterizing resources, and optimally matching the two. Matching is split into recall (finding many candidate items) and ranking (selecting the best among them).
Architecture : The pipeline consists of a resource pool (MySQL, Hive, Redis), tag generation, indexing, filtering, recall, user profiling, ranking, feature/model handling, and operations. The ranking service receives DeviceId, ItemId, Pvid, Model‑name, Model‑version, and Debug parameters and returns scored ItemIds.
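The ranking-service interface described above can be sketched as follows. This is a minimal illustration, not Autohome's actual API. The Python field names, the idea that ItemId carries the full list of recall candidates, the `(name, version)` model registry, and the toy scorer are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer type: maps (device_id, item_id) to a predicted score, e.g. a CTR estimate.
Scorer = Callable[[str, str], float]


@dataclass
class RankRequest:
    device_id: str          # DeviceId: identifies the user's device
    item_ids: List[str]     # ItemId: candidate items handed over by recall (assumed to be a list)
    pvid: str               # Pvid: page-view id linking this request to exposure/click logs
    model_name: str         # Model-name: which ranking model to use
    model_version: str      # Model-version: which trained version of that model
    debug: bool = False     # Debug: when true, return extra diagnostic fields


@dataclass
class RankResponse:
    pvid: str
    scored_items: List[Tuple[str, float]]  # (ItemId, score), highest score first
    debug_info: Dict[str, str] = field(default_factory=dict)


def rank(req: RankRequest, registry: Dict[Tuple[str, str], Scorer]) -> RankResponse:
    """Score every candidate with the requested model version and sort by score."""
    scorer = registry[(req.model_name, req.model_version)]
    scored = [(item, scorer(req.device_id, item)) for item in req.item_ids]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    info = {"model": f"{req.model_name}:{req.model_version}"} if req.debug else {}
    return RankResponse(pvid=req.pvid, scored_items=scored, debug_info=info)


# Usage with a toy scorer standing in for a real model (hypothetical names and scores).
toy_scores = {"a": 0.1, "b": 0.7, "c": 0.4}
registry = {("deepfm", "v1"): lambda device, item: toy_scores[item]}
resp = rank(RankRequest("dev-1", ["a", "b", "c"], "pv-1", "deepfm", "v1", debug=True), registry)
```

Keying the registry by both model name and version is one way to let AB experiments route traffic to different models through the request parameters alone.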
Ranking Model Evolution : The homepage ranking model progressed through LR, XGBoost, FM, Wide&Deep, DeepFM, and DeepFM with online learning, alongside experiments with DCN, LSTM, and GRU. LR offered fast training and interpretability; XGBoost improved CTR by about 6%; FM added second‑order feature interactions; Wide&Deep combined a linear (wide) part with a deep network; DeepFM replaced the wide part with an FM component that shares its embeddings with the deep part, achieving a 3.49% CTR lift; DCN introduced explicit high‑order cross features and gave a 1% CTR gain after feature expansion; LSTM/GRU models for next‑click prediction are still experimental.
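Two of the building blocks named above can be made concrete: FM's second-order interaction term, which DeepFM's FM component computes on the shared embeddings, and a single DCN cross layer. The sketch below follows the standard published formulations rather than Autohome's code. It uses plain Python lists for readability and assumes each embedding has already been multiplied by its feature value.

```python
from typing import List

Vector = List[float]


def fm_second_order(embeddings: List[Vector]) -> float:
    """FM pairwise term: sum over i < j of <v_i, v_j> for the active features.

    Uses 0.5 * sum_f [(sum_i v_if)^2 - sum_i v_if^2], which costs O(n*k) instead
    of O(n^2 * k). Each v_i is assumed to be pre-multiplied by its feature value x_i
    (x_i = 1 for one-hot categorical features).
    """
    k = len(embeddings[0])
    total = 0.0
    for f in range(k):
        s = sum(v[f] for v in embeddings)
        sq = sum(v[f] * v[f] for v in embeddings)
        total += s * s - sq
    return 0.5 * total


def fm_second_order_naive(embeddings: List[Vector]) -> float:
    """Direct O(n^2 * k) version of the same term, for checking."""
    n = len(embeddings)
    return sum(
        sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )


def dcn_cross_layer(x0: Vector, xl: Vector, w: Vector, b: Vector) -> Vector:
    """One DCN cross layer: x_{l+1} = x0 * (xl . w) + b + xl."""
    dot = sum(xi * wi for xi, wi in zip(xl, w))
    return [x0_i * dot + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


emb = [[1.0, 2.0], [3.0, 0.5], [-1.0, 1.0]]
print(fm_second_order(emb), fm_second_order_naive(emb))  # 2.5 2.5
x0 = [1.0, 2.0]
print(dcn_cross_layer(x0, x0, [0.5, 0.5], [0.0, 0.0]))  # [2.5, 5.0]
```

Stacking L cross layers yields feature crosses up to degree L + 1 with only O(d) parameters per layer. This is why DCN can model high-order interactions explicitly, while FM stops at second order.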
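Online Learning : Real‑time user feedback is used to update model parameters every ten minutes, shortening the delay between new user behavior and its effect on the model. Labels are joined with the features captured for the same request, which avoids feature leakage (training on feature values that were not available at serving time).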
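The ten-minute update cycle can be illustrated with a minimal sketch. It assumes the joined samples for each window arrive as a batch and applies one SGD pass over them. For brevity it swaps in a sparse logistic regression. The source applies online learning to DeepFM, so treat the model class, the learning rate, and the feature names as hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

# (sparse features captured at request time, click label)
Sample = Tuple[Dict[str, float], int]


class OnlineLR:
    """Hypothetical sparse logistic regression updated by plain SGD, one window at a time."""

    def __init__(self, lr: float = 0.1):
        self.lr = lr
        self.w: Dict[str, float] = {}
        self.bias = 0.0

    def predict(self, x: Dict[str, float]) -> float:
        z = self.bias + sum(self.w.get(k, 0.0) * v for k, v in x.items())
        return 1.0 / (1.0 + math.exp(-z))

    def update_window(self, samples: Iterable[Sample]) -> None:
        # Intended to be called once per window (e.g. every ten minutes) with that window's samples.
        for x, y in samples:
            g = self.predict(x) - y  # gradient of log loss with respect to the logit
            self.bias -= self.lr * g
            for k, v in x.items():
                self.w[k] = self.w.get(k, 0.0) - self.lr * g * v


model = OnlineLR(lr=0.5)
window = [({"tag:suv": 1.0}, 1), ({"tag:sedan": 1.0}, 0)] * 20
model.update_window(window)
```

Because the model keeps its state across calls, each window refines the previous parameters instead of retraining from scratch.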
Feature Engineering : Features include user profile (demographics, behavior, interests, CTR), item attributes (metadata, content embeddings, engagement metrics), and cross features (user‑item tag similarity). Processing steps involve anomaly handling (weight × feature + bias), normalization (min‑max, log, standard), and equal‑frequency bucketing. Embeddings are derived from BERT for text, CNN for images/video, graph embeddings for behavior, and LSTM for sequences.
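The normalization and bucketing steps listed above can be sketched as follows. The anomaly-handling formula is not implemented because the source gives no further detail. The source does not name a similarity metric for the user-item tag cross feature, so using cosine similarity over weighted tag vectors is an assumption.

```python
import bisect
import math
from typing import Dict, List


def min_max(xs: List[float]) -> List[float]:
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def log_transform(xs: List[float]) -> List[float]:
    # log(1 + x); assumes non-negative inputs such as counts or durations.
    return [math.log1p(x) for x in xs]


def standardize(xs: List[float]) -> List[float]:
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))  # population std
    return [(x - mean) / std if std > 0 else 0.0 for x in xs]


def equal_freq_boundaries(xs: List[float], n_buckets: int) -> List[float]:
    """Cut points at the empirical quantiles, so each bucket holds roughly the same count."""
    s = sorted(xs)
    return [s[len(s) * i // n_buckets] for i in range(1, n_buckets)]


def bucketize(x: float, boundaries: List[float]) -> int:
    # Bucket index in [0, len(boundaries)]; a value equal to a cut point goes to the upper bucket.
    return bisect.bisect_right(boundaries, x)


def tag_cosine(user_tags: Dict[str, float], item_tags: Dict[str, float]) -> float:
    """Cosine similarity between weighted tag vectors (assumed form of the cross feature)."""
    dot = sum(w * item_tags.get(t, 0.0) for t, w in user_tags.items())
    nu = math.sqrt(sum(w * w for w in user_tags.values()))
    ni = math.sqrt(sum(w * w for w in item_tags.values()))
    return dot / (nu * ni) if nu > 0 and ni > 0 else 0.0


values = [1, 2, 3, 4, 5, 6, 7, 8]
cuts = equal_freq_boundaries(values, 4)         # [3, 5, 7]
buckets = [bucketize(v, cuts) for v in values]  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Equal-frequency cut points are computed once on training data and reused at serving time, so that offline and online features land in the same buckets.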
Feature Production & Service : Offline features cover three months of history; real‑time features are refreshed at second‑level granularity. The feature service supplies both offline and online features to the ranking service. Training samples are then generated by joining the features the ranking service dumps at serving time with client exposure and click logs.
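The sample-generation join can be sketched in a few lines. This minimal version assumes that dumped features and client logs share a `(Pvid, ItemId)` key, that only exposed items become samples, and that a click gives label 1. The key layout and the deduplication rule are assumptions, not details from the source.

```python
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (Pvid, ItemId), assumed join key


def build_samples(
    dumped: Dict[Key, Dict[str, float]],  # features logged by the ranking service per request
    exposures: List[Key],                  # client exposure log
    clicks: Set[Key],                      # client click log
) -> List[Tuple[Dict[str, float], int]]:
    samples = []
    seen: Set[Key] = set()
    for key in exposures:
        if key in seen or key not in dumped:
            continue  # drop repeated exposures and items whose features were not dumped
        seen.add(key)
        samples.append((dumped[key], 1 if key in clicks else 0))
    return samples


dumped = {("pv1", "a"): {"f": 1.0}, ("pv1", "b"): {"f": 2.0}, ("pv1", "c"): {"f": 3.0}}
exposures = [("pv1", "a"), ("pv1", "b"), ("pv1", "a")]
samples = build_samples(dumped, exposures, clicks={("pv1", "b")})
```

Item "c" was scored but never exposed, so it produces no sample. This keeps items the user never saw from being counted as negatives.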
Future Optimization Directions : Move beyond CTR to multi‑objective optimization (clicks, interactions, dwell time); adopt more expressive models such as Transformers, together with AutoML and reinforcement learning; and enhance multimodal feature fusion (text, image, video, behavior) with richer user interest embeddings.
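One common way to act on several objectives at ranking time is a weighted fusion of per-objective predictions. The source does not say which approach Autohome plans to use. The sketch below is therefore only an illustration, and its weights, its log compression of dwell time, and the choice to weight dwell by click probability are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Predictions:
    p_click: float        # predicted click probability
    p_interact: float     # predicted probability of an interaction (like, comment, share)
    dwell_seconds: float  # predicted dwell time given a click


def fused_score(p: Predictions, w_click: float = 1.0, w_interact: float = 0.5,
                w_dwell: float = 0.3) -> float:
    # Hypothetical weights. Dwell is log-compressed so very long reads do not dominate,
    # and scaled by p_click so it acts like an expected dwell contribution.
    return (w_click * p.p_click
            + w_interact * p.p_interact
            + w_dwell * p.p_click * math.log1p(p.dwell_seconds))


item_a = Predictions(p_click=0.10, p_interact=0.01, dwell_seconds=30.0)
item_b = Predictions(p_click=0.12, p_interact=0.00, dwell_seconds=5.0)
```

Here item_b has the higher CTR, but item_a scores higher once interactions and dwell time are counted. A pure-CTR ranker would not produce that trade-off.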
The team, led by senior algorithm engineer Li Chenxu, maintains a large‑scale recommendation platform, supports multiple internal scenarios, and actively contributes research results to the community. They are recruiting algorithm experts in search, recommendation, and NLP; contact [email protected].
Additional resources include links to articles on recommendation evolution, multi‑task learning in recommendation, and community channels such as DataFunTalk for AI and big‑data discussions.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.