Automobile Home Recommendation System Architecture and Ranking Models
This article presents a comprehensive overview of the Automobile Home recommendation system, detailing its objectives, architecture, various ranking models from LR to DeepFM, online learning mechanisms, service APIs, feature engineering pipelines, model training platforms, debugging tools, and future optimization directions.
The Automobile Home recommendation system has been online for nearly five years. It delivers personalized content such as articles, videos, and car-related items to users from a resource pool numbering in the billions.
Its core objectives are user understanding, resource characterization, and optimal matching. These break down into collecting user attributes and representing user behavior, extracting resource features, and matching users to resources through recall and ranking.
The system architecture follows four stages: resource collection, candidate recall, ranking based on user preferences, and delivering the top‑N results. Key modules include a resource pool (MySQL, Hive, Redis), tag generation, indexing, filtering, recall, user profiling, ranking models, feature/model handling, and operational strategies.
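The four stages can be sketched end to end in a few lines of Python. This is a toy sketch with several assumptions: the in-memory `RESOURCE_POOL`, the tag-overlap recall, the seen-item filter, and the Jaccard score that stands in for a trained ranking model are all hypothetical, not the system's actual modules.

```python
import heapq

# Hypothetical resource pool: item_id -> content tags.
# (In the real system the pool lives in MySQL/Hive/Redis.)
RESOURCE_POOL: dict[str, set[str]] = {
    "a1": {"suv", "review"},
    "a2": {"sedan", "news"},
    "v1": {"suv", "video"},
    "v2": {"ev", "video"},
}


def recall(user_tags: set[str], pool: dict[str, set[str]]) -> list[str]:
    """Candidate recall: keep items sharing at least one tag with the user."""
    return [item for item, tags in pool.items() if tags & user_tags]


def filter_seen(candidates: list[str], seen: set[str]) -> list[str]:
    """Filtering: drop items the user has already been shown."""
    return [c for c in candidates if c not in seen]


def rank(user_tags: set[str], candidates: list[str], pool: dict[str, set[str]]) -> dict[str, float]:
    """Toy ranker: Jaccard similarity of tags, standing in for a learned CTR model."""
    return {c: len(pool[c] & user_tags) / len(pool[c] | user_tags) for c in candidates}


def recommend(user_tags: set[str], seen: set[str], top_n: int) -> list[str]:
    candidates = filter_seen(recall(user_tags, RESOURCE_POOL), seen)
    scores = rank(user_tags, candidates, RESOURCE_POOL)
    # Deliver the top-N items by score.
    return heapq.nlargest(top_n, scores, key=lambda item: scores[item])
```

For example, `recommend({"suv", "video"}, seen={"a1"}, top_n=2)` returns `['v1', 'v2']`. Recall keeps `a1`, `v1`, and `v2`. Filtering then drops the already-seen `a1`. `v1` ranks first because it matches both user tags.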
The ranking models have evolved from Logistic Regression (LR) to XGBoost, FM, Wide&Deep, DeepFM, DCN, and online learning variants. LR offers fast training and interpretability; XGBoost improved CTR by ~6%; FM introduced second‑order feature interactions; Wide&Deep combined memorization and generalization; DeepFM replaced the wide part with FM and achieved ~3.5% CTR lift; DCN added explicit high‑order cross features; LSTM/GRU sequence models are still experimental.
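The core computations behind two of these models can be written compactly. The sketch below shows two textbook formulas. The first is the FM second-order interaction term, which is also what DeepFM's FM component computes (there, with embeddings shared with the deep part). The second is a single DCN cross layer. The code uses plain Python lists for readability. The article does not describe the production implementation, so treat this only as an illustration of the formulas.

```python
import math


def fm_logit(x: list[float], w0: float, w: list[float], v: list[list[float]]) -> float:
    """FM logit: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.

    The pairwise term is computed in O(k*n) using the identity
    sum_{i<j} <v_i,v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    n, k = len(x), len(v[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_pairwise_naive(x: list[float], v: list[list[float]]) -> float:
    """O(k*n^2) reference for the pairwise term, used only to check fm_logit."""
    n = len(x)
    return sum(
        sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )


def cross_layer(x0: list[float], xl: list[float], w: list[float], b: list[float]) -> list[float]:
    """One DCN cross layer: x_{l+1} = x0 * (xl . w) + b + xl (explicit feature crossing)."""
    dot = sum(a * c for a, c in zip(xl, w))
    return [x0_i * dot + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def sigmoid(z: float) -> float:
    """Map a logit to a click probability, as in LR- and FM-based CTR models."""
    return 1.0 / (1.0 + math.exp(-z))
```

The O(kn) rewrite of the pairwise term is what makes FM practical on sparse, high-dimensional CTR features. Stacking several cross layers raises the polynomial degree of the explicit crosses by one per layer.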
Online learning updates model parameters in near‑real time using streaming user feedback, reducing the model refresh cycle from days to minutes and improving responsiveness to behavior changes.
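The article does not say which update rule the online learner uses. A widely used choice for streaming CTR logistic regression is FTRL-Proximal, which keeps two per-feature accumulators and produces sparse weights. The sketch below implements it on sparse feature dicts. The class name, feature naming scheme, and hyperparameter defaults are assumptions for illustration.

```python
import math


class FTRLProximal:
    """Per-coordinate FTRL-Proximal logistic regression for streaming updates.

    Features are sparse dicts {feature_name: value}. Hyperparameter defaults
    are illustrative assumptions, not values from the article.
    """

    def __init__(self, alpha: float = 0.1, beta: float = 1.0, l1: float = 0.1, l2: float = 1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z: dict[str, float] = {}  # accumulated gradients minus sigma-weighted weights
        self.n: dict[str, float] = {}  # accumulated squared gradients

    def _weight(self, i: str) -> float:
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 keeps weakly supported features at exactly zero
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n.get(i, 0.0))) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x: dict[str, float]) -> float:
        logit = sum(self._weight(i) * val for i, val in x.items())
        logit = max(-35.0, min(35.0, logit))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-logit))

    def update(self, x: dict[str, float], y: int) -> float:
        """Consume one (features, click label) event; returns the pre-update prediction."""
        p = self.predict(x)
        for i, val in x.items():
            w_i = self._weight(i)  # weight before this event's update
            g = (p - y) * val  # log-loss gradient for coordinate i
            n_old = self.n.get(i, 0.0)
            sigma = (math.sqrt(n_old + g * g) - math.sqrt(n_old)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * w_i
            self.n[i] = n_old + g * g
        return p
```

Each click or non-click event from the stream goes through `update` as it arrives, so weights change continuously instead of waiting for a batch retrain.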
The ranking service is exposed through an API that accepts a device ID, item ID(s), a request ID, the model name and version, and a debug flag, and returns item IDs with their scores. It relies on a feature service to supply both offline and real-time user and item features.
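Below is a minimal sketch of the request/response contract and the call into the feature service. Only the set of inputs (device, items, request ID, model name/version, debug flag) and the scored output come from the description above. Everything else is hypothetical: the field names, the list of candidate `item_ids` per request, the `FeatureService` stub, and the `tag_match_v1` model.

```python
from dataclasses import dataclass, field
from typing import Callable

Features = dict[str, list[str]]
Model = Callable[[Features, Features], float]


@dataclass
class RankRequest:
    device_id: str
    item_ids: list[str]  # assumption: a batch of candidate items per request
    request_id: str
    model_name: str
    model_version: str
    debug: bool = False


@dataclass
class RankResponse:
    request_id: str
    results: list[tuple[str, float]]  # (item_id, score), best first
    debug_info: dict[str, dict] = field(default_factory=dict)


class FeatureService:
    """Hypothetical stand-in for the feature service (offline + real-time features)."""

    def __init__(self, users: dict[str, Features], items: dict[str, Features]):
        self.users, self.items = users, items

    def user_features(self, device_id: str) -> Features:
        return self.users.get(device_id, {})

    def item_features(self, item_id: str) -> Features:
        return self.items.get(item_id, {})


def tag_match_v1(user: Features, item: Features) -> float:
    """Toy model: number of item tags that match the user's interests."""
    return float(len(set(user.get("interests", [])) & set(item.get("tags", []))))


MODELS: dict[tuple[str, str], Model] = {("tag_match", "v1"): tag_match_v1}


def rank(req: RankRequest, features: FeatureService) -> RankResponse:
    model = MODELS[(req.model_name, req.model_version)]  # route by model name/version
    user = features.user_features(req.device_id)
    results, debug_info = [], {}
    for item_id in req.item_ids:
        item = features.item_features(item_id)
        results.append((item_id, model(user, item)))
        if req.debug:
            debug_info[item_id] = {"user": user, "item": item}  # expose inputs behind each score
    results.sort(key=lambda pair: pair[1], reverse=True)
    return RankResponse(req.request_id, results, debug_info)
```

Routing on `(model_name, model_version)` lets several models be served side by side, for example during an A/B test. The `debug` flag returns the features behind each score.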
Model training is performed on the AutoPAI platform, which supports visual drag‑and‑drop pipelines, hundreds of algorithms, deep learning frameworks, and distributed GPU training. Simple models (LR, XGBoost) are built via component composition, while deep models are developed, debugged, and deployed through the same platform.
Debugging is facilitated by a dedicated recommendation debug system that simulates online requests, displays intermediate results, and validates indexing, recall, and ranking modules before production release.
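One way to expose intermediate results is to wrap each stage so its output is recorded for a replayed request. The sketch below assumes a tag-to-items inverted index and simple recall, filter, rank, and top-N stages. `DebugTrace` and `simulate_request` are hypothetical names, not the debug system's actual interface.

```python
from typing import Any, Callable


class DebugTrace:
    """Runs pipeline stages in order and records each stage's output."""

    def __init__(self) -> None:
        self.steps: list[tuple[str, Any]] = []

    def run(self, name: str, stage: Callable[[], Any]) -> Any:
        out = stage()
        self.steps.append((name, out))
        return out

    def report(self) -> str:
        return "\n".join(f"{name}: {out}" for name, out in self.steps)


def simulate_request(
    user_tags: list[str], index: dict[str, list[str]], seen: set[str], top_n: int
) -> DebugTrace:
    """Replay one request against a tag -> items inverted index, keeping intermediates."""
    trace = DebugTrace()
    candidates = trace.run(
        "recall", lambda: sorted({i for t in user_tags for i in index.get(t, [])})
    )
    kept = trace.run("filter", lambda: [c for c in candidates if c not in seen])
    # Score = number of the user's tags whose posting list contains the item.
    ranked = trace.run(
        "rank",
        lambda: sorted(kept, key=lambda c: sum(c in index.get(t, []) for t in user_tags), reverse=True),
    )
    trace.run("top_n", lambda: ranked[:top_n])
    return trace
```

Printing `trace.report()` shows one line per stage. This makes it easy to spot, for instance, an item that recall found but filtering unexpectedly dropped.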
Feature engineering includes user profile features (demographics, behavior, interests), item features (metadata, content signals), and cross features (user‑item tag matching). Raw features undergo anomaly handling, normalization (min‑max, log, standard), and equal‑frequency binning before being fed to models.
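The preprocessing steps named above map onto a handful of small functions. In this sketch, anomaly handling is reduced to clipping against caller-supplied bounds. That is an assumption, since the article does not say how anomalies are detected. Equal-frequency bin edges are taken at evenly spaced ranks of the sorted training values.

```python
import bisect
import math
import statistics


def clip_outliers(values: list[float], lower: float, upper: float) -> list[float]:
    """Anomaly handling: clamp values into [lower, upper] (bounds assumed, e.g. from percentiles)."""
    return [min(max(v, lower), upper) for v in values]


def min_max(values: list[float]) -> list[float]:
    """Scale into [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_scale(values: list[float]) -> list[float]:
    """log(1 + x) compresses heavy-tailed, non-negative counts such as view counts."""
    return [math.log1p(v) for v in values]


def standardize(values: list[float]) -> list[float]:
    """z-score: subtract the mean, divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def equal_freq_edges(values: list[float], n_bins: int) -> list[float]:
    """Bin boundaries at evenly spaced ranks, so each bin holds roughly equal counts."""
    s = sorted(values)
    return sorted({s[k * len(s) // n_bins] for k in range(1, n_bins)})


def to_bin(value: float, edges: list[float]) -> int:
    """Bin index in [0, len(edges)]; a value equal to an edge goes to the upper bin."""
    return bisect.bisect_right(edges, value)
```

Bin edges should be fit once on training data and reused at serving time. That way, a given raw value lands in the same bin offline and online.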
Feature production combines offline features computed from three months of historical data with real-time features updated at second-level latency. Both are delivered through a feature service that underpins the ranking service.
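Serving needs a single feature view per request, so offline and real-time values have to be merged. Below is a minimal sketch. It assumes that each value carries an update timestamp and that a fresh real-time value overrides its offline counterpart. The `FeatureValue` type, the freshness window, and the override policy are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureValue:
    value: float
    updated_at: float  # unix timestamp in seconds


def merge_features(
    offline: dict[str, FeatureValue],
    realtime: dict[str, FeatureValue],
    max_age_s: float = 300.0,
    now: Optional[float] = None,
) -> dict[str, float]:
    """Overlay fresh real-time values on the offline snapshot.

    A real-time value wins only if it is at least as new as the offline one
    and no older than max_age_s; stale real-time values are ignored.
    """
    now = time.time() if now is None else now
    merged = {name: fv.value for name, fv in offline.items()}
    for name, fv in realtime.items():
        fresh = now - fv.updated_at <= max_age_s
        newer = name not in offline or fv.updated_at >= offline[name].updated_at
        if fresh and newer:
            merged[name] = fv.value
    return merged
```

Passing `now` explicitly keeps the merge deterministic in tests and replays. In production, the current time is used.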
Future work aims to move beyond pure CTR optimization toward multi‑objective models (clicks, interaction time, etc.), adopt more expressive architectures such as Transformers, leverage AutoML and reinforcement learning, and enrich multimodal embeddings for text, images, video, and graph data.
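Even after separate models or heads predict each objective, their outputs still have to be combined into one ranking score. A simple, common option is multiplicative fusion with tunable exponents. The sketch below combines predicted CTR with expected interaction time. The formula, the weights, and the 60-second scale are illustrative assumptions, not the planned design.

```python
def fused_score(
    p_click: float,
    expected_dwell_s: float,
    alpha: float = 1.0,
    beta: float = 0.5,
    dwell_scale_s: float = 60.0,
) -> float:
    """Combine predicted CTR with expected interaction time.

    score = p_click**alpha * (1 + dwell / scale)**beta; all weights are hypothetical.
    """
    return (p_click ** alpha) * ((1.0 + expected_dwell_s / dwell_scale_s) ** beta)


def rerank(candidates: dict[str, tuple[float, float]]) -> list[str]:
    """Order item_ids by fused score; values are (p_click, expected_dwell_s)."""
    return sorted(candidates, key=lambda item: fused_score(*candidates[item]), reverse=True)
```

With these weights, an item with a slightly lower click probability but a much longer expected read time can outrank a click-heavy one. A pure CTR objective cannot express that trade-off.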
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
