Inside MaFengWo’s Scalable Ranking Platform: Architecture, Verification & Explainability
This article explains how MaFengWo’s recommendation system combines recall, ranking, and rerank stages, details the evolution of its sorting algorithm platform, and shows how data verification and model‑explainability techniques like SHAP and LIME improve online performance and accelerate model iteration.
Part 1 – MaFengWo Recommendation System Architecture
The system consists of three stages: Recall (Match), Ranking (Rank), and Rerank. Recall filters billions of items down to a candidate set of hundreds to thousands. Ranking then scores each candidate against optimization goals such as click‑through rate, selecting a small set of high‑quality items. Finally, Rerank applies last‑mile adjustments, such as diversity and business rules, before results are shown to the user.
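The three‑stage funnel can be sketched in a few lines. Everything below — the item fields, the category filter, the `ctr_estimate` score, and the diversity rule — is an illustrative assumption, not MaFengWo's actual implementation.

```python
# Toy three-stage funnel: recall -> rank -> rerank.
# All data and scoring logic here are illustrative assumptions.

def recall(all_items, user, k=1000):
    """Cheap filter: keep up to k items whose category matches the user's interests."""
    return [i for i in all_items if i["category"] in user["interests"]][:k]

def rank(candidates, k=10):
    """Score each candidate (stand-in for a CTR model) and keep the top k."""
    return sorted(candidates, key=lambda i: i["ctr_estimate"], reverse=True)[:k]

def rerank(ranked):
    """Final adjustment: demote consecutive items from the same category for diversity."""
    result, deferred, last_cat = [], [], None
    for item in ranked:
        if item["category"] == last_cat:
            deferred.append(item)
        else:
            result.append(item)
            last_cat = item["category"]
    return result + deferred

items = [
    {"id": 1, "category": "hiking", "ctr_estimate": 0.9},
    {"id": 2, "category": "hiking", "ctr_estimate": 0.8},
    {"id": 3, "category": "food", "ctr_estimate": 0.7},
    {"id": 4, "category": "beach", "ctr_estimate": 0.2},
]
user = {"interests": {"hiking", "food"}}
final = rerank(rank(recall(items, user)))
print([i["id"] for i in final])  # -> [1, 3, 2]: the two hiking items get split up
```

Each stage narrows the set while spending more compute per item, which is the point of the funnel design.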
Part 2 – Evolution of the Sorting Algorithm Platform
2.1 Overall Architecture
The online sorting platform is built from three interchangeable modules:
General Data Processing Module – constructs features and training samples using click‑exposure logs, user profiles, and content profiles; relies on Spark batch jobs and Flink streaming.
Replaceable Model Production Module – creates training sets, trains models, and generates online configurations for seamless deployment.
Monitoring & Analysis Module – monitors upstream data, recommendation pools, feature health, and provides visual analysis of models.
Modules interact through JSON configuration files, enabling rapid iteration.
2.1.2 Configuration File Types
Four main config categories are used:
TrainConfig – defines training set selection and model parameters.
MergeConfig – specifies which user, item, context, and cross features to use.
OnlineConfig – generated automatically for online use, containing feature definitions, model paths, and versioning.
CtrConfig – provides default CTR smoothing.
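The article does not show the actual CtrConfig format, but CTR smoothing of the kind it names is typically done by adding pseudo‑clicks and pseudo‑exposures from a prior. The field names and values below are assumptions for illustration:

```python
# Illustrative CtrConfig-style defaults: field names and values are assumptions.
ctr_config = {
    "prior_clicks": 5.0,       # pseudo-clicks added to every item
    "prior_exposures": 100.0,  # pseudo-exposures added to every item
}

def smoothed_ctr(clicks, exposures, cfg):
    """Bayesian-style smoothing: pull sparse items toward the prior CTR."""
    return (clicks + cfg["prior_clicks"]) / (exposures + cfg["prior_exposures"])

# A brand-new article with 1 click in 2 exposures does not get CTR = 0.5;
# it lands near the prior 5/100 = 0.05:
print(smoothed_ctr(1, 2, ctr_config))
# A well-exposed article is barely affected by the prior:
print(smoothed_ctr(500, 10_000, ctr_config))
```

Smoothing prevents low‑exposure items from dominating the ranking on noisy raw CTRs.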
2.1.3 Feature Engineering
Features are grouped as User, Article, and Context. They can be statistical (e.g., clicks, exposures), embedding vectors (Word2Vec‑derived), or cross features (similarity scores).
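A typical cross feature of the similarity‑score kind is the cosine between a user embedding and an article embedding. This sketch uses plain Python and made‑up vectors, not the platform's actual Word2Vec embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

user_emb = [0.2, 0.8, 0.1]     # e.g. average embedding of recently clicked articles
article_emb = [0.1, 0.9, 0.0]  # e.g. embedding of the candidate article
print(cosine(user_emb, article_emb))  # high value: article matches user's taste
```

The resulting score is fed to the ranking model as a single cross feature alongside the statistical and embedding features.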
2.2 Platform V1
V1 relied on simple JSON files to select features, choose training sets, train per‑scene XGBoost models, evaluate offline AUC, and automatically sync online configs.
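The offline AUC step in that loop needs no ML library; it can be computed from ranks alone. This is a generic rank‑sum AUC sketch (ties ignored for brevity), not the platform's evaluation code:

```python
def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formula: the probability
    that a random positive sample outranks a random negative one."""
    pairs = sorted(zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        return float("nan")
    # 1-based ranks of the positive samples in score order
    rank_sum = sum(r for r, (_, label) in enumerate(pairs, start=1) if label == 1)
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)

# Perfectly separated scores give AUC = 1.0:
print(auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # -> 1.0
# Perfectly inverted scores give AUC = 0.0:
print(auc([0, 0, 1, 1], [0.9, 0.8, 0.2, 0.1]))  # -> 0.0
```

A healthy offline AUC that fails to translate into online gains is exactly the mismatch V1 could not diagnose.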
Issues observed:
Difficulty diagnosing mismatches between offline and online performance.
Lack of model interpretability hindered optimization.
2.3 Platform V2 – Adding Data Verification and Model Explainability
The monitoring module was extended with two new capabilities:
Data Verification – compares offline training data with real‑time logged features to spot inconsistencies caused by data latency or missing‑value handling.
Model Explain – integrates SHAP and LIME to provide both global feature importance and local per‑sample explanations for the XGBoost ranking model.
Data Verification Workflow
Each real‑time click‑exposure record receives a unique ID; the same ID is retained in the offline aggregated table. This allows a one‑to‑one comparison of feature values, AUC, and prediction scores between offline and online pipelines, quickly revealing root causes such as delayed data or incorrect missing‑value parameters.
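A minimal version of that one‑to‑one comparison joins the two feature tables on the shared record ID and flags divergent values. The column names, missing‑value markers, and tolerance below are assumptions:

```python
# Join offline training features with real-time logged features on a shared
# record ID and flag mismatches. Column names and values are assumptions.

offline = {
    "id_1": {"article_ctr_7d": 0.051, "user_clicks_30d": 12},
    "id_2": {"article_ctr_7d": 0.000, "user_clicks_30d": 7},   # offline fills missing with 0
}
online = {
    "id_1": {"article_ctr_7d": 0.051, "user_clicks_30d": 12},
    "id_2": {"article_ctr_7d": -1.0,  "user_clicks_30d": 7},   # online marks missing with -1
}

def diff_features(offline, online, tol=1e-6):
    """Return (record_id, feature, offline_value, online_value) for every mismatch."""
    mismatches = []
    for rid, off_feats in offline.items():
        on_feats = online.get(rid, {})
        for name, off_val in off_feats.items():
            on_val = on_feats.get(name)
            if on_val is None or abs(off_val - on_val) > tol:
                mismatches.append((rid, name, off_val, on_val))
    return mismatches

for m in diff_features(offline, online):
    print(m)  # surfaces e.g. inconsistent missing-value defaults between pipelines
```

In this toy case the join immediately exposes the kind of missing‑value‑parameter inconsistency the article describes as a root cause.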
Applying this process raised online UV click‑through rate by 16.79% and PV click‑through rate by 19.10%.
Model Explainability Details
Traditional XGBoost feature importance is global only. By adding SHAP/LIME, the platform can show how each feature contributes positively or negatively to a single prediction.
Example: for a sample with predicted logits 0.094930, 0.073473, 0.066176, the feature doubleFlow_article_ctr_7_v1 contributed +0.062029, while ui_cosine_70 contributed +0.188769.
The margin outputs are converted to probabilities with the sigmoid function:

```python
logit_output_value = 1.0 / (1 + np.exp(-margin_output_value))
logit_base_value = 1.0 / (1 + np.exp(-margin_base_value))
```

Visualizations (SHAP bar charts) illustrate thresholds where a 7‑day article CTR switches from negative to positive influence.
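The additivity behind these local explanations can be checked directly: per‑feature SHAP values plus the base value reconstruct the model's margin output, which the sigmoid then maps to a probability. The two named feature contributions below come from the example above, but the base margin and the third feature are made up for illustration:

```python
import math

def sigmoid(x):
    """Map a raw margin to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical SHAP decomposition for one sample (base margin and the
# user_clicks_30d contribution are illustrative assumptions):
base_margin = -2.5
shap_values = {
    "doubleFlow_article_ctr_7_v1": 0.062029,
    "ui_cosine_70": 0.188769,
    "user_clicks_30d": -0.150000,
}

# SHAP's additivity property: margin = base value + sum of feature contributions.
margin_output = base_margin + sum(shap_values.values())
prob = sigmoid(margin_output)

print(round(margin_output, 6))
print(round(prob, 6))  # predicted click probability for this sample
```

Because the decomposition is exact at the margin level, each feature's positive or negative contribution can be reported per sample, which global XGBoost importance cannot do.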
Part 3 – Future Plans
Upcoming work focuses on improving online model performance and real‑time feature updates. Limitations of the current XGBoost approach include difficulty handling high‑dimensional sparse features and lack of online learning.
Planned upgrades:
Adopt Wide&Deep, DeepFM, and other deep models.
Shift from pointwise (predict‑score‑then‑rank) to listwise learning‑to‑rank for end‑to‑end recommendation.
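The pointwise‑to‑listwise shift can be illustrated with a ListNet‑style top‑one softmax loss, which compares the whole predicted ordering of a list against the true relevance ordering instead of scoring each item independently. This is a generic sketch of the technique, not the planned implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over one list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def listnet_loss(true_relevance, predicted_scores):
    """ListNet top-one loss: cross-entropy between the softmax of true
    relevance labels and the softmax of predicted scores for ONE list."""
    p_true = softmax(true_relevance)
    p_pred = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))

truth = [3.0, 1.0, 0.0]  # graded relevance for one user's recommendation list

# Predictions that preserve the true ordering incur lower loss than inverted ones:
good = listnet_loss(truth, [2.5, 1.0, 0.2])
bad = listnet_loss(truth, [0.2, 1.0, 2.5])
print(good < bad)  # -> True
```

Optimizing a list‑level loss like this lets the gradient reflect ordering mistakes directly, which is the end‑to‑end property the planned upgrade targets.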
Mafengwo Technology
External communication platform of the Mafengwo Technology team, regularly sharing articles on advanced tech practices, tech exchange events, and recruitment.