Deep Learning Practices for Commercial CTR Prediction at 58.com
This article details the end‑to‑end deep‑learning workflow for click‑through‑rate (CTR) prediction in 58.com’s commercial ranking system, covering system architecture, feature engineering, sample construction, model evolution from Wide&Deep to DIN/DIEN, and engineering optimizations that together yielded significant CPM and CVR improvements.
Deep learning has become central to CTR estimation in internet advertising, particularly since the success of AlexNet in 2012. 58.com, a large lifestyle information platform, draws on large volumes of consumer‑side (C‑end) and business‑side (B‑end) data to balance user experience, advertiser ROI, and platform revenue.
System Framework – The commercial ranking system consists of offline and online pipelines: offline handles feature computation, sample generation, model training and distribution; online performs real‑time feature stitching, scoring, calibration, rule‑based re‑ranking, and logging.
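The online path described above can be sketched roughly as follows. This is a minimal illustration, not 58.com's implementation: the function names, the logistic-regression stand-in for the model, the blocklist rule, and the choice of calibration formula (the standard correction for negative downsampling) are all assumptions made for the example.

```python
import math

def stitch_features(user, ad, context):
    """Merge per-request feature groups into one flat dict with prefixed keys."""
    features = {}
    for prefix, group in (("user", user), ("ad", ad), ("ctx", context)):
        for name, value in group.items():
            features[f"{prefix}.{name}"] = value
    return features

def score(features, weights, bias=0.0):
    """Stand-in model (assumption): logistic regression over stitched features."""
    z = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def calibrate(p, negative_keep_rate):
    """Map a score trained on downsampled negatives back to the true scale.

    negative_keep_rate is the fraction of negatives kept during training (0 < rate <= 1).
    """
    return p / (p + (1.0 - p) / negative_keep_rate)

def rank_ads(user, context, ads, weights, negative_keep_rate=1.0, blocked=()):
    """Stitch, score, calibrate, apply a simple rule, and return log records sorted by pCTR."""
    records = []
    for ad_id, ad in ads.items():
        if ad_id in blocked:  # hypothetical rule-based step: skip blocked ads
            continue
        feats = stitch_features(user, ad, context)
        pctr = calibrate(score(feats, weights), negative_keep_rate)
        records.append({"ad_id": ad_id, "features": feats, "pctr": pctr})
    records.sort(key=lambda r: r["pctr"], reverse=True)
    return records
```

Returning the stitched features alongside each score mirrors the logging step: the same records can later be joined with click labels to build training samples.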
Feature Engineering – Basic features are built from user, client, ad, and context dimensions, and capture relevance, diversity, timeliness, and authority. High‑order features use embedding‑based representation learning (e.g., word2vec on item co‑occurrence), with attribute‑level encoding to handle cold‑start items. Bias features account for position bias and temporal freshness; age and example‑age features help mitigate over‑fitting.
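To make the item co-occurrence idea concrete, here is a small sketch. It is a simplified stand-in for word2vec: instead of training skip-gram embeddings, it uses raw co-occurrence counts within a window as sparse item vectors. The cold-start fallback (averaging the vectors of known items that share an attribute value) is one assumed reading of "attribute-level encoding", and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sessions, window=2):
    """Represent each item by counts of the items seen near it in user sessions."""
    vectors = defaultdict(Counter)
    for session in sessions:
        for i, item in enumerate(session):
            lo, hi = max(0, i - window), min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[item][session[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cold_start_vector(attrs, item_attrs, vectors):
    """Assumed fallback: average vectors of known items sharing any attribute value."""
    total, n = Counter(), 0
    for item, known in item_attrs.items():
        if item in vectors and any(known.get(k) == v for k, v in attrs.items()):
            total.update(vectors[item])
            n += 1
    return {k: v / n for k, v in total.items()} if n else {}
```

Items that appear in similar contexts end up with similar vectors even if they never co-occur directly, which is the property the learned embeddings are meant to capture at scale.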
Sample Construction – Consistency between offline and online samples is ensured by sharing the same feature library. Two generation schemes are used: (1) snapshot‑based reconstruction of raw samples via state‑backtracking, and (2) a Kappa‑style Flink pipeline that records feature activation timestamps for near‑line sample creation. Various sampling strategies (uniform, PV‑based, user‑based, candidate‑based, etc.) are applied to align data distributions and improve fairness.
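The sampling strategies listed above differ mainly in the unit that is kept or dropped. A minimal sketch, assuming each sample is a dict with hypothetical `user_id` and `pv_id` fields: uniform sampling decides per impression, while user-based or PV-based sampling hashes the grouping key so that all impressions of a user or page view are kept or dropped together.

```python
import hashlib
import random

def bucket_of(value, buckets=100):
    """Deterministically map a key to a bucket in [0, buckets)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

def sample_uniform(samples, rate, seed=0):
    """Keep each impression independently with probability `rate`."""
    rng = random.Random(seed)
    return [s for s in samples if rng.random() < rate]

def sample_by_key(samples, key, keep_percent):
    """Keep whole groups: key='user_id' for user-based, key='pv_id' for PV-based."""
    return [s for s in samples if bucket_of(s[key]) < keep_percent]
```

Hashing the key rather than drawing random numbers keeps the decision stable across runs and across the offline and near-line pipelines, so the same user lands in the same bucket everywhere.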
Training Sample Processing – After raw sample collection, cleaning and sampling are performed, including user‑item filtering, CTR/CVR‑based truncation, and multi‑dimensional sampling configurations. The pipeline supports one‑click configuration of the entire workflow.
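As an illustration of CTR-based cleaning, the sketch below drops every sample from users whose observed CTR falls outside a configured range, for example suspected click spam. The thresholds, field names, and the decision to exempt users with too few impressions are assumptions for the example; the source does not give the actual rules.

```python
from collections import defaultdict

def filter_by_user_ctr(samples, min_impressions=20, min_ctr=0.0, max_ctr=0.5):
    """Remove samples from users whose CTR is outside [min_ctr, max_ctr].

    Users with fewer than `min_impressions` impressions are kept, since their
    CTR estimate is too noisy to judge.
    """
    stats = defaultdict(lambda: [0, 0])  # user_id -> [impressions, clicks]
    for s in samples:
        entry = stats[s["user_id"]]
        entry[0] += 1
        entry[1] += s["click"]

    def keep(user_id):
        impressions, clicks = stats[user_id]
        if impressions < min_impressions:
            return True
        return min_ctr <= clicks / impressions <= max_ctr

    return [s for s in samples if keep(s["user_id"])]
```

Exposing the thresholds as parameters fits the configuration-driven workflow described above, where cleaning and sampling rules are set per experiment rather than hard-coded.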
Algorithm Models – The evolution includes Wide&Deep, DeepFM, NFM, DIN, DIEN, and multi‑task ESMM. Wide&Deep combines memorization (wide) and generalization (deep); DeepFM replaces the wide part with factorization machines; DIN introduces attention over user history; DIEN adds GRU‑based interest evolution; ESMM jointly predicts CTR and CVR using shared embeddings.
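Two of these ideas are compact enough to sketch directly. The first function shows attention over user history in the spirit of DIN: each history embedding is weighted by its relevance to the candidate ad before pooling. As a simplification, it scores relevance with a scaled dot product and normalizes with softmax, whereas the DIN paper describes a small learned feed-forward activation unit. The second function shows the ESMM relation in which the click-and-convert probability is the product of pCTR and pCVR. Function names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_pool(history, candidate):
    """Pool history embeddings, weighting each by relevance to the candidate ad."""
    scale = math.sqrt(len(candidate))
    weights = softmax([dot(h, candidate) / scale for h in history])
    pooled = [
        sum(w * h[i] for w, h in zip(weights, history))
        for i in range(len(candidate))
    ]
    return pooled, weights

def esmm_pctcvr(pctr, pcvr):
    """ESMM relation: P(click and convert) = P(click) * P(convert | click)."""
    return pctr * pcvr
```

Because the weights depend on the candidate, the same user history yields a different pooled interest vector for each ad being scored, which is the key difference from plain average pooling.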
System Engineering – Optimizations span the application level (pipeline code refactoring, unified feature preprocessing, protocol simplification) and the system level (CPU‑only inference, MKL/XLA acceleration, quantization, pruning, TVM compilation). The article reports a reduction of up to 80% in inference latency and an improvement of up to 70% in training speed.
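Of the system-level techniques, quantization is the easiest to show in a few lines. The sketch below uses symmetric per-tensor int8 quantization, a common scheme chosen here as an assumption; the source does not say which scheme or toolchain was used, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]
```

Each weight is stored in one byte instead of four, and the round-trip error per weight is bounded by about half the scale, which is why the accuracy cost is often small relative to the memory and latency savings.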
Conclusion – Careful feature engineering, sample strategy, model selection, and system‑level optimizations together deliver ~10% CPM uplift and significant CTR/CVR improvements, while highlighting the importance of continuous data, algorithm, and engineering research for future growth.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.
