Data Intelligence in the Used‑Car Business: User Traffic Prediction and Identification (Part 1)
This article describes how 58 Group applied data-driven methods (user segmentation, interest description, clustering, and predictive modeling) to forecast and identify traffic in the used-car scenario. It covers the end-to-end pipeline, the experimental results, and the practical impact on downstream business processes.
Background – Applying AI effectively requires a deep understanding of the business. The used-car platform splits user behavior into four stages (traffic acquisition, retrieval, click, call), and this part focuses on the first stage, traffic prediction.
Traffic Acquisition – User Segmentation – Five user identities are defined: personal, dealer, platform-like, crawler, and abnormal. Statistical analysis and clustering (K-Means, GMM, DBSCAN) filter out anomalous traffic, and probabilistic models (LR, XGBoost, FM) estimate the probability of each identity. These probabilities feed downstream strategies such as phone-number binding and resource allocation.
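The anomaly-filtering step can be illustrated with a minimal sketch, assuming K-Means over two hypothetical per-session features (requests per minute, distinct listings viewed) and a hypothetical rule that flags members of very small clusters. The feature names, the sample values, the `min_share` threshold, and the deterministic farthest-point seeding are all assumptions for illustration, not the article's actual configuration.

```python
import math
from collections import Counter


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iters=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels


def flag_small_clusters(labels, min_share=0.2):
    """Return indices of points whose cluster holds less than min_share
    of all points (hypothetical anomaly rule)."""
    sizes = Counter(labels)
    total = len(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] / total < min_share]


# Hypothetical sessions: (requests per minute, distinct listings viewed).
sessions = [
    (2.0, 5.0), (3.0, 6.0), (2.5, 4.0), (1.5, 5.5),          # personal-like
    (10.0, 30.0), (12.0, 28.0), (11.0, 33.0), (9.0, 31.0),   # dealer-like
    (300.0, 900.0),                                          # crawler-like
]
labels = kmeans(sessions, k=3)
suspicious = flag_small_clusters(labels)
print(suspicious)
```

In practice the features would be scaled before clustering, and the flagged sessions would be reviewed or removed before the identity-probability models are trained.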
Interest Description – Instead of building full user profiles, the system derives interest tags (e.g., car series + price range) from behavior logs. It computes an item-based collaborative-filtering (ItemCF) similarity matrix over these tags using the Jaccard similarity coefficient, stores the top-N tags in Redis, and uses them as recall conditions for both offline and real-time recommendation.
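A minimal sketch of the tag-to-tag similarity computation follows, assuming each user is represented by a set of interest tags and that two tags are compared by the Jaccard similarity coefficient of their user sets (one minus the Jaccard distance). The tag strings, user IDs, and function names are hypothetical, and the returned dictionary merely stands in for the Redis store mentioned above.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_item_similarity(user_tags):
    """Build a symmetric tag-to-tag similarity map.

    sim[a][b] = |users(a) & users(b)| / |users(a) | users(b)|
    """
    # Invert user -> tags into tag -> users.
    tag_users = defaultdict(set)
    for user, tags in user_tags.items():
        for tag in tags:
            tag_users[tag].add(user)

    sim = defaultdict(dict)
    for a, b in combinations(sorted(tag_users), 2):
        shared = len(tag_users[a] & tag_users[b])
        if shared == 0:
            continue  # keep the matrix sparse
        score = shared / len(tag_users[a] | tag_users[b])
        sim[a][b] = score
        sim[b][a] = score
    return sim


def top_n(sim, tag, n=2):
    """Top-n most similar tags, highest score first, ties broken by name."""
    ranked = sorted(sim.get(tag, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Hypothetical behavior-derived tags: "<car series>|<price range>".
user_tags = {
    "u1": {"series_a|100-150k", "series_b|100-150k"},
    "u2": {"series_a|100-150k", "series_b|100-150k", "series_c|50-100k"},
    "u3": {"series_a|100-150k", "series_c|50-100k"},
    "u4": {"series_b|100-150k"},
}
sim = jaccard_item_similarity(user_tags)
neighbors = top_n(sim, "series_a|100-150k", n=2)
print(neighbors)
```

At recall time, a user's recent tags would be looked up in this map and the neighboring tags used as additional query conditions.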
Data Pipeline – The workflow has two phases: (1) statistical analysis and rule definition on cleaned historical logs (DS → DW → DM → DA layers), and (2) feature selection, clustering, and supervised learning. PCA is applied for dimensionality reduction, and the models reach an AUC of about 0.68.
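For readers unfamiliar with the metric, AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up illustration data, not the article's results.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC: share of positive/negative pairs in
    which the positive example is scored higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = positive class) and model scores.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(y_true, y_score)
print(round(auc, 4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value near 0.68 indicates a model that ranks noticeably better than chance but leaves room for better features.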
Model-Driven Interventions – The call-prediction (CVR) model guides number-recycling decisions, improving number-resource usage by about 4% and reducing cross-number rates. Recall-rate and precision experiments show a CTR uplift of 4.9% to 6% when the model is applied.
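The article does not spell out the recycling rule, so the following is only a hypothetical sketch of how a predicted call probability could guide it: when the number pool runs short, the bindings least likely to receive a call are released first. The function name, the `(number_id, probability)` layout, and the sample values are assumptions.

```python
def pick_numbers_to_recycle(bindings, needed):
    """Choose which bound numbers to release.

    bindings: list of (number_id, predicted_call_probability) pairs.
    needed:   how many numbers must be returned to the pool.
    Lowest predicted probability is released first; ties break on id.
    """
    ranked = sorted(bindings, key=lambda b: (b[1], b[0]))
    return [number_id for number_id, _ in ranked[:needed]]


# Hypothetical bindings scored by a call-prediction model.
bindings = [("n1", 0.40), ("n2", 0.02), ("n3", 0.15), ("n4", 0.01)]
to_recycle = pick_numbers_to_recycle(bindings, needed=2)
print(to_recycle)
```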
Future Directions – Plans include expanding feature combinations, enriching tag semantics (vehicle attributes, dealer credit), multi‑matrix collaborative recall, and continuous evaluation of similarity‑matrix performance.
Conclusion – Accurate traffic estimation and interest description are foundational for downstream information recall, and the presented pipeline demonstrates a practical AI solution for the used‑car business.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.