
Algorithmic Practices in Haola Ride-Sharing: Platform Infrastructure, Matching Recommendation Engine, Transaction Governance, and Intelligent Marketing

This article details Haola's end‑to‑end algorithmic ecosystem for its ride‑sharing service, covering the machine‑learning platform built on Hadoop/YARN, the architecture and evolution of the matching recommendation engine, transaction‑ecosystem governance models, and intelligent marketing strategies including uplift modeling and optimization.

DataFunSummit

1. Platform Infrastructure

The Haola machine‑learning platform is a one‑stop service that uses Hadoop/YARN for resource scheduling and integrates Spark ML, XGBoost, and TensorFlow on both CPU and GPU resources. It provides offline and real‑time feature services, model version management, and high‑availability online prediction, as well as an AB‑testing framework for rapid algorithm validation.
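The article does not describe how the AB‑testing framework assigns traffic. As a minimal sketch of one common approach, assuming hash‑based bucketing, the function below maps a user to a stable group per experiment. The names `assign_bucket`, `experiment`, and `treatment_share` are hypothetical and not taken from the Haola platform.

```python
import hashlib

def assign_bucket(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control' (illustrative only)."""
    # Salting with the experiment name decorrelates assignments across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as a number in [0, 1).
    position = int(digest[:8], 16) / 0x100000000
    return "treatment" if position < treatment_share else "control"
```

Because the assignment depends only on the user and experiment identifiers, the same user sees the same variant on every request without any stored state.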

2. Ride‑Sharing Business Overview

The business flow has three algorithmic stages: intelligent marketing brings drivers to the platform, the matching recommendation system then suggests passenger requests to them, and transaction‑ecosystem governance algorithms safeguard safety and experience throughout the trip.

3. Matching Recommendation Engine

The engine aims to maximize transaction efficiency while preserving long‑term retention. Data sources include real‑time client context, Flink‑computed near‑real‑time metrics, and offline wide‑table features. The model pipeline consists of recall, coarse ranking, precise ranking, and re‑ranking stages, customized for driver‑side and passenger‑side scenarios.
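The four stages form a funnel in which each step scores fewer candidates with a more expensive model. The following is a rough sketch of that control flow only, assuming each stage is passed in as a plain function. The name `run_pipeline`, its parameters, and the cut‑off sizes are hypothetical.

```python
def run_pipeline(candidates, recall, coarse_score, precise_score, rerank,
                 coarse_k=100, final_k=10):
    """Narrow candidates stage by stage: recall -> coarse -> precise -> re-rank."""
    # Recall: cheap filter that keeps plausibly relevant orders.
    recalled = [c for c in candidates if recall(c)]
    # Coarse ranking: lightweight score, keep the top coarse_k.
    coarse = sorted(recalled, key=coarse_score, reverse=True)[:coarse_k]
    # Precise ranking: heavier model on the reduced set, keep the top final_k.
    precise = sorted(coarse, key=precise_score, reverse=True)[:final_k]
    # Re-ranking: business rules such as diversity or fairness adjustments.
    return rerank(precise)
```

Driver‑side and passenger‑side scenarios would plug in different stage functions while sharing the same skeleton.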

Recall Challenges and Solutions

Heavy real‑time computation (trajectory expansion, path planning) – mitigated by a trajectory compression algorithm with a reported compression rate of 80%.

One‑off orders lacking offline embeddings – transformed into discrete codes (price, route similarity, distance buckets) and processed with graph‑based recall using node2vec.

Driver preferences (proximity, price, route) – extracted as core factors in the recall chain.

Low‑frequency users – addressed by grid‑based historical destination recall.
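The article does not name the trajectory compression algorithm behind the first item above. As an illustration of the general idea, here is a minimal sketch of Douglas‑Peucker line simplification, a widely used technique that may differ from what Haola runs. It assumes points are already projected to planar coordinates; `compress_trajectory` and `tolerance` are hypothetical names.

```python
import math

def _point_segment_distance(p, a, b):
    """Shortest planar distance from point p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def compress_trajectory(points, tolerance):
    """Drop points that lie within `tolerance` of the simplified path."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord between the endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Otherwise keep that point and simplify both halves recursively.
    left = compress_trajectory(points[:index + 1], tolerance)
    right = compress_trajectory(points[index:], tolerance)
    return left[:-1] + right
```

A straight stretch of road collapses to its two endpoints, while turns are preserved, so downstream path matching works on far fewer points.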

Recall versions evolved from V1 (city‑aware recall) to V3 (adding grid and graph recall), improving order‑to‑completion rates by 5%.
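Two of the recall ideas above lend themselves to a short sketch: turning a one‑off order into a discrete code, and recalling orders whose destination falls in grid cells a user has visited before. The bucket boundaries, grid size, and function names below are hypothetical, and the node2vec step that would embed the resulting codes is not shown.

```python
import bisect
import math
from collections import Counter

GRID_DEG = 0.01              # hypothetical cell size in degrees (about 1.1 km of latitude)
PRICE_EDGES = [20, 50, 100]  # hypothetical price bucket boundaries
DIST_EDGES = [5, 20, 50]     # hypothetical distance bucket boundaries, km

def grid_cell(lat, lon, size=GRID_DEG):
    """Map a coordinate to an integer grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))

def order_code(price, distance_km, dest_cell):
    """Discrete code for an order, usable as a graph node instead of an order ID."""
    p = bisect.bisect_right(PRICE_EDGES, price)
    d = bisect.bisect_right(DIST_EDGES, distance_km)
    return f"p{p}_d{d}_g{dest_cell[0]}:{dest_cell[1]}"

def build_destination_index(history):
    """history: iterable of (user_id, dest_lat, dest_lon) from completed trips."""
    index = {}
    for user_id, lat, lon in history:
        index.setdefault(user_id, Counter())[grid_cell(lat, lon)] += 1
    return index

def recall_by_history(user_id, open_orders, index, top_cells=3):
    """Return open orders whose destination lies in the user's most visited cells."""
    favorite = {cell for cell, _ in index.get(user_id, Counter()).most_common(top_cells)}
    return [o["order_id"] for o in open_orders
            if grid_cell(o["dest_lat"], o["dest_lon"]) in favorite]
```

Because many orders share one code, the code can accumulate interaction history even though each individual order is seen only once.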

4. Precise Ranking Module

At cold start, the module simply sorted candidates by route similarity (Algorithm 1.0). Subsequent versions introduced LightGBM + LR (Algorithm 2.0) and listwise LambdaRank models, achieving up to 10% lift in order acceptance. Deep models (DeepFM, DIN, etc.) were explored but performed worse than the tuned tree‑based approaches.
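LambdaRank‑style objectives are usually tied to a list‑level metric such as NDCG, although the article does not say which metric Haola optimized. A minimal sketch of NDCG, assuming graded relevance labels (for example 1 for an accepted order, 0 otherwise) given in the order the model ranked them:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with the exponential gain 2^rel - 1."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances_in_ranked_order, k=None):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ranked = relevances_in_ranked_order[:k]
    ideal = sorted(relevances_in_ranked_order, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Placing the single accepted order first scores 1.0, while placing it third in a list of three scores 0.5, which is the kind of position sensitivity a listwise loss rewards.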

Key insight: ride‑sharing features are largely continuous; converting them into high‑dimensional discrete leaf‑encodings via LightGBM trees enables effective embedding and cross‑feature interaction.
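The leaf‑encoding idea can be illustrated without a trained model. The sketch below uses two hand‑written toy trees in place of LightGBM trees; the tree layout, feature order, and thresholds are all hypothetical. With LightGBM itself, leaf indices are typically obtained by calling `predict` with `pred_leaf=True`, and the one‑hot vectors are then fed to the LR stage.

```python
# A tree is either an int (leaf id) or a tuple (feature_index, threshold, left, right).
# Hypothetical features: [route_similarity, pickup_distance_km, price]
TREE_A = (0, 0.8, 0, 1)                  # 2 leaves
TREE_B = (1, 2.0, 0, (2, 30.0, 1, 2))    # 3 leaves

def leaf_index(tree, features):
    """Walk a tree until a leaf id is reached."""
    node = tree
    while not isinstance(node, int):
        feature, threshold, left, right = node
        node = left if features[feature] <= threshold else right
    return node

def leaf_one_hot(trees, leaf_counts, features):
    """Concatenate one one-hot vector per tree, marking the leaf the sample lands in."""
    encoded = []
    for tree, n_leaves in zip(trees, leaf_counts):
        one_hot = [0] * n_leaves
        one_hot[leaf_index(tree, features)] = 1
        encoded.extend(one_hot)
    return encoded
```

Each leaf corresponds to a conjunction of threshold conditions on continuous features, so the encoded vector already carries feature crosses that a linear model could not learn from the raw values.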

5. Transaction Ecosystem Governance

Goal: safeguard driver and passenger experience before, during, and after trips. The system predicts cancellation, complaints, or malicious behavior pre‑trip, monitors trajectory deviation and abnormal stops in‑trip, and applies post‑trip responsibility algorithms.
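For the in‑trip trajectory‑deviation check, a minimal rule‑based sketch is shown below. It assumes positions and the planned route are `(lat, lon)` pairs and measures distance to the nearest planned route vertex, which is a simplification. The threshold, the streak length, and all function names are hypothetical rather than Haola's actual logic.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def deviation_m(position, planned_route):
    """Distance from a position to the nearest vertex of the planned route."""
    return min(haversine_m(position, p) for p in planned_route)

def is_deviating(recent_positions, planned_route, threshold_m=500.0, min_consecutive=3):
    """Flag a trip once several consecutive positions are far from the plan."""
    streak = 0
    for pos in recent_positions:
        streak = streak + 1 if deviation_m(pos, planned_route) > threshold_m else 0
        if streak >= min_consecutive:
            return True
    return False
```

Requiring consecutive off‑route points reduces false alarms from a single noisy GPS fix; a production system would likely feed such signals into a model rather than act on the rule alone.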

Features combine static spatio‑temporal data, order attributes, offline driver‑passenger behavior, and real‑time streams (trajectory, IM, calls). Labels are obtained from customer‑service tickets and crowd‑sourced reviews, with few‑shot learning used where labeled examples are scarce.

Model evolution addresses challenges of sparse positive samples, multimodal data fusion, and interpretability (using SHAP and rule‑based post‑processing).

6. Intelligent Marketing

The marketing stack includes data collection (user profiles, behavior, recent clicks), offline model training (ML or deep models), and online deployment via a CRM platform to deliver personalized coupons.

User lifecycle stages (acquisition, activation, retention, re‑engagement) are matched with tailored marketing algorithms. The uplift modeling pipeline progressed from response‑based coupon allocation (v1) to causal uplift models (v2) and finally to an optimization framework (v3) that jointly decides coupon type, amount, and budget allocation using integer programming (Google OR‑Tools) per city.
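The v3 step is described as an integer program solved with Google OR‑Tools. To show the shape of the decision without depending on a solver, the sketch below treats it as a multiple‑choice knapsack and solves small instances exactly with dynamic programming. This is a stand‑in, not the OR‑Tools model from the talk; it assumes integer coupon costs, at most one coupon per user, and predicted gains supplied by an uplift model. `allocate_coupons` is a hypothetical name.

```python
def allocate_coupons(options, budget):
    """Choose at most one coupon per user to maximize total expected gain.

    options[u] is a list of (cost, expected_gain) pairs for user u.
    Returns (total_gain, picks), where picks[u] is the chosen option index or None.
    """
    # best[b] = best (gain, picks) over the users processed so far with budget b.
    best = [(0.0, [])] * (budget + 1)
    for user_options in options:
        new_best = []
        for b in range(budget + 1):
            gain, picks = best[b]
            candidate = (gain, picks + [None])  # give this user no coupon
            for idx, (cost, expected_gain) in enumerate(user_options):
                if cost <= b:
                    prev_gain, prev_picks = best[b - cost]
                    if prev_gain + expected_gain > candidate[0]:
                        candidate = (prev_gain + expected_gain, prev_picks + [idx])
            new_best.append(candidate)
        best = new_best
    return best[budget]
```

With two users and a budget of 15, the allocation can prefer a small coupon for one user and a large one for the other when that pairing yields more total uplift than giving the largest coupon to the most responsive user. Running it per city mirrors the per‑city budget split mentioned above.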

Uplift modeling paradigms covered:

S‑Learner (single model with treatment flag).

T‑Learner (separate models for treatment and control).

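X‑Learner (builds on the T‑Learner: the two outcome models are used to impute individual treatment effects, second‑stage models are fit on those imputed effects, and their predictions are blended with a propensity‑score weight).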
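As a minimal sketch of the T‑Learner idea from the list above, the code below fits one "model" on treated users and one on control users and takes the difference of their predictions as the uplift. The base learner here is just a per‑segment conversion rate, a deliberately simple stand‑in for the tree or neural models in the talk; the row layout and function names are hypothetical.

```python
from collections import defaultdict

def fit_rate_model(rows):
    """rows: list of (segment, converted). Returns segment -> predicted conversion rate."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [conversions, count]
    for segment, converted in rows:
        totals[segment][0] += converted
        totals[segment][1] += 1
    n_all = sum(n for _, n in totals.values())
    overall = sum(c for c, _ in totals.values()) / max(1, n_all)
    rates = {s: c / n for s, (c, n) in totals.items()}
    # Unseen segments fall back to the overall rate.
    return lambda segment: rates.get(segment, overall)

def fit_t_learner(rows):
    """rows: list of (segment, treated, converted). Returns segment -> estimated uplift."""
    treated_model = fit_rate_model([(s, y) for s, t, y in rows if t == 1])
    control_model = fit_rate_model([(s, y) for s, t, y in rows if t == 0])
    return lambda segment: treated_model(segment) - control_model(segment)
```

An S‑Learner would instead fit a single model with the treatment flag as an extra feature and score each user twice, once with the flag on and once with it off.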

Tree‑based and neural‑network‑based uplift models were evaluated; X‑Learner showed superior offline AUUC and Gini, leading to a 10% ROI increase over the previous rule‑based system.

Conclusion

The algorithmic framework, spanning platform infrastructure, recommendation, governance, and marketing, shows how AI techniques can drive efficiency, safety, and revenue in ride‑sharing services.
