Big Data 9 min read

Smart Matching Engine for Ride-Sharing: Technical Implementation and Algorithms

The Smart Matching Engine for Haolo’s ride‑sharing service ingests driver and passenger orders via Kafka‑Flink pipelines into Elasticsearch, then applies multi‑stage matching—nearby search, itinerary‑based filtering using ETA, angle, distance, route‑similarity and shared‑mileage calculations—and finally ranks results with evolving pre‑sorting and algorithmic models, including PMML and deep‑learning, to optimize driver‑passenger pairing.

HelloTech

Dec 13, 2021

Smart Matching Engine for Ride-Sharing: Technical Implementation and Algorithms

The Smart Matching Engine primarily serves the Haolo four‑wheel business, providing strong technical support for driver‑passenger matching scenarios in car‑pooling and taxi services. This article explains several technical implementations of driver‑passenger matching in the car‑pooling business.

Matching scenarios in the car‑pooling business are divided by user type into driver‑side and passenger‑side, each involving distinct matching cases.

Application Scenarios

Driver Side

Passenger Side

Overall Process

Data Link

1. Order data generated by drivers or passengers is synchronized from the business DB's binlog to Kafka; Flink jobs consume the Kafka messages and write them into Elasticsearch (ES).

2. The process of drivers or passengers searching for orders is performed by the matching‑engine service recalling order data from ES.

Matching Process

Recall Process Details

Driver without itinerary (Nearby Order Search)

1. Same‑city:

List: Recall passenger orders from the last three days whose start and end cities match the driver’s current city, and whose start point is within a specified radius of the driver’s current location.

Aggregation: Perform a second‑level aggregation on the results by destination county and business district.

2. Cross‑city:

List: Recall passenger orders from the last sixteen days where the start city matches the driver’s current city but the destination city differs.

Aggregation: Aggregate by destination city and county.

Driver with own itinerary

The matching process for drivers with a pre‑planned route involves seven steps, filtering based on basic conditions, time, space, and coverage dimensions. Spatial and coverage filters require business‑logic or algorithmic calculations, implemented via ES plugins.

ETA Filtering

ETA (Estimated Time of Arrival) = dispatch time + (driver‑passenger distance / travel speed). The distance is Manhattan distance, and speed is dynamically configured based on mileage.

Angle and Distance Filtering

Direction angle: the angle between the extended line of the driver’s route and the passenger’s route (illustrated as angle COA). Orders are filtered if this angle is below a threshold and the shortest distance from the passenger’s start/end points to the driver’s route is below a threshold.

Pickup angle: the angle between the driver’s route line and the line from driver’s current location to passenger’s start point (illustrated as angle CAB). Orders are filtered if this angle or the pickup distance is below specified thresholds.

Route Similarity (顺路度) Calculation

Route similarity measures how well the driver’s A_start→B_end aligns with the passenger’s C_start→D_end, based solely on spatial information.

Manhattan distance (taxicab distance) is the sum of absolute differences in the coordinate axes. The article shows examples of Manhattan distance using colored line segments.

Point‑to‑line shortest distance: the driver’s route is stored as a collection of coordinate points; the shortest distance from a passenger point to this polyline is computed.

Two calculation schemes are described:

Scheme 1:

Compute shortest distance S1 by evaluating all points on the AB path against C and D.

Compute perpendicular distance S2 by checking acute angles between adjacent AB points and C/D, then taking the vertical distance.

Take the minimum of S1 and S2.

Scheme 2 (optimized):

Step 1: Incrementally compute shortest points on AB, obtaining point E.

Step 2: Expand E by a step to get DF, then compute shortest distances for points in DF.

Step 3: For adjacent points in DF, compute acute angles to obtain the shortest perpendicular distance.

Step 4: Compare distances from steps 2 and 3 and select the smaller one.

Shared Mileage Calculation

Shared mileage describes the relationship between the driver’s original route and the route after picking up a passenger, ranging from 0 to 1; higher values indicate higher revenue potential.

Sorting

Pre‑sorting

Pre‑sorting is applied after the ES recall stage. Initially, sorting was based solely on the route similarity score; later, additional features such as order price and time were incorporated to improve order acceptance probability.

Algorithmic Sorting

Fine‑ranking after recall determines the user’s first perception. The module uses algorithmic models for sorting. Version 1.0 loads a PMML model from HDFS to the local machine and sorts the recalled results locally. Version 2.0 extracts the sorting module into a separate service, enabling richer feature dimensions and TF‑based deep‑learning models, resulting in higher sorting accuracy.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

algorithm Flink Elasticsearch Kafka Ride-sharing matching engine

Written by

HelloTech

Official Hello technology account, sharing tech insights and developments.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.