
Search Matching Models and Applications in DiDi Food

The article outlines DiDi Food’s search relevance challenge, defines semantic matching versus traditional keyword methods, describes the recall‑ranking pipeline, and reviews three families of deep matching models—representation‑based (e.g., DSSM), interaction‑based (e.g., DRMM) and hybrid (e.g., DUET)—including experimental results and a recruitment notice.

Didi Tech

In this article, the authors introduce the problem of search relevance on the DiDi Food platform and discuss several deep matching models that have been explored for this task.

The article first outlines three common categories of matching models: (1) representation‑based deep models, (2) interaction‑based deep models, and (3) hybrid models that combine both representation and interaction.

Search relevance is described as a semantic matching problem between a user query and a document (e.g., a restaurant or a dish). The authors distinguish semantic matching from traditional character‑based matching and explain that semantic matching focuses on the meaning of the texts rather than exact word overlap.

Matching vs. Ranking – Matching (or recall) aims to find relevant query‑doc pairs, while ranking orders the retrieved candidates. In DiDi Food the pipeline consists of: (1) intent analysis (query correction, synonym expansion), (2) coarse recall via Elasticsearch, (3) coarse ranking to improve relevance, and (4) final re‑ranking with business rules.
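The four stages above can be sketched end to end in a few lines. This is a minimal illustrative toy, not DiDi Food's actual system: the function names, the synonym table, and the tiny in-memory index stand in for the real intent-analysis service, Elasticsearch recall, and business-rule re-ranker.

```python
# Hypothetical sketch of the four-stage pipeline; names and data are illustrative.
SYNONYMS = {"burger": ["hamburger"]}

def analyze_intent(query):
    """Stage 1: normalize the query and expand synonyms."""
    terms = query.lower().split()
    expanded = set(terms)
    for t in terms:
        expanded.update(SYNONYMS.get(t, []))
    return expanded

def coarse_recall(terms, index):
    """Stage 2: term-overlap recall (a stand-in for Elasticsearch)."""
    return [doc for doc, doc_terms in index.items() if terms & doc_terms]

def coarse_rank(candidates, terms, index):
    """Stage 3: order candidates by term overlap as a relevance proxy."""
    return sorted(candidates, key=lambda d: -len(terms & index[d]))

def rerank(ranked, boosted):
    """Stage 4: apply a business rule, e.g. boost promoted stores."""
    return sorted(ranked, key=lambda d: (d not in boosted,))

index = {
    "Burger Palace": {"burger", "fries"},
    "Hamburger House": {"hamburger", "shake"},
    "Taco Stand": {"taco"},
}
terms = analyze_intent("Burger")
results = rerank(coarse_rank(coarse_recall(terms, index), terms, index),
                 boosted={"Hamburger House"})
print(results)  # → ['Hamburger House', 'Burger Palace']
```

Note that synonym expansion is what lets the query "Burger" recall "Hamburger House" at all; the business-rule stage then reorders candidates that coarse ranking considered equally relevant.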

The article then dives into deep matching algorithms. Traditional IR methods such as TF‑IDF, LSA, and BM25 are mentioned as baselines. The three deep model families are explained in detail:

Representation‑based models learn dense vectors for query and document independently and then compute similarity (e.g., cosine). An example is the DSSM model, which uses word hashing (letter‑trigrams) for English and character‑level one‑hot vectors for Chinese. The model consists of an input layer, a representation layer (BOW → DNN), and a matching layer that computes cosine similarity and applies a softmax.
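The matching layer just described can be sketched with plain Python. The vectors below are toy stand-ins for the DNN's semantic representations, and the smoothing factor `gamma` in the softmax is an illustrative value, not one reported in the article.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def softmax(xs, gamma=10.0):
    """Smoothed softmax over candidate-doc similarities, DSSM-style."""
    exps = [math.exp(gamma * x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy embeddings standing in for the representation layer's output.
q = [0.9, 0.1, 0.0]
docs = {"d1": [1.0, 0.0, 0.0], "d2": [0.0, 1.0, 0.0]}
sims = [cosine(q, v) for v in docs.values()]
probs = softmax(sims)
```

Because query and document are encoded independently, document vectors can be precomputed offline, which is the main practical appeal of representation-based models.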

Interaction‑based models build a local interaction matrix X = Dᵀ·Q, capture fine‑grained matching signals, and convert the matrix into a matching histogram. The DRMM model is presented as an example, with a term‑gating network that weights each query term. The histogram can be processed as raw counts (CH), normalized counts (NH), or log‑scaled counts (LCH).
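The histogram construction can be sketched as follows. This is a simplified reading of DRMM's histogram mapping: one query term's cosine similarities against all document terms are bucketed over [-1, 1], with the top bin reserved for exact matches (similarity of exactly 1), and the LCH variant takes the log of the counts.

```python
import math

def matching_histogram(sims, bins=5):
    """Bucket one query term's doc-term similarities into a fixed-size
    histogram over [-1, 1]; the last bin is reserved for exact matches."""
    counts = [0] * bins
    for s in sims:
        if s >= 1.0:
            counts[-1] += 1          # exact-match bin
        else:
            idx = int((s + 1.0) / 2.0 * (bins - 1))
            counts[idx] += 1
    return counts

def lch(counts):
    """Log-count histogram (LCH): log-scale the raw counts (CH)."""
    return [math.log(c + 1) for c in counts]

sims = [1.0, 0.8, 0.1, -0.4, 1.0]
print(matching_histogram(sims))  # → [0, 1, 1, 1, 2]
```

The fixed histogram size is what lets DRMM handle variable-length documents without truncation, at the cost of discarding term positions.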

Hybrid models (e.g., DUET) combine a distributed representation branch with a local interaction branch and sum their scores.
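The score combination can be sketched with simplified branch proxies. The real DUET branches are convolutional sub-networks; here the local branch is reduced to a scalar summary of a binary exact-match matrix and the distributed branch to a dot product of dense vectors, purely for illustration.

```python
def local_branch(query_terms, doc_terms):
    """Local branch proxy: density of exact term matches."""
    matches = sum(1 for q in query_terms for d in doc_terms if q == d)
    return matches / (len(query_terms) * len(doc_terms))

def distributed_branch(q_vec, d_vec):
    """Distributed branch proxy: dot product of dense representations."""
    return sum(a * b for a, b in zip(q_vec, d_vec))

def duet_score(query_terms, doc_terms, q_vec, d_vec):
    """DUET's final score is the sum of its two branch scores."""
    return local_branch(query_terms, doc_terms) + distributed_branch(q_vec, d_vec)

score = duet_score(["pizza"], ["pizza", "oven"], [0.5, 0.5], [0.5, 0.5])
```

Summing the two branches lets the model fall back on exact matching for rare terms while still generalizing semantically for common ones.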

Key concepts are clarified with short definitions:

Match(T1, T2) = F(Φ(T1), Φ(T2)) – the generic matching function, where Φ maps each text to a representation and F scores the pair.

Similarity vs. Relevance – similarity measures symmetric semantic closeness (e.g., paraphrase detection), while relevance measures asymmetric query‑doc usefulness.

Global vs. Local – a global distribution focuses on overall semantics, while local context emphasizes exact character matches.

Exact, Inexact, and Position term matches – different granularities of term‑level matching.

The authors also describe implementation details such as word hashing (e.g., "boy" → #bo, boy, oy#), the use of tanh activation, and the handling of variable‑length inputs (padding queries to 10 terms and documents to 1000 terms).
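The word-hashing step from the "boy" example above can be sketched directly: pad the word with `#` boundary markers and slide a three-character window across it.

```python
def letter_trigrams(word):
    """Word hashing as in DSSM: pad with '#' and emit letter trigrams."""
    padded = "#" + word + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(letter_trigrams("boy"))  # → ['#bo', 'boy', 'oy#']
```

The trigram vocabulary is tiny compared with a word vocabulary, which shrinks the input layer and makes the model robust to out-of-vocabulary words and minor misspellings.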

In the model effect analysis section, offline experiments on public datasets and on DiDi Food’s own data (Guadalajara city, February) are reported. Four models – DSSM, CDSSM, DRMM, and DUET – are compared, and their performance metrics are shown in tables (images in the original article).

Finally, the article includes a recruitment notice for the DiDi R‑Lab, listing open positions ranging from senior front‑end engineers to algorithm researchers and product managers.

Tags: information retrieval, Search Relevance, DiDi Food, deep matching, interaction modeling, representation learning
Written by Didi Tech

Official Didi technology account