How Tree‑Based Deep Match Revolutionizes Large‑Scale Recommendation Systems

This article introduces the Tree‑based Deep Match (TDM) framework, which uses a novel max‑heap tree structure to enable efficient, hierarchical retrieval over massive candidate sets, allowing any advanced deep learning model to improve matching accuracy, recall, and novelty in industrial recommendation systems.

Alibaba Cloud Developer
Alibaba Cloud Developer
Alibaba Cloud Developer
How Tree‑Based Deep Match Revolutionizes Large‑Scale Recommendation Systems

Background

In e‑commerce, recommendation, search, and advertising are core services that rely on large‑scale candidate matching. Matching must retrieve a high‑quality Top‑K set from billions of items under strict latency constraints, while ranking can use complex models. Improving the matching stage is crucial because its candidate quality caps overall performance.

Related Technologies

First Generation – Statistical Heuristic Rules

Item‑based Collaborative Filtering (Item‑CF) computes item‑to‑item similarity and heuristically selects recent user actions as triggers. While simple and fast, it limits the ability to recommend items a user has never interacted with, capping recommendation diversity.

Second Generation – Vector Retrieval (Inner‑Product Models)

Machine‑learning models estimate user‑item relevance as a vector inner product, accelerated by vector indexes (e.g., Faiss). This improves efficiency but restricts model expressiveness because many advanced deep models cannot be reduced to inner‑product form.

Next Generation – Tree‑Based Matching

To overcome these limits, we propose a tree‑based framework that decouples retrieval structure from model architecture. A hierarchical tree enables coarse‑to‑fine candidate selection, reducing the search space to logarithmic complexity.

Technical Solution

1. Max‑Heap Tree

We define a max‑heap‑like tree where the probability of a parent node is proportional to the maximum probability among its children. This property allows us to find the global Top‑K by recursively selecting the top‑K nodes at each level, without enumerating all leaves.

Max‑Heap Tree Formula
Max‑Heap Tree Formula

2. Global Classifier per Level

Each tree level is modeled by a global classifier trained with negative sampling and maximum‑likelihood estimation. Positive samples are the user‑clicked leaf and all its ancestors; negative samples are randomly chosen siblings at the same level.

Probability Model
Probability Model

3. Hierarchical Top‑K Retrieval

Starting from the root, we select the top‑K children by predicted probability, then expand each selected node to its children and repeat until reaching leaf nodes. The final Top‑K items are the highest‑scoring leaves among the candidate set.

Top‑K Retrieval Algorithm
Top‑K Retrieval Algorithm

4. Tree‑Model Joint Training

We iteratively refine the tree structure: (1) initialize the interest tree from the product taxonomy; (2) train a TDM model; (3) embed nodes and recluster them with K‑Means to rebuild the tree; (4) retrain the model on the new tree; repeat until performance stabilizes. This joint optimization yields >10% improvement in offline metrics.

Joint Training Results
Joint Training Results

Experimental Results

We evaluated TDM on MovieLens and Alibaba’s UserBehavior ad dataset, comparing against Item‑CF, YouTube‑style vector retrieval, and various DNN baselines. Metrics include precision, recall, F1, and novelty.

Recall Comparison
Recall Comparison

Results show that attention‑enhanced TDM consistently outperforms baselines in recall (up to 355% improvement over Item‑CF) and novelty, confirming the effectiveness of hierarchical tree retrieval and the flexibility to incorporate advanced deep models.

Conclusion and Outlook

Tree‑based Deep Match provides a complete, scalable solution for large‑scale matching by combining a max‑heap tree with deep learning, achieving superior accuracy, recall, and novelty. Future work includes supervised tree refinement (e.g., pruning, re‑branching) to further boost performance.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

machine learningdeep learningmatching algorithmstree-based retrievallarge-scale recommendation
Alibaba Cloud Developer
Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.