How Joint Optimization of Tree-Based Indexes Boosts Large-Scale Recommendation Accuracy

This article introduces JTM, a joint optimization framework that simultaneously learns deep scoring models and tree-structured indexes, addressing the limitations of traditional recommendation pipelines and demonstrating significant precision and recall gains on large-scale datasets such as Amazon Books and UserBehavior.


Background

Search, recommendation, and advertising share a common technical foundation; from a recommendation perspective, search can be seen as query‑constrained recommendation and advertising as recommendation with price constraints. Innovations in recommendation algorithms therefore drive overall progress across these services.

Early recommendation systems relied on heuristic, statistics‑based item‑to‑item collaborative filtering (Item‑CF). Subsequent vector‑retrieval approaches expanded candidate coverage but were limited to inner‑product similarity, preventing the use of more expressive scoring models such as attention‑based deep networks.

Problems in Existing Systems

Large‑scale systems typically consist of three components: a model that predicts user‑item preference, an index that organizes all items, and a retrieval algorithm that selects candidates from the index. In Item‑CF the index and retrieval are hard‑coded rules, offering no learning capability. Vector retrieval couples inner‑product scoring with approximate nearest‑neighbor indexes, which limits model expressiveness. The Tree‑based Deep Model (TDM) introduced a tree index that supports arbitrary deep scoring models, but it trains the model and the tree under separate, mismatched objectives.
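To make this decomposition concrete, here is a minimal sketch of the three components for a tree index. The Scorer and TreeIndex interfaces and the retrieve function are hypothetical names for illustration, not APIs from any of the systems discussed:

```python
from typing import Any, Protocol, Sequence

class Scorer(Protocol):
    """Model: predicts the user's preference for an item or index node."""
    def score(self, user: Any, node_id: int) -> float: ...

class TreeIndex(Protocol):
    """Index: organizes the full corpus; here, a tree whose leaves are items."""
    def children(self, node_id: int) -> Sequence[int]: ...

def retrieve(index: TreeIndex, model: Scorer, user: Any,
             root: int, beam: int, k: int) -> list[int]:
    """Retrieval: layer-wise beam search from the root down to the leaves.

    Keeping only the `beam` best-scoring nodes per level means the number of
    model evaluations grows with tree depth (log of corpus size), not with
    corpus size. Assumes a complete tree with all leaves on the same level.
    """
    frontier = [root]
    while any(index.children(n) for n in frontier):
        expanded = [c for n in frontier for c in index.children(n)]
        frontier = sorted(expanded, key=lambda n: model.score(user, n),
                          reverse=True)[:beam]
    return sorted(frontier, key=lambda n: model.score(user, n),
                  reverse=True)[:k]
```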

Joint Optimization of Tree‑Based Index and Deep Model (JTM)

JTM formulates a unified loss that simultaneously optimizes the scoring model parameters θ and the projection π that maps items to leaf nodes. For each positive user‑item pair (u, c), the objective pushes the model's predicted preference p(π(c)|u;θ) toward 1 and treats every ancestor of the positive leaf as a positive target as well, effectively enforcing an approximate max‑heap property over node preferences.

Mathematically, for each positive pair (u, c) the target probability is set to 1 both for the leaf, p(π(c)|u;θ) = 1, and for each of its ancestors, p(b_j(π(c))|u;θ) = 1, where b_j(·) denotes the ancestor at level j. The global empirical loss aggregates these targets over all pairs and all levels, and is minimized by alternating updates of θ (via gradient‑based optimizers such as SGD or Adam) and π (via a combinatorial matching algorithm, described next).
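Written out, with b_j(·) as above and (u_i, c_i) ranging over the n positive pairs in the training data, the global empirical loss takes the form of a level‑wise negative log‑likelihood:

```latex
\mathcal{L}(\theta, \pi) \;=\; -\sum_{i=1}^{n} \sum_{j=0}^{l_{\max}} \log p\bigl(b_j(\pi(c_i)) \,\big|\, u_i; \theta\bigr)
```

Because θ and π share this single objective, alternating the two updates never increases the loss, a guarantee that separately trained models and indexes lack.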

Tree Structure Learning

The optimization of π is cast as a maximum‑weight bipartite matching problem between leaf nodes and items. Directly solving this problem is infeasible for large catalogs, so JTM employs a hierarchical greedy algorithm that iteratively assigns items to tree layers, reducing the search space at each step.
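Below is a minimal sketch of this layered greedy assignment. The helpers (tree.descendants_at, tree.num_leaves_under, model.log_likelihood, and the samples mapping from item to its positive training pairs) are assumed interfaces standing in for the paper's actual machinery:

```python
from collections import defaultdict

def learn_tree_assignment(items, tree, model, samples, step=2):
    """Hierarchical greedy approximation of the max-weight matching for π.

    Instead of matching all items to all leaves at once, each item attached
    to a node `parent` is re-assigned to one of `parent`'s descendants
    `step` levels down, subject to that descendant's leaf capacity; the
    process repeats until every item reaches the leaf level.
    """
    assignment = {item: tree.root for item in items}  # all items start at the root
    level = 0
    while level < tree.max_level:
        target = min(level + step, tree.max_level)
        groups = defaultdict(list)
        for item, node in assignment.items():
            groups[node].append(item)
        for parent, group in groups.items():
            candidates = tree.descendants_at(parent, target)
            capacity = {c: tree.num_leaves_under(c) for c in candidates}
            # Edge weight = summed log-likelihood of the item's positive
            # samples if the item were placed under this candidate node.
            edges = sorted(
                ((model.log_likelihood(samples[item], cand), item, cand)
                 for item in group for cand in candidates),
                key=lambda e: e[0], reverse=True)
            placed = set()
            for weight, item, cand in edges:  # greedy: take best edges first
                if item in placed or capacity[cand] == 0:
                    continue
                assignment[item] = cand
                capacity[cand] -= 1
                placed.add(item)
        level = target
    return assignment  # items now sit on leaves, defining the new π
```

Since every item has an edge to every candidate and the candidates' total leaf capacity covers the group, the greedy pass always places every item, while only ever comparing items within one subtree at a time.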

Hierarchical User Interest Modeling

JTM introduces layer‑wise user behavior embeddings. For each layer l, the user’s historical items are projected to their ancestor nodes at that layer, producing a compact representation that is fed to the scoring model. This design decouples embeddings across layers, preserving independence without increasing parameter count, and captures user intent at appropriate granularity.
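As a sketch, assuming a helper tree.ancestor_at(item, level) that returns the level‑l ancestor of an item's leaf, the layer‑wise inputs can be assembled as follows:

```python
def hierarchical_user_inputs(history, tree, max_level):
    """Build one input sequence per tree level from a user's behavior history.

    At level l the model scores level-l nodes, so each historical item is
    replaced by its level-l ancestor: the input features match the
    granularity of the candidates being scored at that level.
    """
    return {
        level: [tree.ancestor_at(item, level) for item in history]
        for level in range(1, max_level + 1)
    }
```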

Experiments

We evaluated JTM on two public datasets: Amazon Books and Alibaba’s UserBehavior. Baselines include Item‑CF, YouTube product‑DNN, hierarchical softmax (HSM), the original TDM, and a full‑scoring DNN (no index). All neural models used a three‑layer fully‑connected network as the scoring function. Metrics were Precision, Recall, and F‑measure.
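For reference, the per‑user metrics follow their standard definitions (the argument names below are our own); dataset‑level figures average these values over all test users:

```python
def precision_recall_f1(retrieved, ground_truth, m):
    """Precision@M, Recall@M, and F-measure for a single user.

    `retrieved` is the top-M list returned by retrieval; `ground_truth`
    is the user's held-out item set.
    """
    hits = len(set(retrieved[:m]) & set(ground_truth))
    precision = hits / m
    recall = hits / len(ground_truth) if ground_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1
```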

JTM achieved the highest recall, improving over the best baseline DNN by 45.3% on Amazon Books and 8.1% on UserBehavior. The hierarchical user interest variant (JTM‑H) and the joint‑learning variant without hierarchy (JTM‑J) both outperformed TDM, confirming that both joint optimization and hierarchical features contribute to the gains.

Tree Structure Convergence

We compared JTM’s tree‑learning process with the clustering‑based tree construction used in TDM. Across iterations, JTM’s tree consistently converged to a higher‑quality structure, while the clustering approach exhibited over‑fitting in later stages.

Conclusion

JTM provides a unified, data‑driven framework that jointly optimizes deep scoring models and tree‑based indexes, enabling efficient end‑to‑end recommendation at web scale. By integrating hierarchical user interest modeling and a scalable tree‑learning algorithm, JTM delivers substantial accuracy improvements while maintaining logarithmic retrieval complexity, representing a significant advancement for search, recommendation, and advertising systems.


Tags: deep learning, recommendation systems, large-scale retrieval, joint optimization, tree-based indexing
Written by Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.