How Meituan’s MTGR is Redefining Generative Recommendation at Scale
This article explains why Meituan introduced a generative recommendation model; describes the MTGR architecture, its data organization, and the training and inference engines built on TorchRec and TensorRT; reports the resulting performance gains and cost reductions; and outlines future directions such as simplifying the recommendation funnel and cross‑business heterogeneous modeling.
Background: Why Generative Recommendation?
Traditional recommendation systems have hit a performance ceiling: adding model depth, wider MLP layers, and more MoE experts no longer yields proportional gains. The scaling laws observed in large language models (e.g., LLaMA, DeepSeek) show that performance keeps improving with model size, data, and compute, which inspired a shift toward generative modeling in recommendation.
MTGR – Meituan Generative Ranking
MTGR (Meituan Generative Ranking) integrates generative‑modeling ideas into Meituan's delivery ranking pipeline. It treats user behavior (clicks and exposures) and user profiles as one unified token sequence and processes that long sequence with a simplified Transformer architecture based on the HSTU design.
Key innovations include:
Data organization: tokens are grouped into user_profile, lifelong_seq, rt_seq, and pv_items, all sharing a single feature space (first sketch after this list).
Model structure: Multi‑Query Attention and a large‑scale MoE (the Scaling User Module) replace shallow Target‑Attention compression, preserving richer user‑behavior representations (second sketch below).
Cross‑feature handling: instead of discarding cross features, MTGR uses scaling mechanisms to retain crucial signals such as merchant‑user distance.
Feature encoding: Group LayerNorm and bidirectional attention handle static features, while dynamic (causal) encoding of real‑time features prevents information leakage (third sketch below).
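The article does not publish MTGR's feature schema, but the four token groups suggest a straightforward organization. Below is a minimal PyTorch sketch, assuming a hypothetical vocabulary size, embedding width, and learned type embeddings, of how the four groups could be embedded into one shared space and concatenated into a single sequence for an HSTU‑style Transformer:

```python
# Minimal sketch of MTGR-style token organization. The group names come
# from the article; vocab size, d_model, and the type embeddings are
# assumptions for illustration, not the production schema.
import torch
import torch.nn as nn

D_MODEL = 256

class TokenOrganizer(nn.Module):
    def __init__(self, vocab_size: int = 100_000):
        super().__init__()
        # One embedding table: all token groups share a feature space.
        self.embed = nn.Embedding(vocab_size, D_MODEL)
        # Learned type embeddings distinguish the four groups.
        self.type_embed = nn.Embedding(4, D_MODEL)

    def forward(self, user_profile, lifelong_seq, rt_seq, pv_items):
        parts = []
        for type_id, ids in enumerate(
                [user_profile, lifelong_seq, rt_seq, pv_items]):
            tok = self.embed(ids) + self.type_embed(
                torch.full_like(ids, type_id))
            parts.append(tok)
        # One unified sequence: [profile | lifelong | rt | pv_items]
        return torch.cat(parts, dim=1)

org = TokenOrganizer()
seq = org(torch.randint(0, 100_000, (2, 4)),    # user_profile tokens
          torch.randint(0, 100_000, (2, 512)),  # lifelong behavior
          torch.randint(0, 100_000, (2, 64)),   # real-time behavior
          torch.randint(0, 100_000, (2, 30)))   # candidate (pv) items
print(seq.shape)  # torch.Size([2, 610, 256])
```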
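Multi‑Query Attention itself is a published technique (all query heads share a single key/value head, shrinking KV projections and memory traffic); how MTGR parameterizes it is not public. A self‑contained sketch with assumed dimensions:

```python
# Sketch of Multi-Query Attention: many query heads, one shared K/V head.
# d_model and n_heads are illustrative, not MTGR's actual settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.h, self.d = n_heads, d_model // n_heads
        self.q = nn.Linear(d_model, d_model)
        self.kv = nn.Linear(d_model, 2 * self.d)  # single shared K/V head
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        B, T, _ = x.shape
        q = self.q(x).view(B, T, self.h, self.d).transpose(1, 2)  # B,h,T,d
        k, v = self.kv(x).split(self.d, dim=-1)                   # B,T,d each
        # Broadcast the single K/V head across all query heads.
        k = k.unsqueeze(1).expand(B, self.h, T, self.d)
        v = v.unsqueeze(1).expand(B, self.h, T, self.d)
        o = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
        return self.out(o.transpose(1, 2).reshape(B, T, -1))

mqa = MultiQueryAttention()
y = mqa(torch.randn(2, 10, 256))
print(y.shape)  # torch.Size([2, 10, 256])
```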
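The static/real‑time split implies a hybrid attention mask: bidirectional over static tokens, causal over the real‑time block so no token can peek at later behavior. This is an illustrative reconstruction, not MTGR's published masking scheme; the result can be passed as the boolean attn_mask of F.scaled_dot_product_attention (True = attend):

```python
# Hybrid mask sketch: static tokens are fully visible to everyone;
# real-time tokens are causally masked to prevent information leakage.
import torch

def hybrid_mask(n_static: int, n_rt: int) -> torch.Tensor:
    n = n_static + n_rt
    allowed = torch.zeros(n, n, dtype=torch.bool)
    allowed[:, :n_static] = True  # all tokens attend to static tokens
    # Causal within the real-time block: token i sees rt tokens j <= i.
    allowed[n_static:, n_static:] = torch.tril(
        torch.ones(n_rt, n_rt, dtype=torch.bool))
    return allowed

mask = hybrid_mask(n_static=4, n_rt=3)
print(mask.int())
```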
Challenges in Deploying Generative Recommendation
Existing infrastructure (TensorFlow 1.x) cannot efficiently support deep attention or large‑scale MoE. Training and inference costs rise sharply with model size, and simply removing cross features would demand roughly a hundred‑fold increase in compute to recover the lost performance.
MTGR‑Training Engine
Built on Meta's open‑source TorchRec, the engine adds three layers:
Bottom layer: a customized TorchRec core with dynamic hash tables for frequently updated sparse IDs (see the sketch after this list).
Middle layer: handles data loading, checkpointing, and consistency checks.
Top layer: provides flexible model interfaces for research.
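A dynamic hash table replaces a fixed vocabulary: raw IDs are admitted on first sight and mapped to embedding rows, so newly appearing IDs get fresh slots instead of hashing into collisions. Meituan's actual implementation lives in its customized TorchRec core and is not public; this toy sketch only illustrates the admission idea:

```python
# Toy dynamic hash embedding: grow an id -> row mapping on demand.
# Capacity and the simple wrap-around policy are assumptions; a real
# implementation would evict stale or low-frequency IDs to bound memory.
import torch
import torch.nn as nn

class DynamicHashEmbedding(nn.Module):
    def __init__(self, dim: int, capacity: int = 1_000_000):
        super().__init__()
        self.table = nn.Embedding(capacity, dim)
        self.slot_of = {}  # raw id -> row index, grown on first sight
        self.capacity = capacity

    def forward(self, raw_ids: torch.Tensor) -> torch.Tensor:
        rows = []
        for rid in raw_ids.tolist():
            if rid not in self.slot_of:
                # Admit a newly seen ID into the next free slot.
                self.slot_of[rid] = len(self.slot_of) % self.capacity
            rows.append(self.slot_of[rid])
        return self.table(torch.tensor(rows, device=raw_ids.device))

emb = DynamicHashEmbedding(dim=64)
vecs = emb(torch.tensor([12345678901, 42, 12345678901]))
print(vecs.shape)  # torch.Size([3, 64])
```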
Performance optimizations include dynamic hash tables, gradient accumulation, ID deduplication (45% throughput gain; sketched below), variable‑batch‑size load balancing (30% gain), Cutlass‑based HSTU kernels (2‑3× faster attention), and offloading GAUC computation to data‑loading threads (10% gain).
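Of these, ID deduplication is the easiest to illustrate: each unique ID in a batch is looked up once and the result scattered back, saving redundant embedding reads and gradient traffic. A minimal sketch (the 45% figure comes from the article, not from this toy):

```python
# ID deduplication for embedding lookup: fetch each unique ID once,
# then restore the original batch order via the inverse indices.
import torch
import torch.nn as nn

embed = nn.Embedding(10_000, 64)
ids = torch.tensor([7, 7, 42, 7, 42, 99])       # heavy duplication

unique_ids, inverse = torch.unique(ids, return_inverse=True)
unique_vecs = embed(unique_ids)                  # 3 lookups instead of 6
vecs = unique_vecs[inverse]                      # scatter back to batch order

assert torch.equal(vecs, embed(ids))             # same result, less work
```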
MTGR‑Inference Engine
The inference engine uses TensorRT behind the Triton Inference Server to deliver millisecond‑level latency. Optimizations cover reduced host‑to‑device (H2D) transfers, hash‑table pruning, FP16 computation, operator fusion, and graph‑level optimizations; a hedged build sketch follows.
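A sketch of the FP16 build step using the TensorRT 8.x Python API. "model.onnx" is a placeholder for an exported ranking graph; Meituan's actual export pipeline, fusion passes, and pruning logic are not public:

```python
# Build an FP16 TensorRT engine from an ONNX export (TensorRT 8.x API).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:       # placeholder model path
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)     # enable half-precision kernels
engine_bytes = builder.build_serialized_network(network, config)

with open("model.plan", "wb") as f:
    f.write(engine_bytes)  # deployable via Triton's tensorrt backend
```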
Results
Scaling MTGR from small to large models consistently improves offline and online metrics. The large variant achieves a 65× increase in model complexity while delivering the best performance to date, and reduces inference cost by 44% compared with the previous DLRM‑based system. Retaining cross features yields far larger gains than pure model scaling.
Summary and Outlook
MTGR and its training/inference engines demonstrate that generative ranking can break the compute ceiling of traditional pipelines. Future work will explore simplifying the multi‑stage recommendation funnel and extending the token‑based design to heterogeneous, cross‑business scenarios.