Must‑Read AAAI 2026 Papers: Efficient Reasoning, Annealing, Multimodal Diffusion & More
This article curates eight AAAI 2026 papers authored by the Meituan research team, covering verifiable stepwise rewards for LLM reasoning, annealing strategies in large‑scale training, process reward models, competence‑difficulty sampling, high‑fidelity visual text rendering, counterfactual fusion, compress‑then‑rank reranking, and cross‑modal quantization for generative recommendation, with direct PDF links for each work.
Promoting Efficient Reasoning with Verifiable Stepwise Reward
Paper Type: Poster
Download: PDF https://arxiv.org/pdf/2508.10293
Large reasoning models suffer from “over‑thinking” – overly long outputs that increase latency and degrade user experience. The paper introduces a Verifiable Stepwise Reward Mechanism (VSRM) that inserts special </think> tokens to split reasoning into steps, rewards effective steps, penalizes ineffective ones, and uses a forward‑looking window with discounting to densify reward signals. Experiments show VSRM shortens outputs dramatically while maintaining or improving performance across math benchmarks and various models.
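The densification idea can be illustrated with a minimal sketch: each verified step gets a base reward, and a forward-looking discounted window blends in the signals of the next few steps. The function name, window size, discount factor, and reward values below are illustrative assumptions, not the paper's actual settings.

```python
def stepwise_rewards(step_ok, window=3, gamma=0.9, r_good=1.0, r_bad=-1.0):
    """step_ok: list of booleans, True if a reasoning step is verified effective.

    Each step's reward mixes its own signal with discounted signals from up to
    `window` following steps, densifying sparse verification outcomes.
    """
    base = [r_good if ok else r_bad for ok in step_ok]
    dense = []
    for i in range(len(base)):
        total, norm = 0.0, 0.0
        for k in range(window + 1):       # look ahead up to `window` steps
            j = i + k
            if j >= len(base):
                break
            w = gamma ** k                # discount later steps more heavily
            total += w * base[j]
            norm += w
        dense.append(total / norm)
    return dense

rewards = stepwise_rewards([True, False, True, True])
```

An early effective step followed by a failed one thus receives a blended, attenuated reward rather than a flat +1, which is the densifying effect the paper attributes to the window.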
Scaling and Transferability of Annealing Strategies in Large Language Model Training
Paper Type: Long Paper
Download: PDF https://arxiv.org/abs/2512.13705
The study investigates how different annealing (learning‑rate scheduling) strategies affect LLM training. It proposes a new scaling law that predicts loss curves based on batch size, learning‑rate schedule, and model size. Key findings include: (1) training steps are a more reliable loss‑tracking metric than token count; (2) the optimal annealing ratio follows a power‑law decay with total steps; (3) this ratio is consistent across train and validation sets; (4) small dense or MoE models can serve as proxies for optimizing large‑scale training dynamics.
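Finding (2) can be sketched as a simple schedule: hold the learning rate constant, then anneal over a final fraction of training that shrinks as a power law of total steps. The coefficients `c` and `alpha` below are made-up placeholders; the paper fits such constants empirically.

```python
def annealing_ratio(total_steps, c=2.0, alpha=0.2):
    """Fraction of training spent annealing, decaying as a power law in steps."""
    return c * total_steps ** -alpha

def lr_at(step, total_steps, peak_lr=3e-4, ratio=None):
    """Constant LR, then linear decay to zero over the final annealing phase."""
    ratio = annealing_ratio(total_steps) if ratio is None else ratio
    anneal_start = total_steps * (1 - ratio)
    if step < anneal_start:
        return peak_lr
    frac = (step - anneal_start) / (total_steps - anneal_start)
    return peak_lr * (1 - frac)
```

With these placeholder constants, a 100k-step run anneals over its last 20% of steps, while a 1M-step run anneals over a smaller fraction, consistent with the power-law decay the paper reports.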
From Mathematical Reasoning to Code: Generalization of Process Reward Models in Test‑Time Scaling
Paper Type: Long Paper (Oral)
Download: PDF https://arxiv.org/pdf/2506.00027
The work systematically evaluates Process Reward Models (PRMs) for enhancing LLM reasoning, focusing on cross‑domain generalization from math to code generation. It introduces ASLAF for automatic step‑level annotation and filtering, shows that larger PRMs exhibit diminishing returns, and demonstrates that test‑time scaling strategies such as MCTS outperform simpler sampling when resources permit. Gradient analysis reveals PRMs preferentially select responses sharing underlying reasoning patterns.
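A PRM's role at test time can be shown with a toy best-of-N selection: each candidate solution carries per-step scores, and the candidate whose weakest step is strongest wins. The min-over-steps aggregation is one common convention, not necessarily the one used in this paper.

```python
def select_best(candidates):
    """candidates: dict mapping answer text -> list of per-step PRM scores.

    Keep the candidate whose weakest reasoning step scores highest.
    """
    return max(candidates, key=lambda ans: min(candidates[ans]))

best = select_best({
    "answer A": [0.9, 0.2, 0.8],   # one weak step drags it down
    "answer B": [0.7, 0.6, 0.7],   # uniformly solid reasoning
})
```

Note how answer A's high peak scores do not save it: process-level scoring penalizes the single flawed step, which is the behavior that distinguishes PRMs from outcome-only reward models.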
Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence‑Difficulty Alignment Perspective
Paper Type: Poster
Download: PDF https://arxiv.org/pdf/2505.17652
The authors identify shortcomings of traditional pass‑rate‑based difficulty estimation in RL‑based LLM training. They propose Competence‑Difficulty Alignment Sampling (CDAS), which models model ability and problem difficulty separately, uses historical performance differences for stable difficulty estimates, and selects problems that best match current competence. CDAS improves gradient efficiency and overall performance on math reasoning and code generation tasks.
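The alignment idea can be sketched in a few lines: track each problem's difficulty from historical outcomes rather than a single pass rate, then draw the problems closest to the model's current competence. The EMA update rule and all names here are illustrative assumptions, not CDAS's exact formulation.

```python
def update_difficulty(difficulty, solved, lr=0.1):
    """Exponential moving average over outcomes: solving a problem lowers its
    estimated difficulty, failing raises it. Estimates stay in [0, 1]."""
    target = 0.0 if solved else 1.0
    return difficulty + lr * (target - difficulty)

def sample_aligned(difficulties, competence, k=2):
    """Pick the k problems whose estimated difficulty best matches competence."""
    ranked = sorted(difficulties, key=lambda p: abs(difficulties[p] - competence))
    return ranked[:k]

pool = {"p1": 0.10, "p2": 0.45, "p3": 0.50, "p4": 0.95}
batch = sample_aligned(pool, competence=0.5, k=2)
```

Trivially easy (p1) and near-impossible (p4) problems are skipped, which is the gradient-efficiency argument: problems far from current competence yield mostly uninformative all-pass or all-fail rollouts.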
ViType: High‑Fidelity Visual Text Rendering via Glyph‑Aware Multimodal Diffusion
Paper Type: Oral
Download: PDF https://s3plus.meituan.net/ddfile/ViType__High_Fidelity_Visual_Text_Rendering_via_Glyph_Aware_Multimodal.pdf
Current text‑to‑image models struggle with accurate glyph rendering, especially for multilingual characters. ViType introduces a three‑stage alignment framework: (1) visual‑question‑answer alignment to inject glyph structure into LLM semantics; (2) joint training of pre‑aligned glyph embeddings with text tokens in a multimodal diffusion transformer; (3) aesthetic fine‑tuning on high‑quality image‑text pairs. The system improves character accuracy by over 15 % for e‑commerce graphics.
DSCF: Dual‑Source Counterfactual Fusion for High‑Dimensional Combinatorial Interventions
Paper Type: Poster
Download: PDF https://s3plus.meituan.net/ddfile/%E3%80%90AAAI%E3%80%91DSCF.pdf
In domains such as personalized recommendation and healthcare, predicting counterfactual outcomes for high‑dimensional combinatorial interventions is critical yet challenging due to data sparsity and selection bias. DSCF proposes a dual‑source fusion architecture that jointly models observed data and synthetic counterfactual samples via a domain‑guided fusion mechanism, achieving superior accuracy and robustness on both synthetic and semi‑synthetic benchmarks.
Compress‑then‑Rank: Faster and Better Listwise Reranking with LLMs via Ranking‑Aware Passage Compression
Paper Type: Poster
Download: PDF https://s3plus.meituan.net/ddfile/AAAI2026_C2R(1).pdf
Listwise reranking with large language models is effective but costly. The Compress‑then‑Rank (C2R) framework first compresses each passage into a high‑fidelity vector sequence using a pre‑trained compression model, then performs reranking on these compact representations. Innovations include a pre‑training objective that mixes text reconstruction and continuation, index‑aware embeddings to preserve passage boundaries, and joint optimization of compression and ranking models. Experiments show C2R matches or exceeds full‑passage reranking while drastically reducing latency.
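The pipeline's shape can be sketched structurally: passages are first mapped to short fixed-length vectors, each tagged with its index to preserve passage boundaries, and the ranker scores those compact representations instead of raw text. The bag-of-characters "compressor" and dot-product "ranker" below are crude stand-ins for the paper's learned models.

```python
def compress(passage, dim=8):
    """Toy stand-in for the learned compressor: a normalized char histogram."""
    vec = [0.0] * dim
    for ch in passage.lower():
        vec[ord(ch) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def rerank(query, passages):
    """Score the query against compressed passages; best match ranks first."""
    q = compress(query)
    scored = []
    for idx, p in enumerate(passages):
        v = compress(p)
        score = sum(a * b for a, b in zip(q, v))
        scored.append((score, idx))       # idx plays the index-aware role
    return [idx for _, idx in sorted(scored, reverse=True)]

order = rerank("fast llm reranking", ["llm reranking is fast", "cooking pasta"])
```

The latency argument follows from the shapes: the ranker consumes a few vectors per passage instead of hundreds of tokens, so listwise context length shrinks by the compression ratio.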
Multi‑Aspect Cross‑modal Quantization for Generative Recommendation
Paper Type: Oral
Download: PDF https://arxiv.org/pdf/2511.15122
The paper presents MACRec, a generative recommendation framework that fuses multimodal (text and visual) signals via cross‑modal residual quantization and multi‑aspect alignment. By integrating contrastive learning into hierarchical quantization and employing both explicit and implicit alignment strategies, MACRec produces balanced, low‑entropy item codes and achieves significant gains on Amazon e‑commerce recommendation benchmarks.
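The residual-quantization step behind such hierarchical item codes can be shown in miniature: each level's codebook quantizes the residual left by the previous level, producing a short code per item. The tiny hand-written codebooks below are purely illustrative; the real system learns them jointly with the cross-modal alignment objectives.

```python
def nearest(vec, codebook):
    """Index of the codebook entry closest to vec (squared L2 distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])))

def residual_quantize(vec, codebooks):
    """One code per level; each level encodes what the previous levels missed."""
    code, residual = [], list(vec)
    for cb in codebooks:
        idx = nearest(residual, cb)
        code.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return code

codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],       # level 1: coarse placement
    [[0.0, 0.0], [0.25, -0.25]],    # level 2: refines the residual
]
code = residual_quantize([1.2, 0.8], codebooks)
```

The resulting code acts as a semantic ID for generative recommendation: a decoder can emit the item as a short token sequence, with earlier levels carrying coarse semantics and later levels fine distinctions.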
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
