LLM Breakthroughs at EMNLP 2025: Embedding Compression, Complex Instructions, Knowledge Scaling

At EMNLP 2025 in Suzhou, Taobao's booth features four papers: an embedding compression framework, an automatic iterative refinement method for complex instruction generation, a knowledge infusion scaling law for large language models, and a video caption optimization approach for text-to-video generation.

Alimama Tech

Oct 29, 2025

EMNLP (Empirical Methods in Natural Language Processing) 2025 will be held at the Suzhou International Expo Center from November 4 to 9, bringing together top researchers and engineers.

SMEC: Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression (Team: Taobao Algorithm Technology) proposes a sequential Matryoshka representation learning framework with adaptive dimension selection and selectable cross‑batch memory, achieving significant compression of high‑dimensional LLM embeddings while improving performance on BEIR datasets.
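To make the idea of Matryoshka-style compression concrete, here is a minimal sketch of the generic technique that SMEC builds on: an embedding trained this way can be cut down to its first few coordinates and renormalized, and retrieval is then run on the shorter vectors. This is not SMEC's sequential training, adaptive dimension selection, or cross-batch memory. The function names and the toy vectors are assumptions made purely for illustration.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates and L2-normalize the result."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]

def cosine(a, b):
    """Dot product of two equal-length vectors that are already normalized."""
    return sum(x * y for x, y in zip(a, b))

def rank_documents(query, docs, dim):
    """Return document names sorted by similarity at the given dimension."""
    q = truncate_embedding(query, dim)
    scores = {name: cosine(q, truncate_embedding(vec, dim))
              for name, vec in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy 8-dimensional vectors (hypothetical values, not real model output).
query = [0.9, 0.1, 0.3, 0.0, 0.05, 0.02, 0.01, 0.03]
docs = {
    "doc_a": [0.8, 0.2, 0.4, 0.1, 0.00, 0.05, 0.02, 0.01],
    "doc_b": [0.1, 0.9, 0.0, 0.3, 0.04, 0.01, 0.06, 0.02],
}

for dim in (8, 4, 2):
    print(dim, rank_documents(query, docs, dim))
```

In this toy setup the ranking stays the same as the dimension shrinks, which is the property a compression method of this kind aims to preserve on real retrieval benchmarks.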

AIR: Complex Instruction Generation via Automatic Iterative Refinement (Team: Future Life Lab) introduces an automatic iterative refinement framework that generates constrained complex instructions from documents and iteratively refines them using LLMs as judges, resulting in a 10,000‑instruction dataset and superior performance over existing methods.
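The refine-and-judge loop described here can be sketched in a few lines. This is a generic illustration under stated assumptions, not the AIR pipeline itself: in the paper the judge is an LLM, whereas `toy_judge` and `toy_revise` below are hypothetical rule-based stand-ins so the example runs without any model, and the constraint strings are invented for the demo.

```python
def refine_instruction(draft, judge, revise, max_rounds=3):
    """Revise a draft until the judge accepts it or the round budget runs out.

    `judge(text)` returns (accepted, feedback); `revise(text, feedback)`
    returns a new draft. Both are supplied by the caller.
    """
    current = draft
    history = [draft]
    for _ in range(max_rounds):
        accepted, feedback = judge(current)
        if accepted:
            break
        current = revise(current, feedback)
        history.append(current)
    return current, history

# Hypothetical constraints a complex instruction should carry.
REQUIRED = ("under 100 words", "as a bulleted list")

def toy_judge(instruction):
    """Stand-in for an LLM judge: report the first missing constraint."""
    missing = [c for c in REQUIRED if c not in instruction]
    return (not missing, missing[0] if missing else "")

def toy_revise(instruction, feedback):
    """Stand-in for an LLM rewriter: append the missing constraint."""
    return f"{instruction.rstrip('.')}, {feedback}."

final, history = refine_instruction("Summarize the document.", toy_judge, toy_revise)
print(final)
print(len(history), "drafts")
```

Passing the judge and reviser in as functions keeps the loop independent of any particular model API, so either stand-in could be swapped for a real LLM call.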

How to Inject Knowledge Efficiently? Knowledge Infusion Scaling Law for Pre‑training Large Language Models (Team: Future Life Lab) studies the trade‑off of knowledge injection, identifies a critical collapse point that scales with model size, and proposes a scaling law to predict optimal knowledge amounts for larger models.
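The general workflow behind a scaling law of this kind is to measure a quantity on small models, fit a curve, and extrapolate to a larger one. The sketch below shows that workflow only. The summary above does not give the paper's functional form, so the power law used here is an assumption, and the data points are synthetic values generated from a known curve so the fit can be checked; none of the numbers come from the paper.

```python
import math

def fit_power_law(sizes, values):
    """Least-squares fit of values ~= a * sizes**b, done in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept = log of the coefficient
    return a, b

# Synthetic "small model" measurements drawn from a = 2.0, b = 0.5.
# These are placeholders for illustration, not experimental results.
small_sizes = [1e7, 1e8, 1e9]
synthetic_points = [2.0 * s ** 0.5 for s in small_sizes]

a, b = fit_power_law(small_sizes, synthetic_points)
predicted = a * (1e10 ** b)  # extrapolate to a larger, unseen model size
print(f"a={a:.3f}, b={b:.3f}, predicted={predicted:.1f}")
```

With real measurements the fit would not be exact, and the choice of functional form would need to be validated against held-out model sizes before trusting any extrapolation.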

VC4VG: Optimizing Video Captions for Text-to-Video Generation (Team: Alimama Technology) presents a caption optimization framework and a benchmark (VC4VG-Bench), and shows that higher-quality video captions correlate strongly with better text-to-video generation performance.
