Ant Group Papers Accepted at ICLR 2025: Summaries and Links
The article presents the abstracts, publication types, links, and research areas of seventeen Ant Group papers accepted at ICLR 2025, covering topics such as embodied robot co‑design, efficient distributed training for large language models, optimization via LLMs, character animation, interactive frame interpolation, KV‑cache management, and privacy‑preserving Transformers.
The International Conference on Learning Representations (ICLR) 2025 accepted 17 papers from Ant Group, including 1 Spotlight and 16 Poster papers. Below are the details of each contribution.
BodyGen: Advancing Towards Efficient Embodiment Co‑Design (Spotlight)
Link: https://openreview.net/pdf?id=cTR17xl89h
Source: Research collaboration
Fields: Reinforcement Learning, Embodied Intelligence
Abstract: Embodied co‑design aims to jointly optimize robot morphology and control. Existing work shows promise but faces efficiency challenges due to the combinatorial nature of morphology search and complex morphology‑control dependencies. Ineffective morphology representations and imbalanced reward signals are identified as major obstacles. BodyGen introduces topology‑aware self‑attention for compact morphology encoding and a temporal credit assignment mechanism for balanced rewards, achieving an average 60.03% performance gain over state‑of‑the‑art baselines.
EDiT: A Local‑SGD‑Based Efficient Distributed Training Method for Large Language Models (Poster)
Link: https://arxiv.org/abs/2412.07210
Source: Ant Group independent work
Fields: AI Infrastructure, Large Models, Distributed Training
Abstract: Current distributed training suffers from communication bottlenecks, slow nodes, and lack of elasticity. Existing Local‑SGD methods add memory overhead and work only for small models. EDiT combines Local‑SGD with model‑parallelism, introducing hierarchical synchronization, virtual gradient penalty, and time‑interval synchronization, resulting in faster training, higher stability, and better model quality. Experiments on public datasets show superior speed and performance compared to other Local‑SGD and synchronous methods.
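To make the Local-SGD idea concrete, here is a minimal toy sketch (not EDiT itself, and without its hierarchical synchronization or gradient penalty): each worker runs several local gradient steps on its own data shard, and parameters are averaged only every `sync_every` steps, which is what cuts communication relative to per-step synchronization. All names and the 1-parameter regression task are illustrative assumptions.

```python
import random

def local_sgd(workers, steps, sync_every, lr=0.01):
    """Toy Local-SGD: each worker fits y = w*x on its own data shard,
    averaging the per-worker parameters only every `sync_every` steps."""
    w = [0.0] * workers  # each worker holds its own copy of the parameter
    # every shard is drawn from the same ground truth, w_true = 3
    data = [[(x, 3.0 * x) for x in range(1, 6)] for _ in range(workers)]
    for step in range(steps):
        for i in range(workers):
            x, y = random.choice(data[i])
            grad = 2 * (w[i] * x - y) * x   # d/dw of the squared error (w*x - y)^2
            w[i] -= lr * grad               # local SGD step, no communication
        if (step + 1) % sync_every == 0:    # infrequent synchronization point
            avg = sum(w) / workers          # all-reduce (here: a plain average)
            w = [avg] * workers
    return sum(w) / workers

random.seed(0)
w_final = local_sgd(workers=4, steps=200, sync_every=8)
```

Raising `sync_every` trades communication for drift between worker copies, which is the tension methods like EDiT aim to manage.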
LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch (Poster)
Link: https://openreview.net/pdf?id=9OMvtboTJg
Source: Ant Group research interns
Fields: AI, Large Models, Optimization Modeling
Abstract: Formalizing optimization problems described in natural language is labor‑intensive. LLMOPT proposes a unified learning‑based framework that formalizes problems via a five‑element representation built with pretrained LLMs, applies multi‑instruction tuning, and adds model alignment and self‑correction to mitigate hallucinations. Evaluated on ~20 real‑world datasets across 6 domains, LLMOPT improves solution accuracy by 11.08% over SOTA methods.
Animate‑X: Universal Character Image Animation with Enhanced Motion Representation (Poster)
Link: https://arxiv.org/pdf/2410.10306
Source: Ant Group independent work
Fields: Video Generation, Animation, General Cartoon Characters, Pose Learning
Abstract: Existing character animation methods focus on humans and fail to generalize to anthropomorphic entities. Animate‑X introduces a latent diffusion model (LDM)‑based framework with a Pose Indicator that captures comprehensive motion patterns via implicit CLIP features and explicit simulated inputs. A new benchmark (A²Bench) demonstrates superior performance over SOTA.
Framer: Interactive Frame Interpolation (Poster)
Link: https://arxiv.org/pdf/2410.18978
Source: Ant Group research interns
Fields: Image Manipulation, Video Generation, Cartoon Interpolation
Abstract: Framer generates smooth transitional frames between two images, allowing user‑defined keypoint trajectories for fine‑grained control. It supports an automatic mode with keypoint estimation. Experiments show strong results across image morphing, timelapse generation, and cartoon interpolation, with all code and models open‑sourced.
OmniKV: Dynamic Context Selection for Efficient Long‑Context LLMs (Poster)
Link: https://openreview.net/pdf?id=ulCAPXYXfa
Source: Ant Group independent work
Fields: Large Models
Abstract: KV‑cache dominates GPU memory for long‑context inference. Prior token‑dropping based on attention scores is unreliable. OmniKV identifies important tokens across layers without discarding any token or extra training, achieving 1.68× speedup and up to 75% KV memory reduction. It extends Llama‑3‑8B context length from 128K to 450K on a single A100.
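The core idea of attention-based token selection over a KV cache can be illustrated with a toy sketch. This is an assumption-laden simplification of OmniKV (single head, one query, one hand-picked "filter" layer, no batching): score cached tokens by their attention weight for the current query, then attend only over the top-k indices in later computation. All function names and shapes are illustrative.

```python
import numpy as np

def select_important_tokens(q, K, k):
    """Score cached tokens by the attention the current query pays them
    at a 'filter' layer, and return indices of the top-k tokens."""
    d = K.shape[-1]
    scores = (K @ q) / np.sqrt(d)           # (seq_len,) scaled dot products
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax attention weights
    return np.argsort(weights)[-k:][::-1]   # heaviest tokens first

def sparse_attention(q, K, V, idx):
    """Attend only over the selected subset of the KV cache."""
    d = K.shape[-1]
    s = (K[idx] @ q) / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V[idx]

rng = np.random.default_rng(0)
seq_len, d = 1024, 64
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))
q = K[42] + 0.01 * rng.normal(size=d)       # query strongly aligned with token 42
idx = select_important_tokens(q, K, k=16)   # token 42 should rank highly
out = sparse_attention(q, K, V, idx)
```

Note that OmniKV's distinguishing claim is identifying such tokens across layers without discarding any entry from the cache; the sketch above only shows the per-step selection primitive.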
Group Position Embedding (GPE): Enhancing Document Understanding with Layout Information (Poster)
Link: https://openreview.net/pdf?id=Dj9a4zQsSl
Source: Ant Group independent work
Fields: Large Models
Abstract: GPE injects layout awareness into LLMs without architectural changes or extra pre‑training by grouping attention heads and providing independent positional signals per group. Evaluated on five document tasks and a new BLADE benchmark, GPE‑enhanced models achieve performance comparable to SOTA with minimal fine‑tuning.
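The head-grouping idea can be sketched as follows: different groups of attention heads receive different positional id sequences, e.g. one group sees 1-D reading order while another sees a row index derived from page layout. This is a hypothetical additive-sinusoidal illustration, not GPE's actual injection mechanism; all names, the grouping, and the layout ids are assumptions.

```python
import numpy as np

def sinusoidal(pos, d):
    """Standard sinusoidal embedding for a vector of integer positions."""
    i = np.arange(d // 2)
    freq = 1.0 / (10000 ** (2 * i / d))
    ang = np.outer(pos, freq)                            # (seq, d//2)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

def grouped_positions(x, head_groups, position_ids):
    """Give each group of attention heads its own positional signal.
    x: (seq, heads, d) per-head features; position_ids: group -> (seq,) ids."""
    out = x.copy()
    for group, heads in head_groups.items():
        pe = sinusoidal(position_ids[group], x.shape[-1])  # (seq, d)
        for h in heads:
            out[:, h, :] += pe                             # same signal per group
    return out

seq, heads, d = 6, 4, 8
x = np.zeros((seq, heads, d))
head_groups = {"reading_order": [0, 1], "layout_row": [2, 3]}
position_ids = {
    "reading_order": np.arange(seq),             # 1-D token order
    "layout_row": np.array([0, 0, 0, 1, 1, 1]),  # row index from page layout
}
y = grouped_positions(x, head_groups, position_ids)
```

Because no weights or architecture change, layout awareness enters purely through the per-group id sequences, which matches the abstract's claim of no architectural changes or extra pre-training.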
CodePlan: Unlocking Reasoning Potential in Large Language Models by Scaling Code‑form Planning (Poster)
Link: https://arxiv.org/pdf/2409.12452
Source: Ant Group research interns
Fields: Large Model Complex Reasoning
Abstract: LLMs struggle with multi‑step reasoning. CodePlan introduces a scalable framework that generates executable code‑style plans, capturing rich semantics and control flow. Trained on 2M synthetic samples, CodePlan improves performance by up to 43.8% on 4‑step problems and shows strong data efficiency.
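A toy illustration of what a "code-form plan" buys over free-text reasoning: each step is an executable statement, so intermediate results and control flow are explicit and checkable. In CodePlan the plan would be generated by the LLM; here it is hand-written, and the word problem, variable names, and runner are all assumptions.

```python
def solve_with_code_plan(problem_plan: str) -> float:
    """Execute a code-form plan: a small Python snippet that sets `answer`."""
    scope = {}
    exec(problem_plan, {}, scope)   # run the plan in an isolated namespace
    return scope["answer"]

# Hand-written stand-in for a model-generated plan for:
# "Alice has 3 apples, buys 2 bags of 4 apples each, then gives away 5."
plan = """
start = 3
bought = 2 * 4            # step 1: apples bought
total = start + bought    # step 2: running total
answer = total - 5        # step 3: apples given away
"""
result = solve_with_code_plan(plan)   # 3 + 8 - 5 = 6
```

Each named intermediate (`bought`, `total`) corresponds to one reasoning step, which is the structure CodePlan scales up with its 2M synthetic training samples.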
CipherPrune: Efficient and Scalable Private Transformer Inference (Poster)
Link: https://openreview.net/pdf?id=mUMvr33FTu
Source: Research collaboration
Fields: Homomorphic Encryption, Large Models, Privacy Computing
Abstract: Private Transformer inference suffers from high runtime and limited scalability. CipherPrune combines encrypted token pruning and polynomial degree reduction with protocol‑aware network optimization, cutting execution overhead by 6.1× for 128‑token inputs and 10.6× for 512‑token inputs while preserving accuracy.
Overall, these works demonstrate Ant Group’s contributions across embodied AI, large‑model training efficiency, multimodal optimization, generative animation, interactive video synthesis, long‑context handling, document layout understanding, code‑guided reasoning, and privacy‑preserving inference.
AntTech
Technology is the core force driving Ant to create the future.