
Ant Group Papers Accepted at ICLR 2025: Summaries and Links

The article presents the abstracts, publication types, links, and research areas of seventeen Ant Group papers accepted at ICLR 2025, covering topics such as embodied robot co‑design, efficient distributed training for large language models, optimization via LLMs, character animation, interactive frame interpolation, KV‑cache management, and privacy‑preserving Transformers.

AntTech

The International Conference on Learning Representations (ICLR) 2025 accepted 17 papers from Ant Group, including 1 Spotlight and 16 Poster papers. Below are the details of each contribution.

BodyGen: Advancing Towards Efficient Embodiment Co‑Design (Spotlight)

Link: https://openreview.net/pdf?id=cTR17xl89h

Source: Research collaboration

Fields: Reinforcement Learning, Embodied Intelligence

Abstract: Embodied co‑design aims to jointly optimize robot morphology and control. Existing work shows promise but faces efficiency challenges due to the combinatorial nature of morphology search and complex morphology‑control dependencies. Ineffective morphology representations and imbalanced reward signals are identified as major obstacles. BodyGen introduces topology‑aware self‑attention for compact morphology encoding and a temporal credit assignment mechanism for balanced rewards, achieving an average 60.03% performance gain over state‑of‑the‑art baselines.
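The topology-aware attention idea can be illustrated with a small sketch (pure Python, hypothetical function names; this shows generic adjacency-masked attention, not BodyGen's actual architecture): each limb attends only to itself and to limbs adjacent in the morphology graph, so the encoding reflects the body's topology.

```python
import math

# Sketch of topology-aware attention masking: non-adjacent limb pairs are
# masked to -inf before the softmax, so attention flows only along the
# morphology graph's edges.

def masked_attention_weights(scores, adjacency):
    """scores[i][j]: raw attention logits; adjacency[i][j]: 1 if limbs i and j
    are connected (self-edges included). Returns row-normalised weights with
    non-adjacent pairs masked out."""
    n = len(scores)
    weights = []
    for i in range(n):
        row = [scores[i][j] if adjacency[i][j] else float("-inf")
               for j in range(n)]
        m = max(row)
        es = [math.exp(v - m) for v in row]  # exp(-inf) underflows to 0.0
        s = sum(es)
        weights.append([e / s for e in es])
    return weights

# Chain morphology torso -- upper leg -- lower leg: limb 0 is not adjacent
# to limb 2, so it splits attention between itself and limb 1.
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
w = masked_attention_weights([[0.0] * 3 for _ in range(3)], adj)
print(w[0])  # -> [0.5, 0.5, 0.0]
```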

EDiT: A Local‑SGD‑Based Efficient Distributed Training Method for Large Language Models (Poster)

Link: https://arxiv.org/abs/2412.07210

Source: Ant Group independent work

Fields: AI Infrastructure, Large Models, Distributed Training

Abstract: Current distributed training suffers from communication bottlenecks, slow nodes, and lack of elasticity. Existing Local‑SGD methods add memory overhead and work only for small models. EDiT combines Local‑SGD with model‑parallelism, introducing hierarchical synchronization, virtual gradient penalty, and time‑interval synchronization, resulting in faster training, higher stability, and better model quality. Experiments on public datasets show superior speed and performance compared to other Local‑SGD and synchronous methods.
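The Local-SGD principle underlying EDiT can be shown with a minimal scalar sketch (hypothetical names; this is the vanilla Local-SGD baseline, not EDiT's hierarchical scheme): each worker takes several local gradient steps, and workers exchange parameters only once per synchronisation round, cutting communication frequency.

```python
# Minimal Local-SGD sketch: H local SGD steps per worker, then a parameter
# average. Communication happens once per H steps instead of once per step.

def local_sgd(workers_data, w0, lr=0.1, local_steps=4, rounds=5):
    """Minimise f_i(w) = (w - d_i)^2, where d_i is worker i's local target."""
    params = [w0 for _ in workers_data]          # one replica per worker
    for _ in range(rounds):
        for i, d in enumerate(workers_data):     # local phase: no comms
            for _ in range(local_steps):
                grad = 2.0 * (params[i] - d)     # d/dw (w - d)^2
                params[i] -= lr * grad
        avg = sum(params) / len(params)          # synchronisation phase
        params = [avg] * len(params)
    return params[0]

# Workers hold targets 1.0 and 3.0; the consensus minimiser is their mean, 2.0.
w = local_sgd([1.0, 3.0], w0=0.0)
print(round(w, 2))  # converges toward 2.0
```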

LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch (Poster)

Link: https://openreview.net/pdf?id=9OMvtboTJg

Source: Ant Group research interns

Fields: AI, Large Models, Optimization Modeling

Abstract: Formalizing optimization problems described in natural language is labor‑intensive. LLMOPT proposes a unified learning‑based framework that uses pretrained LLMs to build a five‑element problem formulation from natural‑language descriptions, applies multi‑instruction tuning, and adds model alignment and self‑correction to mitigate hallucinations. Evaluated on about 20 real‑world datasets spanning 6 domains, LLMOPT improves solution accuracy by 11.08% over SOTA methods.

Animate‑X: Universal Character Image Animation with Enhanced Motion Representation (Poster)

Link: https://arxiv.org/pdf/2410.10306

Source: Ant Group independent work

Fields: Video Generation, Animation, General Cartoon Characters, Pose Learning

Abstract: Existing character animation methods focus on humans and fail to generalize to anthropomorphic entities. Animate‑X introduces a latent diffusion model (LDM)‑based framework with a Pose Indicator that captures comprehensive motion patterns via implicit CLIP features and explicit simulated inputs. A new benchmark (A²Bench) demonstrates superior performance over SOTA.

Framer: Interactive Frame Interpolation (Poster)

Link: https://arxiv.org/pdf/2410.18978

Source: Ant Group research interns

Fields: Image Manipulation, Video Generation, Cartoon Interpolation

Abstract: Framer generates smooth transitional frames between two images, allowing user‑defined keypoint trajectories for fine‑grained control. It supports an automatic mode with keypoint estimation. Experiments show strong results across image morphing, timelapse generation, and cartoon interpolation, with all code and models open‑sourced.
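To make the notion of a user-defined keypoint trajectory concrete, here is a toy linear interpolation (purely illustrative: Framer drives a video diffusion model with such trajectories, it does not interpolate pixels or keypoints linearly):

```python
# Toy keypoint-trajectory interpolation: given matching keypoints in the
# start and end frames, produce their positions at each intermediate time.

def interpolate_trajectory(start_pts, end_pts, n_frames):
    """start_pts/end_pts: lists of (x, y) keypoints; returns per-frame lists."""
    frames = []
    for t in range(n_frames):
        a = t / (n_frames - 1)                      # 0.0 at start, 1.0 at end
        frames.append([((1 - a) * x0 + a * x1, (1 - a) * y0 + a * y1)
                       for (x0, y0), (x1, y1) in zip(start_pts, end_pts)])
    return frames

traj = interpolate_trajectory([(0.0, 0.0)], [(10.0, 4.0)], n_frames=3)
print(traj[1])  # midpoint frame -> [(5.0, 2.0)]
```

A user dragging a keypoint defines such a path; the generator is then conditioned on it for fine-grained control over the transition.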

OmniKV: Dynamic Context Selection for Efficient Long‑Context LLMs (Poster)

Link: https://openreview.net/pdf?id=ulCAPXYXfa

Source: Ant Group independent work

Fields: Large Models

Abstract: The KV cache dominates GPU memory during long‑context inference. Prior token‑dropping methods based on attention scores are unreliable. OmniKV identifies important tokens across layers without discarding any tokens and without extra training, achieving a 1.68× speedup and up to 75% KV‑cache memory reduction. It extends the Llama‑3‑8B context length from 128K to 450K tokens on a single A100.
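A simplified view of the core step (an illustration of attention-based token selection, not OmniKV's algorithm): score each cached token by the attention it receives from the current query, keep only the top-k indices, and reuse that subset downstream.

```python
import math

# Illustrative top-k context selection from attention weights.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def select_tokens(query, keys, k):
    """Return indices of the k cached tokens with highest attention weight."""
    dim = len(query)
    scores = [sum(q * ki for q, ki in zip(query, key)) / math.sqrt(dim)
              for key in keys]
    weights = softmax(scores)
    ranked = sorted(range(len(keys)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])

# Four cached tokens; the query is most similar to tokens 1 and 3.
keys = [[0.1, 0.0], [1.0, 0.0], [0.0, 0.2], [0.9, 0.1]]
picked = select_tokens([1.0, 0.0], keys, k=2)
print(picked)  # -> [1, 3]
```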

Group Position Embedding (GPE): Enhancing Document Understanding with Layout Information (Poster)

Link: https://openreview.net/pdf?id=Dj9a4zQsSl

Source: Ant Group independent work

Fields: Large Models

Abstract: GPE injects layout awareness into LLMs without architectural changes or extra pre‑training by grouping attention heads and providing independent positional signals per group. Evaluated on five document tasks and a new BLADE benchmark, GPE‑enhanced models achieve performance comparable to SOTA with minimal fine‑tuning.
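One way to picture per-group positional signals (a hypothetical sketch inspired by the abstract, not the paper's implementation): split the attention heads into groups and let each group index tokens along a different layout axis, such as reading order, x coordinate, or y coordinate of each token's bounding box.

```python
# Hypothetical group-wise position ids: each head group gets an independent
# positional signal derived from a different layout axis.

def group_position_ids(boxes):
    """boxes: (x, y) top-left coordinates, one per token, in reading order.
    Returns a dict mapping head-group name -> per-token position ids."""
    n = len(boxes)
    order = list(range(n))                                   # reading order
    x_rank = sorted(range(n), key=lambda i: boxes[i][0])     # left-to-right
    y_rank = sorted(range(n), key=lambda i: boxes[i][1])     # top-to-bottom
    x_ids, y_ids = [0] * n, [0] * n
    for rank, i in enumerate(x_rank):
        x_ids[i] = rank
    for rank, i in enumerate(y_rank):
        y_ids[i] = rank
    return {"reading_order": order, "x_axis": x_ids, "y_axis": y_ids}

# Two-column layout: token 1 sits in the right column of the first row.
ids = group_position_ids([(0, 0), (50, 0), (0, 20)])
print(ids["x_axis"])  # -> [0, 2, 1]: token 1 is right-most on the x axis
```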

CodePlan: Unlocking Reasoning Potential in Large Language Models by Scaling Code‑form Planning (Poster)

Link: https://arxiv.org/pdf/2409.12452

Source: Ant Group research interns

Fields: Large Model Complex Reasoning

Abstract: LLMs struggle with multi‑step reasoning. CodePlan introduces a scalable framework that generates executable code‑style plans, capturing rich semantics and control flow. Trained on 2M synthetic samples, CodePlan improves performance by up to 43.8% on 4‑step problems and shows strong data efficiency.
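A code-form plan of the kind described might look like the following (a hypothetical example, not taken from the paper): variables and control flow make each intermediate reasoning step explicit and mechanically checkable before an answer is produced.

```python
# Hypothetical code-form plan for a multi-step word problem: "How much do
# n_people pay if groups at or above a threshold get a discount?"

def plan_trip_cost(ticket_price, n_people, discount_threshold, discount_rate):
    base_cost = ticket_price * n_people          # step 1: undiscounted total
    if n_people >= discount_threshold:           # step 2: does discount apply?
        base_cost *= (1.0 - discount_rate)       # step 3: apply group discount
    return base_cost

print(plan_trip_cost(10.0, 5, 4, 0.2))  # 5 tickets at 10 with 20% off -> 40.0
```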

CipherPrune: Efficient and Scalable Private Transformer Inference (Poster)

Link: https://openreview.net/pdf?id=mUMvr33FTu

Source: Research collaboration

Fields: Homomorphic Encryption, Large Models, Privacy Computing

Abstract: Private Transformer inference suffers from high runtime and limited scalability. CipherPrune combines encrypted token pruning and polynomial degree reduction with protocol‑aware network optimization, cutting execution overhead by 6.1× for 128‑token inputs and 10.6× for 512‑token inputs while preserving accuracy.

Overall, these works demonstrate Ant Group’s contributions across embodied AI, large‑model training efficiency, multimodal optimization, generative animation, interactive video synthesis, long‑context handling, document layout understanding, code‑guided reasoning, and privacy‑preserving inference.

Tags: large language models, privacy, robotics, AI research, ICLR 2025, Ant Group
Written by AntTech

Technology is the core driver of Ant's future creation.
