Ant Group’s 18 Accepted Papers at AAAI 2025: Summaries and Highlights
This article presents concise English summaries of the 18 Ant Group papers accepted at AAAI 2025, covering topics such as privacy‑preserving large‑model tuning, knowledge‑graph integration, AI‑generated image detection, multi‑task learning, generative retrieval, role‑playing evaluation, and video hallucination mitigation.
On February 25, AAAI 2025 concluded in Philadelphia after an eight‑day conference, accepting 3,032 of 12,957 submissions (23.4% acceptance, 4.6% oral). Ant Group contributed 18 papers (3 oral, 15 poster) spanning privacy‑enhanced large‑model fine‑tuning, knowledge‑graph integration, AI‑generated image detection, multi‑task learning, generative retrieval, role‑playing evaluation, and more.
1. ScaleOT: Privacy‑utility‑scalable Offsite‑tuning with Dynamic LayerReplace and Selective Rank Compression
Type: Oral
Link: https://arxiv.org/pdf/2412.09812
Source: Ant Group Postdoctoral Workstation
Fields: Large‑model privacy fine‑tuning, cross‑domain fine‑tuning
Abstract: ScaleOT introduces a reinforcement‑learning‑guided, layer‑wise lossy compression algorithm and a lightweight Harmonizer network that replaces original LLM layers, achieving 12.5% higher full‑parameter fine‑tuning performance and up to 90% compute savings compared with knowledge‑distillation methods.
2. K‑ON: Stacking Knowledge on the Head Layer of Large Language Model
Type: Oral
Link: https://arxiv.org/pdf/2502.06257
Source: Joint Lab
Fields: Knowledge graphs, large models
Abstract: K‑ON predicts the next k tokens to embed knowledge‑graph facts into LLMs, enabling simultaneous multi‑entity evaluation and achieving superior performance on KG completion while reducing training epochs from 1000 to 5 on an 8‑GPU setup.
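The practical appeal of a k-token head is that many candidate entities can be scored from a single forward pass instead of k autoregressive steps. A minimal toy sketch of that scoring idea (the vocabulary, logits, and entity encodings below are illustrative, not K‑ON's actual head architecture):

```python
import math

# Hypothetical toy: a "k-head" emits one logit vector per future
# position; an entity (a k-token sequence) is scored by summing the
# log-probabilities of its tokens, one per head position. Many
# candidate entities can then be ranked from one forward pass.

def log_softmax(logits):
    m = max(logits)
    z = math.log(sum(math.exp(x - m) for x in logits)) + m
    return [x - z for x in logits]

def score_entities(head_logits, entities):
    """head_logits: k logit vectors (one per future position);
    entities: name -> list of k token ids. Returns name -> log-prob."""
    logps = [log_softmax(h) for h in head_logits]
    return {name: sum(logps[i][tok] for i, tok in enumerate(toks))
            for name, toks in entities.items()}

# Toy vocabulary of 4 tokens, k = 2 future positions.
heads = [[2.0, 0.5, 0.1, -1.0],   # logits for future position 1
         [0.2, 1.5, -0.5, 0.0]]   # logits for future position 2
entities = {"Paris": [0, 1], "Berlin": [2, 3]}
scores = score_entities(heads, entities)
best = max(scores, key=scores.get)   # "Paris" scores higher here
```

Scoring all entities against the same k logit vectors is what makes simultaneous multi-entity evaluation cheap.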
3. WildFake: A Large‑Scale and Hierarchical Dataset for AI‑Generated Images Detection
Type: Oral
Link: https://arxiv.org/pdf/2402.11843
Source: Independent
Fields: AIGC image detection, AIGC
Abstract: WildFake aggregates 20+ generation methods into a hierarchically structured dataset of 3 million images, improving the detection accuracy of five downstream AIGC detectors by 5‑30% and demonstrating strong generalisation across real‑world scenarios.
4. Bagging‑Expert Network for Multi‑Task Learning: A Depolarization Solution in Multi‑Gate Mixture‑of‑Experts
Type: Poster
Link: (link not provided)
Source: Independent
Fields: Multi‑task learning, MoE, recommendation systems
Abstract: BEnet adds a bagging layer and attention mechanism to MoE, diversifying expert specialisation and alleviating polarization, yielding robust performance on real‑world e‑commerce multitask datasets.
5. DMT‑RoleBench: A Dynamic Multi‑Turn Dialogue Based Benchmark for Role‑Playing Evaluation of Large Language Model and Agent
Type: Poster
Link: (link not provided)
Source: Independent
Fields: Large‑model evaluation
Abstract: DMT‑RoleBench provides richer role types, system prompts, and a three‑level metric suite (role‑adoption, dialogue, role‑replication), together with the DMT‑RM judge model and DMT‑Score scoring, improving role‑playing capability assessment across 5+ LLMs.
6. DOGR: Leveraging Document‑Oriented Contrastive Learning in Generative Retrieval
Type: Poster
Link: https://arxiv.org/abs/2502.07219
Source: Independent
Fields: Information retrieval, large language models
Abstract: DOGR adopts a two‑stage contrastive learning framework that directly models query‑document relevance, achieving state‑of‑the‑art results on two public benchmarks and proving effective for various identifier construction techniques.
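The contrastive objective behind this kind of retrieval training pulls a query embedding toward its relevant document and away from in-batch negatives. A minimal InfoNCE-style sketch (the embeddings and temperature below are illustrative; DOGR's actual two-stage objective and identifier construction are more involved):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(query, docs, positive_idx, temperature=0.1):
    """Negative log of the softmax probability assigned to the
    positive document among all candidates in the batch."""
    sims = [dot(query, d) / temperature for d in docs]
    m = max(sims)
    log_z = math.log(sum(math.exp(s - m) for s in sims)) + m
    return -(sims[positive_idx] - log_z)

q = [1.0, 0.0]
docs = [[0.9, 0.1],    # relevant document
        [0.0, 1.0],    # in-batch negative
        [-0.5, 0.5]]   # in-batch negative
loss = info_nce(q, docs, positive_idx=0)   # small: positive is closest
```

Minimising this loss directly shapes the relevance geometry between queries and documents, which is the property generative retrieval otherwise struggles to encode in its identifiers.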
7. Improving Natural Language Understanding for LLMs via Large‑scale Instruction Synthesis
Type: Poster
Link: https://arxiv.org/abs/2502.03843
Source: Independent
Fields: LLM, instruction synthesis, NLU, information extraction
Abstract: The Hum corpus synthesises high‑quality NLU instructions across IE, MRC, classification and general tasks, boosting the NLU performance of six LLMs by an average of 3.1% without harming other capabilities.
8. An Evaluation Framework for Product Images Background Inpainting based on Human Feedback and Product Consistency
Type: Poster
Link: https://arxiv.org/abs/2412.17504
Source: Independent
Fields: Computer vision
Abstract: HFPC collects 44,000 human‑feedback samples and trains a reward model on multimodal BLIP features, while a fine‑tuned segmentation model checks product consistency, achieving 96.4% accuracy and reducing annotation cost.
9. EchoMimic: Lifelike Audio‑Driven Portrait Animations through Editable Landmark Conditions
Type: Poster
Link: https://github.com/antgroup/echomimic
Source: Independent
Fields: AIGC digital humans, video generation
Abstract: EchoMimic jointly trains on audio and facial landmarks, supporting three control modes (audio‑only, landmarks‑only, both) and delivering superior quantitative and qualitative results on multiple datasets.
10. L3TC: Leveraging RWKV for Learned Lossless Low‑Complexity Text Compression
Type: Poster
Link: https://arxiv.org/abs/2412.16642
Source: University‑Industry Collaboration
Fields: NLP, machine learning, lossless text compression
Abstract: L3TC combines RWKV with a sparse‑aware tokenizer and high‑order re‑parameterisation, cutting bits by 48% versus gzip while halving model parameters and delivering real‑time, megabytes‑per‑second decoding.
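Learned lossless compression rests on one identity: an entropy coder spends about -log2 p(symbol) bits per symbol, so a better predictive model directly lowers the bit count. A self-contained sketch of that bound, with simple unigram frequencies standing in for the RWKV predictor (the text and model here are illustrative, not L3TC's pipeline):

```python
import math
from collections import Counter

def bits_needed(text, probs):
    """Ideal entropy-coded size in bits under a symbol model."""
    return sum(-math.log2(probs[ch]) for ch in text)

text = "abracadabra" * 20
counts = Counter(text)
model = {ch: c / len(text) for ch, c in counts.items()}   # learned-ish model
uniform = {ch: 1 / len(counts) for ch in counts}          # no model at all

learned_bits = bits_needed(text, model)
uniform_bits = bits_needed(text, uniform)
# The frequency-aware model needs fewer bits than the uniform code;
# a strong sequence model like RWKV pushes the bound down much further.
```

The engineering challenge the paper targets is keeping this prediction step cheap enough for real-time decoding, since the decoder must rerun the model symbol by symbol.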
11. Apollo‑Forecast: Overcoming Aliasing and Inference Speed Challenges in Language Models for Time Series Forecasting
Type: Poster
Link: https://arxiv.org/abs/2412.12226
Source: University‑Industry Collaboration
Fields: Time‑series forecasting, large models
Abstract: Apollo‑Forecast introduces anti‑aliasing quantisation (AAQM) and draft‑model‑based competitive decoding (RD), improving WQL by 35.41% and MASE by 18.99% while accelerating inference 1.9‑2.7×.
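The anti-aliasing intuition is that low-pass filtering a series before quantising it into a small token alphabet stops high-frequency noise from aliasing into spurious token transitions. A hedged sketch of that effect (the moving-average filter and uniform quantiser below are illustrative stand-ins for AAQM's actual design):

```python
def moving_average(xs, w=3):
    """Simple low-pass filter with a centred window."""
    half = w // 2
    return [sum(xs[max(0, i - half):i + half + 1]) /
            len(xs[max(0, i - half):i + half + 1]) for i in range(len(xs))]

def quantise(xs, levels=4):
    """Uniform quantisation of a series into a small token alphabet."""
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / levels or 1.0
    return [min(int((x - lo) / step), levels - 1) for x in xs]

def transitions(tokens):
    return sum(a != b for a, b in zip(tokens, tokens[1:]))

# A series whose level shifts once, buried under fast oscillation.
noisy = [10, 30, 10, 30, 10, 30, 90, 70, 90, 70, 90, 70]
raw_tokens = quantise(noisy)
smoothed_tokens = quantise(moving_average(noisy))
# Filtering first yields far fewer spurious token transitions,
# leaving a cleaner symbol sequence for the language model.
```

Fewer artificial transitions means the downstream LM models the genuine trend rather than quantisation noise.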
12. Attributive Reasoning for Hallucination Diagnosis of Large Language Models
Type: Poster
Link: (link not provided)
Source: Research Intern
Fields: Trustworthy LLM generation
Abstract: Proposes a signal‑based attribution framework and the RelQA‑Cate benchmark (8 hallucination categories); introduces Differential Penalty Decoding (DPD), which raises answer reliability by up to 28.25%.
13. CSR: Achieving 1‑Bit Key‑Value Cache via Sparse Representation
Type: Poster
Link: https://arxiv.org/abs/2412.11741
Source: Research Intern
Fields: NLP, LLM, KV‑cache compression
Abstract: CSR converts dense KV tensors into sparse indices and weights, aided by a NeuralDict generator, matching state‑of‑the‑art KV quantisation performance while drastically reducing memory usage.
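The underlying idea of sparse-coded caching is to approximate each dense vector as a few atoms from a shared dictionary, storing only atom indices and weights. A greedy matching-pursuit sketch (the orthonormal toy dictionary stands in for CSR's learned NeuralDict; this is not the paper's algorithm verbatim):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(v, dictionary, n_atoms=2):
    """Greedy sparse coding: pick the best-correlated atom, subtract
    its contribution, repeat. Atoms are assumed unit-norm.
    Returns (index, weight) pairs - all that needs to be cached."""
    residual = list(v)
    code = []
    for _ in range(n_atoms):
        scores = [dot(residual, atom) for atom in dictionary]
        i = max(range(len(dictionary)), key=lambda j: abs(scores[j]))
        code.append((i, scores[i]))
        residual = [r - scores[i] * a for r, a in zip(residual, dictionary[i])]
    return code

def reconstruct(code, dictionary, dim):
    out = [0.0] * dim
    for i, w in code:
        out = [o + w * a for o, a in zip(out, dictionary[i])]
    return out

atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy dictionary
v = [0.8, -0.3, 0.05]                      # a dense key/value vector
code = matching_pursuit(v, atoms, n_atoms=2)   # 2 indices + 2 weights
approx = reconstruct(code, atoms, dim=3)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, approx)))
```

Storing two (index, weight) pairs instead of a full dense vector is where the memory savings come from; the quality question is how small the residual error stays.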
14. HomoMatcher: Dense Feature Matching Results with Semi‑Dense Efficiency by Homography Estimation
Type: Poster
Link: https://arxiv.org/abs/2411.06700
Source: Research Intern
Fields: Computer vision, feature matching
Abstract: Introduces a lightweight homography‑estimation network that aligns image blocks, delivering near‑dense matching accuracy at semi‑dense computational cost.
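The geometric primitive here is the planar homography: a 3×3 projective transform that maps points in one image block to the corresponding block in the other image. A minimal sketch of applying one to a point (the translation-only example is illustrative; the paper's network estimates the matrix itself):

```python
def apply_homography(H, x, y):
    """Map (x, y) through 3x3 homography H using homogeneous
    coordinates: divide by the projective scale w."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w

# A pure translation by (2, 3), expressed as a homography.
H = [[1, 0, 2],
     [0, 1, 3],
     [0, 0, 1]]
warped = apply_homography(H, 1, 1)   # (3.0, 4.0)
```

Once two blocks are aligned this way, every pixel inside the block gets a correspondence almost for free, which is how near-dense matches can come at semi-dense cost.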
15. TrustUQA: A Trustful Framework for Unified Structured Data Question Answering
Type: Poster
Link: https://arxiv.org/pdf/2406.18916
Source: Joint Lab
Fields: AI, knowledge graphs
Abstract: Defines an LLM‑friendly Condition Graph and a two‑layer CG‑Query, outperforming two existing unified QA methods and achieving the best scores on two datasets, while showing promise for mixed‑type structured data QA.
16. Multi‑Frame Deformable Look‑Up Table for Compressed Video Quality Enhancement
Type: Poster
Link: https://openreview.net/pdf?id=GbozULGYYD
Source: Targeted Collaboration
Fields: Computer vision, video quality enhancement, deep learning
Abstract: Proposes a multi‑frame deformable LUT that extracts temporal offsets, aligns frames, and fuses multi‑scale features, achieving real‑time (>30 fps) 1080p enhancement and surpassing prior LUT‑based and CNN‑based baselines.
17. Learning Causal Transition Matrix for Instance‑dependent Label Noise
Type: Poster
Link: https://arxiv.org/abs/2412.13516
Source: Ant Group Postdoctoral Workstation
Fields: Weakly supervised learning
Abstract: Models a latent variable that influences both instances and annotation, defining a causal transition matrix that can be approximated and used within a novel training framework to more accurately infer clean labels.
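The mechanics of transition-matrix noise modelling: if T[i][j] is the probability that clean label i is annotated as j, the observed noisy-label distribution is the clean distribution pushed through T, so learning (or approximating) T lets training recover the clean posterior. A minimal sketch of that forward relationship (the 2-class matrix is illustrative; the paper's matrix is instance-dependent and causally motivated):

```python
def noisy_distribution(p_clean, T):
    """Push a clean label distribution through transition matrix T:
    p_noisy[j] = sum_i p_clean[i] * T[i][j]."""
    k = len(T)
    return [sum(p_clean[i] * T[i][j] for i in range(k)) for j in range(k)]

# 20% of class-0 samples get mislabelled as class 1, none the other way.
T = [[0.8, 0.2],
     [0.0, 1.0]]
p_clean = [1.0, 0.0]                       # true label: class 0
p_noisy = noisy_distribution(p_clean, T)   # [0.8, 0.2]
```

Training a classifier against p_noisy through a known T (rather than against the raw noisy labels) is what lets the clean labels be inferred; the hard part, which the paper addresses, is estimating T when it varies per instance.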
18. MHBench: Demystifying Motion Hallucination in VideoLLMs
Type: Poster
Link: https://github.com/xzhouzeng/MHBench/blob/main/README.md
Fields: Computer vision
Abstract: Introduces the first benchmark for motion hallucination (1,200 videos, 20 action classes), proposes Motion Contrastive Decoding (MotionCD) to suppress hallucination, and shows up to 15.1% performance gains on current VideoLLMs.
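Contrastive decoding suppresses hallucination by subtracting the logits a model produces on a motion-degraded input (e.g. temporally shuffled frames) from its logits on the real input, down-weighting tokens the model would emit even without seeing the true motion. A hedged sketch (the vocabulary, logit values, and mixing rule below are illustrative, not MotionCD's exact formulation):

```python
def contrastive_logits(logits_real, logits_degraded, alpha=1.0):
    """Amplify what the real input adds over the degraded one:
    (1 + alpha) * real - alpha * degraded."""
    return [(1 + alpha) * r - alpha * d
            for r, d in zip(logits_real, logits_degraded)]

vocab = ["walking", "standing", "jumping"]
logits_real = [2.0, 1.8, 0.1]      # with true motion: "walking" slightly ahead
logits_degraded = [0.5, 1.7, 0.1]  # without motion cues, the language prior
                                   # alone still favours "standing"
adjusted = contrastive_logits(logits_real, logits_degraded)
best = vocab[max(range(len(vocab)), key=lambda i: adjusted[i])]
```

The effect is to widen the margin for answers actually supported by the motion evidence, rather than answers the language prior produces regardless of the video.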
Collectively, these works illustrate Ant Group’s broad research contributions at AAAI 2025, ranging from foundational LLM privacy techniques to practical evaluation tools for emerging multimodal AI systems.
AntTech