What Can AI Agents Learn from the Latest AIR 2025 Research?

This article compiles insights from the AIR 2025 conference and related talks. It covers the evolution of agents from reinforcement learning to LLM-driven systems, novel agent architectures such as AIDE, GUI agents, natural-language reinforcement learning, and scaling advances in large language models such as Qwen, and it highlights key algorithms, benchmarks, and open research questions.

AI Frontier Lectures

1. Agents Driving Transformation: From RL to LLM

Researchers from Nanyang Technological University, UCL, Google DeepMind, Meta, Huawei, and Alibaba discussed the shift from reinforcement-learning (RL) based agents to agents powered by large language models (LLMs). Professor Bo An presented the Q* algorithm, which combines offline RL, iterative Q-value updates, and reward shaping over high-quality rollout trajectories. Weco AI's CTO Yuxiang introduced AIDE, an AI-driven agent that treats code optimization as a tree-search problem over a solution space.
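The combination of offline RL, iterative Q-value updates, and reward shaping can be illustrated with a minimal tabular sketch. This is not the published Q* algorithm; the trajectory format, the shaping bonus, and the toy two-state task are all assumptions made for the example.

```python
from collections import defaultdict

# Minimal sketch: sweep Bellman backups over a fixed buffer of stored
# rollout trajectories (offline RL), with a heuristic shaping bonus
# added to the reward for promising successor states.

GAMMA = 0.9   # discount factor
ALPHA = 0.5   # learning rate

def shaped_reward(reward, bonus):
    """Reward shaping: add a heuristic bonus to the environment reward."""
    return reward + bonus

def offline_q_update(q, trajectories, bonuses):
    """One sweep of Q-value updates over stored (s, a, r, s') transitions."""
    for traj in trajectories:
        for (s, a, r, s_next) in traj:
            target = shaped_reward(r, bonuses.get(s_next, 0.0))
            if s_next is not None:  # non-terminal: bootstrap from max_a' Q(s', a')
                target += GAMMA * max(q[(s_next, a2)] for a2 in (0, 1))
            q[(s, a)] += ALPHA * (target - q[(s, a)])
    return q

# Toy two-state task: s0 --(action 1)--> s1 --(action 0, reward 1)--> terminal.
q = defaultdict(float)
trajs = [[("s0", 1, 0.0, "s1"), ("s1", 0, 1.0, None)]]
for _ in range(50):
    offline_q_update(q, trajs, bonuses={"s1": 0.1})
```

With repeated sweeps the Q-values converge to the shaped Bellman fixed point: Q(s1, 0) approaches 1.0, and Q(s0, 1) approaches 0.1 + 0.9 × 1.0 = 1.0.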

DeepMind researcher Feng Xidong described a vision of expressing all RL components—policy, value function, Bellman equation, Monte‑Carlo sampling, TD learning, and policy‑improvement operators—in natural language, aiming to redefine RL concepts as linguistic constructs.

The AIR 2025 workshop (organized by UCL and Meta) emphasized responsible, adaptable AI systems and featured contributions from UC Berkeley and other institutions.

2. Searching Intelligence in the Solution Space

Weco AI’s CTO Yuxiang detailed the AIDE framework, which formalizes machine‑learning and engineering tasks as tree‑search in a code‑space. The system generates initial solution nodes, iteratively drafts new code, evaluates it, and expands the search tree using A*‑style heuristics. A summary operator keeps the context window bounded, enabling long‑horizon reasoning.
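The drafting-evaluating-expanding loop described above can be sketched as a best-first search over candidate solutions. Everything concrete here is an assumption for illustration: `draft_children` stands in for LLM code drafting, `evaluate` stands in for running the candidate, and the "solutions" are plain strings rather than programs.

```python
import heapq
import itertools

def evaluate(code):
    """Stand-in scorer: treat shorter 'solutions' as better (lower cost)."""
    return len(code)

def draft_children(code):
    """Stand-in for LLM drafting: propose small edits of the current node."""
    return [code + "a", code[:-1]] if code else [code + "a"]

def solution_tree_search(root, budget=20):
    """Best-first expansion of the solution tree, tracking the best node seen."""
    counter = itertools.count()          # tie-breaker so the heap never compares strings
    frontier = [(evaluate(root), next(counter), root)]
    best_cost, best = frontier[0][0], root
    seen = {root}
    for _ in range(budget):
        if not frontier:
            break
        cost, _, node = heapq.heappop(frontier)   # expand the most promising node
        if cost < best_cost:
            best_cost, best = cost, node
        for child in draft_children(node):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (evaluate(child), next(counter), child))
    return best, best_cost

best, cost = solution_tree_search("abcd")
```

Starting from `"abcd"` (cost 4), the search repeatedly drafts and evaluates children until it reaches the empty string (cost 0) well within the budget. A real system would replace the scorer with actual execution results and cap the tree with the summary operator mentioned above.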

Key components include a reflection model, memory model, and retrieval model to handle complex gaming and software tasks. The approach also integrates RL‑based fine‑tuning (CoSo) that uses causal reasoning to identify influential tokens for more efficient exploration.
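The idea of identifying influential tokens can be sketched with a counterfactual-masking loop: drop each token and measure how much the outcome score changes. This is only a toy stand-in for CoSo's causal estimator; the keyword-counting scorer and token list are invented for the example.

```python
def outcome_score(tokens):
    """Toy stand-in for task reward: count tokens from an 'important' set."""
    keywords = {"sort", "return"}
    return sum(tok in keywords for tok in tokens)

def token_influence(tokens):
    """Counterfactual masking: how much does removing each token change the score?"""
    base = outcome_score(tokens)
    return {
        i: base - outcome_score(tokens[:i] + tokens[i + 1:])
        for i in range(len(tokens))
    }

tokens = ["def", "f", "sort", "x", "return", "x"]
influence = token_influence(tokens)
influential = [tokens[i] for i, v in influence.items() if v > 0]
```

Only the tokens that actually move the score (`"sort"` and `"return"` here) receive nonzero influence, so an RL fine-tuner could concentrate its exploration budget on them.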

3. Focusing on General‑Purpose GUI Agents

Huawei London’s Shao Kun presented a model and optimization strategy for general‑purpose GUI agents. Demonstrations showed agents performing multi‑step tasks such as retrieving currency data, navigating restaurant menus, and interacting with web interfaces. The talk highlighted the need for comprehensive benchmarks, action models, and efficient RL fine‑tuning to improve GUI agent performance.
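A multi-step GUI task of the kind demonstrated can be sketched as an observe-act loop. The screen representation, the action schema, and the rule-based policy below are all assumptions for the sketch; a real agent would call a vision-language action model where `policy` appears.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # "tap", "type", or "done"
    target: str    # UI element name or text to enter

def policy(observation, goal):
    """Stand-in for a GUI action model mapping (screen, goal) -> next action."""
    if goal in observation:
        return Action("done", goal)
    if "search box" in observation:
        return Action("type", goal)
    return Action("tap", "search box")

def step(observation, action):
    """Toy environment transition: applying an action updates the screen text."""
    if action.kind == "tap":
        return observation + " search box"
    if action.kind == "type":
        return observation + " " + action.target
    return observation

def run_episode(goal, max_steps=5):
    obs, trace = "home screen", []
    for _ in range(max_steps):
        act = policy(obs, goal)
        trace.append(act.kind)
        if act.kind == "done":
            break
        obs = step(obs, act)
    return trace

trace = run_episode("USD to EUR rate")
```

For the toy currency task the agent taps the search box, types the query, and stops: the trace is `tap`, `type`, `done`. Benchmarks of the kind the talk calls for would score exactly such traces against ground-truth action sequences.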

4. DeepSeek’s Reinforcement‑Learning “Aha” Moment

UCL's Song Yan explained how RL improves LLM reasoning, citing the DeepSeek-R1-Zero model, which outperforms OpenAI's o1 on certain benchmarks. An "Aha" moment was observed when RL enabled the model to allocate a larger token budget to complex reasoning, though some argue the effect stems from the base model's inherent self-correction abilities.

Subsequent work reproduced the effect at smaller scales with TinyZero (3B) and SimpleRL (7B) using zero-RL techniques, along with multimodal extensions built on the Open-R1, OpenRLHF, and verl codebases.

5. Natural‑Language Reinforcement Learning Paradigm

Feng Xidong and collaborators proposed mapping RL primitives to natural‑language representations: policies become language strategies, value functions turn into descriptive feedback, and Bellman equations are expressed as consistent linguistic evaluations across timesteps. They introduced language aggregators (G1, G2) to replace numeric averaging and TD updates with textual synthesis.
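The mapping can be made concrete with a small sketch: value estimates are short textual assessments, and an aggregator (playing the role of G1/G2) synthesizes assessments across sampled rollouts instead of numerically averaging returns. The string logic below stands in for LLM calls; the phrasing and rollout summaries are invented for the example.

```python
def language_value(trajectory_summary):
    """Stand-in 'language value function': describe how promising a rollout looks."""
    if "goal reached" in trajectory_summary:
        return "promising: this line of play reaches the goal"
    if "dead end" in trajectory_summary:
        return "poor: this line of play hits a dead end"
    return "uncertain: outcome not yet determined"

def aggregate(assessments):
    """Language aggregator: merge per-rollout assessments into one verdict,
    replacing the numeric average of Monte-Carlo returns with textual synthesis."""
    n_good = sum(a.startswith("promising") for a in assessments)
    n_bad = sum(a.startswith("poor") for a in assessments)
    if n_good > n_bad:
        return f"overall promising ({n_good}/{len(assessments)} rollouts succeed)"
    if n_bad > n_good:
        return f"overall poor ({n_bad}/{len(assessments)} rollouts fail)"
    return "overall uncertain"

rollouts = [
    "goal reached in 3 moves",
    "dead end after 2 moves",
    "goal reached in 5 moves",
]
verdict = aggregate([language_value(r) for r in rollouts])
```

The aggregated verdict ("overall promising, 2/3 rollouts succeed") plays the role a scalar value estimate would play in classical RL, but remains human-readable at every step.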

6. Qwen’s Long‑Context and Scaling Advances

Alibaba's Lin Junyang described Qwen's scaling roadmap: Qwen 2.5 expands training data to 18 trillion tokens, with plans for 30-40 trillion tokens and Mixture-of-Experts models. Model sizes range from 0.5B to 72B parameters, and context windows have been extended from 32K to 128K tokens, with research into million-token contexts via Dual Chunk Attention. Sparse inference techniques reduce generation latency from minutes to seconds, enabling cost-effective deployment of million-token contexts.

Overall, the compilation underscores rapid progress in agent architectures, RL‑LLM integration, GUI automation, and large‑scale model engineering, while calling for standardized benchmarks and efficient training methods.

Tags: AI agents, large language models, reinforcement learning, agent architecture, GUI agents, natural language RL
Written by AI Frontier Lectures, a leading AI knowledge platform.