Memory‑Augmented Self‑Evolving Framework Empowers GUI Agents for Long‑Term Tasks

The SE‑GA framework introduces hierarchical memory and a two‑stage self‑evolution training pipeline that enable GUI agents to retain critical context, learn from past successes, and achieve state‑of‑the‑art performance on long‑horizon benchmarks.

Machine Heart

With the rapid advance of large‑model technology, GUI agents are moving from merely interpreting screens to operating them autonomously. Yet they frequently fail on multi‑step, long‑duration tasks, because they cannot retain their full interaction history and they rely on static policies trained on fixed datasets.

The authors identify two fundamental shortcomings: (1) limited context windows cause crucial early information to be forgotten, so errors accumulate; (2) static policies prevent agents from learning from successful experiences and transferring them across tasks.

To address these issues, the SE‑GA framework—presented at ICML 2026 by teams from Tianjin University and Shanghai Jiao Tong University—introduces a hierarchical memory structure and an iterative self‑improvement mechanism, converting a static executor into a dynamic learner.

Test‑Time Memory Extension (TTME) builds a three‑tier memory bank:

Episodic Memory: short‑term working memory that records, at each timestep, the previous observation, the action taken, and the resulting observation, preserving recent context without excessive computational overhead.

Semantic Memory: a repository of cross‑task interaction rules (e.g., “login is required before accessing restricted pages”) that helps the agent understand the logic behind application behavior.

Experiential Memory: a library of successful past trajectories, containing both raw traces and the agent’s reflective summaries. To locate relevant experiences, TTME uses hybrid retrieval that combines semantic similarity with visual similarity.
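The three tiers and the hybrid retrieval can be pictured as one data structure. The sketch below is an illustration under stated assumptions, not SE‑GA's implementation: the article does not say how similarity is computed, so embeddings are assumed to be precomputed vectors, both similarities are assumed to be cosine similarities, and the weighted sum controlled by `alpha` is a hypothetical way of combining them. The episodic window length of 8 and all names (`MemoryBank`, `Experience`, `retrieve`, and so on) are likewise hypothetical.

```python
import math
from collections import deque
from dataclasses import dataclass, field
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Transition:
    """One episodic entry: previous observation, action taken, resulting observation."""
    obs: str
    action: str
    next_obs: str


@dataclass
class Experience:
    """A successful past trajectory plus the agent's reflective summary."""
    trace: List[Transition]
    summary: str
    text_emb: List[float]    # hypothetical: embedding of the task description or summary
    screen_emb: List[float]  # hypothetical: embedding of a representative screenshot


@dataclass
class MemoryBank:
    # Episodic tier: bounded window of recent transitions (length 8 is hypothetical).
    episodic: deque = field(default_factory=lambda: deque(maxlen=8))
    # Semantic tier: cross-task rules stored as plain strings.
    semantic: List[str] = field(default_factory=list)
    # Experiential tier: successful trajectories with reflections.
    experiential: List[Experience] = field(default_factory=list)

    def record_step(self, obs: str, action: str, next_obs: str) -> None:
        # deque(maxlen=...) drops the oldest transition once the window is full.
        self.episodic.append(Transition(obs, action, next_obs))

    def retrieve(self, text_emb: List[float], screen_emb: List[float],
                 k: int = 1, alpha: float = 0.5) -> List[Experience]:
        # Hybrid score: alpha * semantic similarity + (1 - alpha) * visual similarity.
        scored = [
            (alpha * cosine(text_emb, e.text_emb)
             + (1 - alpha) * cosine(screen_emb, e.screen_emb), e)
            for e in self.experiential
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


if __name__ == "__main__":
    bank = MemoryBank()
    bank.semantic.append("Login is required before accessing restricted pages.")
    bank.experiential.append(Experience([], "opened settings", [1.0, 0.0], [0.0, 1.0]))
    bank.experiential.append(Experience([], "sent an email", [0.0, 1.0], [1.0, 0.0]))
    print(bank.retrieve([1.0, 0.0], [0.0, 1.0])[0].summary)  # opened settings
```

Pushing `alpha` toward 1 favors experiences whose task descriptions match; pushing it toward 0 favors visually similar screens. The linear scan in `retrieve` also makes it easy to see why retrieval cost grows with the size of the experiential memory.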

Memory‑Augmented Self‑Evolution (MASE) provides a two‑stage training pipeline:

Grounding Training: supervised fine‑tuning via behavior cloning of expert trajectories, which teaches the agent visual grounding and action reasoning.

Self‑Evolution Training: built on the Group Relative Policy Optimization (GRPO) algorithm, this stage adds several GUI‑specific enhancements that let the agent learn continuously from its own interaction data.
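The article does not detail SE‑GA's GUI‑specific enhancements, so the sketch below covers only the core of standard GRPO as it is commonly described: several rollouts of the same task are scored, each reward is normalized against its own group to form an advantage (no learned critic), and the advantage feeds a PPO‑style clipped objective. The binary rewards, the clip range of 0.2, and the function names are assumptions for illustration, and the KL penalty toward a reference policy is omitted.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: (r_i - mean(r)) / std(r) within one group of rollouts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip: float = 0.2) -> float:
    """PPO-style clipped objective term for one sample (to be maximized).

    ratio = pi_new(action | state) / pi_old(action | state).
    The KL penalty GRPO adds against a reference policy is omitted here.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical binary task-success rewards for four rollouts of one GUI task.
    rewards = [1.0, 0.0, 0.0, 1.0]
    advs = group_relative_advantages(rewards)
    print([round(a, 3) for a in advs])  # [1.0, -1.0, -1.0, 1.0]
```

Because advantages are relative within a group, a group in which every rollout succeeds (or every one fails) yields zero advantage and therefore no gradient signal from that task.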

The framework also introduces Hindsight Goal‑Shifting, which relabels a failed trajectory as a successful example for any sub‑goal achieved within its prefix (e.g., the app was opened successfully before a later search step failed), turning otherwise wasted failures into useful supervision.
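A minimal sketch of the relabeling idea, assuming each step of a trajectory can be annotated with the sub‑goal (if any) it completed. The article does not describe how SE‑GA detects sub‑goal completion, so the `achieved` field, the `Step` and `Example` types, and `hindsight_relabel` are all hypothetical. It is also not stated whether the original failure is kept as a negative example, so the sketch returns only the relabeled successes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    action: str
    # Hypothetical annotation: the sub-goal (if any) observed to be complete after this step.
    achieved: Optional[str] = None


@dataclass
class Example:
    goal: str
    steps: List[Step]
    success: bool


def hindsight_relabel(goal: str, steps: List[Step], success: bool) -> List[Example]:
    """Turn a failed trajectory into successful examples for sub-goals its prefixes reached."""
    if success:
        # Successful trajectories are used unchanged.
        return [Example(goal, steps, True)]
    relabeled = []
    for i, step in enumerate(steps):
        if step.achieved is not None:
            # The prefix ending at step i counts as a success for the sub-goal it achieved.
            relabeled.append(Example(step.achieved, steps[: i + 1], True))
    return relabeled


if __name__ == "__main__":
    failed = [
        Step("tap browser icon", achieved="open the browser app"),
        Step("type 'weather' into the address bar"),
        Step("tap the wrong suggestion"),
    ]
    for ex in hindsight_relabel("search for the weather", failed, success=False):
        print(ex.goal, len(ex.steps), ex.success)  # open the browser app 1 True
```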

Experiments use Qwen2.5‑VL‑7B as the base model and 4K interaction trajectories. On the ScreenSpot GUI element localization benchmark, SE‑GA attains an average score of 89.0%, surpassing UI‑TARS‑72B (88.4%) and other baselines. On the high‑level planning benchmarks AndroidControl‑High and GUIOdyssey, SE‑GA achieves 83.9% step success and 96.5% action‑type accuracy, matching or exceeding larger 72B models. In the dynamic AndroidWorld environment, SE‑GA reaches a 39.0% success rate, ahead of UI‑TARS‑7B (33.0%) and GPT‑4o (23.7%), which indicates that its self‑evolution holds up in changing settings.

Ablation studies confirm that both TTME and MASE are indispensable; removing either component degrades performance across all metrics.

One limitation is that the experiential memory keeps growing, which increases retrieval cost and may slow real‑time inference. The proposed future directions are expanding the training dataset, exploring hierarchical task decomposition for ultra‑long workflows, and investigating cross‑platform transfer learning so that evolved policies and memory structures can adapt to other operating systems.

Overall, SE‑GA unifies memory retention and self‑evolution, turning GUI agents from static command executors into systems that can remember the past, learn from experience, and continuously improve.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
