How GenericAgent Cuts Token Costs by 10× While Boosting AI Agent Performance

The technical report on GenericAgent, a self‑evolving LLM‑based agent, shows that by maximizing context information density and using a minimal atomic toolset with hierarchical memory, it achieves up to ten‑fold token savings, 100% task accuracy, and progressive efficiency gains across multiple benchmarks.

DataFunTalk

What is GenericAgent?

GenericAgent (GA) is a universal, self‑evolving LLM agent developed by Fudan University’s Knowledge Factory A3 Lab in collaboration with Quark Leading Technology. It is positioned as a digital colleague that can continuously learn and improve under user guidance.

Key Advantages

Higher task completion rate: GA reaches 100% accuracy on several benchmarks, surpassing mainstream agents.

Lower token consumption: For the same tasks, GA uses only 15%–35% of the tokens required by competing systems.

Experience reuse: Re‑executing identical tasks reduces token usage by up to 89.6%.

Stronger web browsing: On multi‑hop search tasks, GA's accuracy is three times that of baselines while consuming far fewer resources.

Design Principle: Context Information Density

The team argues that long‑term performance depends not on raw context length but on how much decision‑relevant information can be retained within a limited context budget. Maximizing context information density eliminates redundancy, preserves essential decision data, and improves readability.
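
One way to make the principle concrete is as a ratio; this formulation is our own illustration, since the report does not define a numeric metric:

```python
def context_density(decision_tokens: int, total_tokens: int) -> float:
    """Fraction of the active context that actually informs the next action.

    GA's mechanisms aim to raise this ratio, rather than to grow
    total_tokens by extending the context window. Illustrative only.
    """
    return decision_tokens / max(total_tokens, 1)
```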

Four Core Mechanisms

Minimal atomic toolset: GA keeps only nine atomic tools (file operations, code execution, web interaction, memory management, human‑in‑the‑loop) that can be combined to solve complex tasks. The "code_run" tool alone is theoretically Turing‑complete; the remaining tools exist to lower the model's decision cost.
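
A minimal sketch of what such a toolset could look like as a registry. Only "code_run" is named in the report; the decorator and the "file_read" and "ask_human" tools are hypothetical stand-ins for the categories it lists:

```python
import subprocess
from typing import Callable

# Hypothetical registry: maps tool names to callables the agent may invoke.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Register a function as an atomic tool."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("code_run")
def code_run(source: str, timeout: int = 30) -> str:
    # The Turing-complete escape hatch: run arbitrary Python and hand
    # combined stdout/stderr back to the agent's context.
    proc = subprocess.run(
        ["python", "-c", source], capture_output=True, text=True, timeout=timeout
    )
    return proc.stdout + proc.stderr

@tool("file_read")  # hypothetical name for the file-operations category
def file_read(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

@tool("ask_human")  # hypothetical name for the human-in-the-loop category
def ask_human(question: str) -> str:
    return input(f"[agent asks] {question}\n> ")
```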

Layered on‑demand memory: GA organizes memory into four layers—L1 index, L2 factual, L3 SOP (procedural), and L4 raw session archive. Only the needed layer is accessed, keeping the active context compact.
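
A sketch of how four-layer, on-demand recall might work; the layer roles follow the report, while the keyword index and pointer scheme are our assumptions:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class LayeredMemory:
    l1_index: dict[str, str] = field(default_factory=dict)  # topic -> "layer:key"
    l2_facts: dict[str, str] = field(default_factory=dict)  # distilled facts
    l3_sops: dict[str, str] = field(default_factory=dict)   # reusable procedures
    l4_archive: list[str] = field(default_factory=list)     # raw session logs

    def recall(self, topic: str) -> str | None:
        """Consult the L1 index first, then load only the layer it points to,
        so the active context never carries the raw archive by default."""
        pointer = self.l1_index.get(topic)
        if pointer is None:
            return None
        layer, key = pointer.split(":", 1)
        if layer == "l2":
            return self.l2_facts.get(key)
        if layer == "l3":
            return self.l3_sops.get(key)
        return self.l4_archive[int(key)]  # last resort: replay a raw session
```

A fact lookup thus pulls a few distilled lines into the prompt rather than an entire transcript, which is what keeps the active context compact.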

Self‑evolution process: Evolution targets task‑specific strategies, not tool interfaces. Knowledge accumulated in one session becomes reusable SOPs, enabling rapid improvement in subsequent runs.
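
An illustrative sketch of that loop; distill_sop, the trace format, and the store layout are assumptions for exposition, not the report's API:

```python
def distill_sop(task: str, tool_calls: list[tuple[str, str]]) -> str:
    """Compress a successful session into numbered steps a later run can follow."""
    steps = [f"{i + 1}. {name}({args})" for i, (name, args) in enumerate(tool_calls)]
    return f"SOP for '{task}':\n" + "\n".join(steps)

def run_task(task: str, sop_store: dict[str, str]) -> str:
    sop = sop_store.get(task)
    if sop is not None:
        return sop  # replay the known strategy, skipping exploratory steps
    # First encounter: explore with atomic tools and record the call trace.
    trace = [("web_search", "query=..."), ("code_run", "source=...")]
    sop_store[task] = distill_sop(task, trace)  # future runs start from here
    return sop_store[task]
```

Note that evolution here changes only the stored strategy; the tool interfaces stay fixed, which is what makes the accumulated SOPs reusable.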

Context truncation and compression: GA applies four granular pruning techniques—tool output truncation, tag‑level compression, message eviction, and working‑memory anchor prompts—to prevent linear growth of active context.
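
A sketch of two of the four techniques (tool-output truncation and message eviction); the thresholds and message schema are assumed for illustration:

```python
MAX_TOOL_OUTPUT = 2_000  # characters kept from any single tool result (assumed)
MAX_MESSAGES = 40        # active-context budget before old turns are evicted

def truncate_tool_output(output: str) -> str:
    """Keep the head and tail of an oversized tool result; the middle of a
    long dump rarely carries decision-relevant information."""
    if len(output) <= MAX_TOOL_OUTPUT:
        return output
    half = MAX_TOOL_OUTPUT // 2
    return output[:half] + "\n...[truncated]...\n" + output[-half:]

def evict_messages(messages: list[dict]) -> list[dict]:
    """Drop the oldest unpinned turns once the context exceeds its budget,
    keeping the system prompt and working-memory anchor pinned up front."""
    if len(messages) <= MAX_MESSAGES:
        return messages
    pinned = [m for m in messages if m.get("pinned")]
    rest = [m for m in messages if not m.get("pinned")]
    keep = max(MAX_MESSAGES - len(pinned), 0)
    return pinned + (rest[-keep:] if keep else [])
```

Together these keep the active context roughly constant as a session grows, instead of letting it expand linearly with every tool call.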

Evaluation Results

GA was benchmarked on SOP‑bench, Lifelong AgentBench, and RealFinBench. It achieved 100% accuracy on SOP‑bench and Lifelong AgentBench and 65% accuracy on RealFinBench, the highest among peers. Token usage was consistently 15%–35% of competing agents.

Repeated execution experiments showed dramatic efficiency gains: after five runs, runtime dropped from 102 s to 66 s and token consumption halved from 200 k to 100 k. Across eight web‑task repeats, token usage fell on average 79.3%, with a maximum saving of 92.4%.

In the challenging BrowseComp‑ZH multi‑hop reasoning task, GA attained 0.60 accuracy—three times the baseline—while using only one‑third of the tokens.

Key Findings for Agent Design

Context information density is a structural constraint that all LLM‑based agents must address.

Agents need only three core capabilities: tool interface, context management, and memory formation.

Lower token consumption correlates with higher task performance, contradicting the intuition that more reasoning steps yield better results.

An agent's permission scope caps its intelligence; an environment that is too restrictive limits what the agent can ever accomplish.

A minimal architecture is essential for autonomous self‑evolution, allowing the system to read, modify, and update its own code.

Conclusion

The report demonstrates that a compact codebase of just over 3,000 lines can deliver a self‑evolving agent with superior efficiency, accuracy, and scalability, highlighting a promising direction for future AI‑agent development.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: LLM, AI benchmarks, Token Efficiency, self-evolving, hierarchical memory, GenericAgent
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
