Tagged articles

LLM Agents

113 articles · Page 1 of 2
Machine Heart
Jul 2, 2026 · Artificial Intelligence

How AReaL 2.0 Accelerates Self‑Evolving Agents

AReaL 2.0 introduces an online reinforcement‑learning infrastructure that turns real‑world agent interactions into a learning loop, defining three pillars—trajectory data protocol, data proxy, and evolution control plane—to enable agents to not only execute tasks but continuously improve from their own experience.

AReaL · Agentic RL · LLM Agents
0 likes · 16 min read
Machine Learning Algorithms & Natural Language Processing
Jun 25, 2026 · Artificial Intelligence

AutoResearch Advances: RUC & Microsoft Open‑Source Arbor Gives Agents Research Memory

Arbor, an open‑source autonomous research framework from RUC’s Gaoling AI Institute and Microsoft Research, structures the research loop with a growing hypothesis‑tree and insight back‑propagation, allowing agents to retain hypotheses, evidence, and failures, and achieves the best held‑out results on six real AO tasks, surpassing Codex and Claude Code.

AI research automation · Arbor framework · LLM Agents
0 likes · 18 min read
Architect
Jun 25, 2026 · Artificial Intelligence

Why a Concise CLAUDE.md Entry File Is Critical for LLM Agents in Your Repo

The article explains how a short, well‑structured CLAUDE.md file injects the minimal yet essential context an LLM coding agent needs before it scans a repository, preventing common mis‑assumptions about tech stack, commands, boundaries, and completion criteria.

AGENTS.md · AI Tooling · CLAUDE.md
0 likes · 16 min read
PaperAgent
Jun 20, 2026 · Artificial Intelligence

Vertical Domain Agents Gain 88.5% Boost by Adapting the Runtime Interface, Not Retraining

The paper shows that many failures of deterministic LLM agents stem from mismatched model‑environment interfaces, and introduces LIFE‑HARNESS—a four‑layer runtime harness that extracts reusable failure patterns from training trajectories without updating model weights, delivering an average 88.5% relative performance gain across 126 model‑environment settings.

Deterministic Agents · LLM Agents · Life-Harness
0 likes · 8 min read
Machine Heart
Jun 19, 2026 · Artificial Intelligence

Which Multi‑Agent Communication Protocol Wins? UIUC Introduces ProtocolBench at ICML 2026

The UIUC team presents ProtocolBench, a systematic benchmark that compares four multi‑agent communication protocols across four realistic scenarios, revealing distinct trade‑offs in latency, reliability, and security, and proposes ProtocolRouter to automatically select the most suitable protocol per workload.

Benchmark · LLM Agents · Multi-Agent Systems
0 likes · 14 min read
Code Mala Tang
Jun 19, 2026 · Artificial Intelligence

Five Skeptical Questions About RTK’s Token Compression Claims

The article critically examines RTK’s token‑compression promises, exposing misleading savings metrics, silent‑failure bugs, missing task‑success benchmarks, its status as a fragile feature rather than a product, and the brittleness of its output parser, before offering concrete guidance on when to use it.

CLI output parsing · LLM Agents · RTK
0 likes · 8 min read
HyperAI Super Neural
Jun 18, 2026 · Artificial Intelligence

Weekly AI Paper Digest: D4RT 300× Faster 4D Reconstruction, SAI Theory Challenges AGI, and More

This week’s AI paper roundup covers DeepMind’s D4RT framework that accelerates dynamic 4D reconstruction by up to 300×, a Columbia‑NYU proposal of Superhuman Adaptable Intelligence that questions AGI, MIT‑UW findings on chatbot delusional spiraling, security risks of autonomous agents, a new ARA protocol for executable research artifacts, a vision of AI‑driven software engineering, and a memory‑caching approach that expands RNN capacity while reducing complexity.

AI safety · Artificial Intelligence · D4RT
0 likes · 11 min read
Machine Heart
Jun 17, 2026 · Artificial Intelligence

Why RL‑Trained Agents Still Fail to Reason Actively: The Information Self‑Locking Problem

The paper reveals that outcome‑based reinforcement learning often traps LLM agents in an information self‑locking regime where weak action selection and belief tracking prevent proper credit assignment, and introduces AREW, a lightweight advantage‑reweighting method that restores active reasoning across multiple tasks and models.

AREW · Agentic RL · LLM Agents
0 likes · 24 min read
PaperAgent
Jun 17, 2026 · Artificial Intelligence

Spatial-Agent: A New Concept‑Transformation Paradigm for Map Agents

The paper introduces Spatial‑Agent, which models geospatial question answering as a concept‑transformation process using a GeoFlow Graph intermediate representation, outlines a five‑step workflow, defines core concepts and functional roles, and demonstrates its effectiveness on MapEval‑API and MapQA benchmarks with detailed error and cost analyses.

Benchmark · GIS · GeoFlow Graph
0 likes · 13 min read
Machine Heart
Jun 15, 2026 · Artificial Intelligence

Breaking the SWE‑bench Score‑Only Myth: Open‑Source Benchmark that Independently Measures Harnesses

The article critiques the reliance on raw SWE‑bench scores for programming agents, introduces the Claw‑SWE‑Bench benchmark and a dedicated adapter that isolates harness effects, and presents extensive experiments showing how model choice, harness design, and cost impact real-world coding performance across multiple languages.

Benchmark · Harness · LLM Agents
0 likes · 14 min read
Java Tech Enthusiast
Jun 13, 2026 · Artificial Intelligence

Why Bigger 1M‑Token Windows Still Need Careful Context Engineering

Even though modern LLMs like DeepSeek‑V4, GPT‑5.5 and Claude Opus 4.7 support 1 million‑token windows, simply stuffing more data does not improve agent performance; effective Context Engineering—selecting, structuring, and managing the right information—remains essential for reliable results.

LLM Agents · Prompt engineering · RAG
0 likes · 32 min read
Machine Learning Algorithms & Natural Language Processing
Jun 12, 2026 · Artificial Intelligence

The Next Frontier for Large‑Scale LLM Agents: 17 Must‑Read Papers on Self‑Evolving Harnesses

This article surveys 17 recent core papers that explore how the system‑level harness surrounding large‑model agents can be automatically generated, evolved, and audited, covering topics such as system boundaries, failure‑driven improvement, memory and skill optimization, source‑level rewriting, scaling laws, aging, and safety.

Agent Memory · Harness Engineering · LLM Agents
0 likes · 18 min read
Data Party THU
Jun 11, 2026 · Artificial Intelligence

Boost 18 LLM Agents Without Retraining Using LIFE‑HARNESS

The article introduces LIFE‑HARNESS, a runtime‑interface adaptation framework that keeps model weights unchanged, extracts reusable failure patterns from a single model's training trace, and achieves an average 88.5% relative performance gain across 18 LLM agents and 7 deterministic environments, with successful transfer to 17 other models.

LLM Agents · Runtime Harness · benchmark evaluation
0 likes · 8 min read
Network Intelligence Research Center (NIRC)
Jun 11, 2026 · Artificial Intelligence

Scaling Automated Formalization of Mathematics: Inside Meta’s AutoformBot and the ATLAS Lean 4 Library

Meta’s recent paper presents AutoformBot, a multi‑agent system that treats formalizing entire mathematics textbooks as a large‑scale software‑engineering project, generating the ATLAS Lean 4 library with over 45,000 declarations and demonstrating a 71% success rate across 26 open‑access books.

AutoformBot · LLM Agents · Lean 4
0 likes · 14 min read
Machine Learning Algorithms & Natural Language Processing
Jun 8, 2026 · Artificial Intelligence

Re‑evaluating the Token World of LLM Agents: A Dual‑View Economics Overview

The paper surveys the rapid growth of token consumption in LLM agents, proposes a dual‑view Token Economics framework that treats tokens as production factors, exchange media, and accounting units, and classifies optimization challenges from single‑agent efficiency to ecosystem‑level pricing, security, and future research directions.

AI Resource Management · LLM Agents · Multi-Agent Systems
0 likes · 10 min read
Alibaba Cloud Native
Jun 8, 2026 · Artificial Intelligence

Code Harness vs. Model-Driven Harness: Can Agent Control Be Expressed as Executable Natural Language?

The article reviews the "Natural-Language Agent Harnesses" paper, explains the distinction between code, middleware, and harness layers for LLM agents, introduces NLAH and IHR concepts, and details experimental evaluations that show natural‑language harnesses can match code‑based control while exposing new trade‑offs and risks.

Intelligent Harness Runtime · LLM Agents · Module Ablation
0 likes · 13 min read
James' Growth Diary
Jun 1, 2026 · Artificial Intelligence

How Hermes Implements Bounded Memory: Character Limits, Compression, and Snapshots to Prevent Overflow

The article details Hermes' bounded memory system, which uses character limits for persistent files, a three‑stage context compression pipeline, boundary alignment to protect tool calls, snapshot caching, triple redaction, and anti‑thrashing mechanisms, ensuring agents never overflow or lose critical information.

Hermes · LLM Agents · Memory Management
0 likes · 16 min read
DataFunTalk
Jun 1, 2026 · Artificial Intelligence

Rethinking Agent Harness: Toward State‑Aware Runtime for Reliable LLM Agents

The article argues that improving large‑model agents requires more than bigger models or longer context windows; it calls for a stable, auditable, and recoverable runtime that manages state transitions, prevents error propagation, and enables trace‑native evaluation of long‑running agents.

Agent Harness · LLM Agents · Reliability
0 likes · 13 min read
ITPUB
May 30, 2026 · Artificial Intelligence

Is RAG Dead? How Grep Is Making a Comeback in LLM‑Powered Code Search

This article investigates the claim that Retrieval‑Augmented Generation (RAG) is obsolete by dissecting Claude Code’s grep‑driven search architecture, benchmarking its performance against traditional vector‑based retrieval, comparing it with Cursor and OpenAI Codex, and analyzing the trade‑offs of multi‑round agentic search.

Claude Code · Code search · Cursor
0 likes · 36 min read
Linyb Geek Road
May 29, 2026 · Artificial Intelligence

Agent Harness Architecture Deep Dive: From ReAct Loop to Production‑Grade AI System Design

The article argues that the real performance bottleneck of AI agents lies in the Agent Harness infrastructure rather than the model itself, and it systematically explains how prompt, context, and infrastructure layers, tool handling, memory, verification, error handling, and design trade‑offs shape production‑ready LLM agents.

AI Infrastructure · Agent Harness · Context Management
0 likes · 24 min read
Code Mala Tang
May 28, 2026 · Artificial Intelligence

When Claude Skills Need Determinism, Use Skillflows

The article analyzes Claude's natural‑language SKILL.md approach, highlights its flexibility and nondeterminism, and explains how adding a declarative skillflow.json graph enforces deterministic execution, auditability, lower token cost, and better consistency for high‑frequency, compliance‑critical tasks.

Claude · LLM Agents · Skillflows
0 likes · 11 min read
ShiZhen AI
May 27, 2026 · Artificial Intelligence

Turning Click‑Based Web Agents into Repeatable Scripts with Microsoft’s Open‑Source Webwright

Microsoft’s open‑source Webwright framework redefines browser agents by replacing step‑by‑step click actions with generated Playwright scripts, enabling repeatable, debuggable web tasks; the article details its architecture, workflow, benchmark results on Online‑Mind2Web and Odysseys, and discusses practical benefits and limitations.

Benchmark · GPT-5.4 · LLM Agents
0 likes · 9 min read
AI Engineering
May 26, 2026 · Artificial Intelligence

Training Only the Skill Document While Keeping Model Weights Frozen (SkillOpt)

Microsoft Research introduces SkillOpt, a method that freezes large‑model weights and instead trains a natural‑language skill document as the sole learnable parameter, using a rollout‑reflect‑edit‑gate loop, achieving optimal results across 52 benchmark‑model‑environment combinations and demonstrating strong transferability.

LLM Agents · SkillOpt · benchmark evaluation
0 likes · 9 min read
Code Mala Tang
May 25, 2026 · Artificial Intelligence

Behind 95K Stars: browser-use’s LLM Browser Automation vs Playwright

browser-use is an open‑source, MIT‑licensed LLM agent loop that compresses the page DOM into an indexed list of interactive elements so that large language models can plan and execute web tasks. The article compares it against Anthropic’s Computer Use, OpenAI’s Operator, and traditional Playwright/Selenium, highlighting its flexibility and lower cost alongside its higher LLM usage and deployment trade‑offs.

Anthropic Computer Use · LLM Agents · MIT license
0 likes · 16 min read
HyperAI Super Neural
May 25, 2026 · Artificial Intelligence

CVEvolve: Zero‑Code Autonomous Discovery of Scientific Image‑Processing Algorithms

CVEvolve, a no‑code autonomous agent framework from ANL, leverages large‑language‑model agents to discover, evaluate, and iterate scientific image‑processing algorithms without any programming, and demonstrates superior performance on X‑ray fluorescence registration, Bragg‑peak detection, and diffraction‑image segmentation compared with traditional baselines.

CVEvolve · Image processing · LLM Agents
0 likes · 13 min read
PaperAgent
May 25, 2026 · Artificial Intelligence

DeepSeek’s Harness: How Agent Harness Engineering Is Shaping the Next LLM Agent Era

The article surveys DeepSeek’s Harness initiative, presenting the Binding‑Constraint Thesis, three‑stage evolution from prompt to harness engineering, the ETCLOVG seven‑layer architecture, and concrete benchmark evidence that harness‑only improvements far outweigh model upgrades, while detailing security, observability, and governance considerations for reliable LLM agents.

AI Architecture · Agent Harness Engineering · Agent evaluation
0 likes · 12 min read
James' Growth Diary
May 24, 2026 · Artificial Intelligence

Execution → Observation → Reflection → Improvement: How Hermes Closes the Skill Loop

The article dissects Hermes' background review mechanism, showing how a silent daemon thread performs post‑conversation reflection and writes valuable insights to a skill or memory store. It also covers prompt designs, fork‑agent isolation, priority update rules, and common pitfalls for building continuously learning LLM agents.

Background Review · Daemon Thread · Hermes
0 likes · 14 min read
Old Zhang's AI Learning
May 21, 2026 · Artificial Intelligence

SkillOS: Enabling Agents to Self‑Manage Their Skills

SkillOS reframes skill management for LLM agents as a long‑horizon reinforcement‑learning problem, letting a trainable Skill Curator automatically insert, update, or delete markdown‑based skills, which the frozen Agent Executor then consumes, improving memory‑free performance and cross‑task transfer.

LLM Agents · Markdown · Self-Evolving Agents
0 likes · 6 min read
Data Party THU
May 18, 2026 · Artificial Intelligence

How VIGIL’s Verify‑Before‑Execute Paradigm Defeats LLM Agent Tool Hijacking

VIGIL introduces a verify‑before‑commit framework that isolates tool‑stream injection attacks on LLM agents, using intent anchoring, perception sanitization, speculative reasoning, grounding verification, and validated trajectory memory, reducing attack success rates to 8‑12% while preserving task utility.

AI safety · LLM Agents · SIREN benchmark
0 likes · 11 min read
LuTiao Programming
May 17, 2026 · Artificial Intelligence

Why Your AI Keeps Going Off‑Track: The 4 Essential CLAUDE.md Directives

The article analyzes why AI coding assistants often stray from intended requirements, exposing a core judgment deficit, and shows how a concise four‑line CLAUDE.md file—detailing assumptions, minimal code, scoped changes, and verifiable success criteria—can dramatically improve AI behavior, reduce over‑design, and lower review costs.

AI coding · CLAUDE.md · LLM Agents
0 likes · 11 min read
dbaplus Community
May 17, 2026 · Artificial Intelligence

Why Grep Is Replacing Vector Indexes: RAG Isn’t Dead, It’s Evolving

The article dissects Claude Code’s LLM‑driven Grep search, showing how multi‑round tool calls replace static vector‑based RAG, presents ripgrep performance benchmarks, compares Claude Code with Cursor and Codex, and argues that zero‑index search is optimal for local code bases while larger projects still need indexing.

Claude Code · Code search · LLM Agents
0 likes · 36 min read
PaperAgent
May 11, 2026 · Artificial Intelligence

SkillOS: How Skill Governance Powers Self‑Evolving AI Agents

SkillOS addresses the one‑off nature of current LLM agents by introducing a closed‑loop system where a trainable Skill Curator continuously extracts, updates, and manages reusable skills from execution traces, leading to measurable gains in success rates, efficiency, and cross‑task generalization.

Grouped Task Streams · LLM Agents · Meta-Strategy Skills
0 likes · 10 min read
Linyb Geek Road
May 10, 2026 · Artificial Intelligence

Designing Progressive Large‑Model Agents: Architecture, Frameworks, and Real‑World Practices

This article examines the evolution of large‑model agents, outlines four development stages, compares workflow, collaborative, and evolutionary frameworks, details core components such as perception, memory, planning, tools, and reflection, and explains how a progressive, loop‑based architecture can be applied across verticals like research, code generation, and complex workflow automation.

AlphaEvolve · LLM Agents · LangGraph
0 likes · 9 min read
Machine Learning Algorithms & Natural Language Processing
May 8, 2026 · Artificial Intelligence

T²PO: Uncertainty‑Guided Exploration Control for Stable Multi‑Turn Agent RL

The paper identifies inefficient exploration, termed "hesitation," as the root cause of instability in multi‑turn reinforcement learning for LLM agents and introduces T²PO, an uncertainty‑driven token‑ and turn‑level control framework that markedly improves training stability and performance on benchmarks such as WebShop, ALFWorld, and Search QA.

LLM Agents · T2PO · Uncertainty
0 likes · 16 min read
PaperAgent
May 4, 2026 · Artificial Intelligence

A Comprehensive Survey of Self-Evolving Agents: From Model-Centric to Environment-Driven Co-Evolution

This survey systematically reviews self‑evolving agents, explains why autonomous agents are needed, proposes a unified taxonomy of three evolution paradigms, analyzes model‑centric, environment‑centric, and co‑evolution approaches, and outlines future challenges in designing adaptive environments.

AI Agent Taxonomy · Co-Evolution · Environment-Centric Evolution
0 likes · 14 min read
AI Tech Publishing
May 1, 2026 · Artificial Intelligence

5 Counterintuitive Design Principles for Prompt Caching in Claude Code

The article details five counterintuitive design principles for Claude Code's prompt caching—optimizing prompt layout, using message‑based updates, never switching models or tools mid‑conversation, safely compressing context, and monitoring cache health—backed by concrete examples and up to 90% cost savings.

AI Engineering · Cache Optimization · Claude Code
0 likes · 10 min read
AI Explorer
Apr 30, 2026 · Industry Insights

AI Tech Daily: Key AI Industry Highlights for April 30 2026

The AI Tech Daily roundup highlights Microsoft's 123% AI revenue surge, groundbreaking GPT‑5.5 restrictions, DeepSeek's multimodal launch, Ant Group's zkDTVM benchmark record, a 23‑year‑old Linux kernel bug, Stripe's 288 AI‑focused features, and emerging trends in LLM agent orchestration and AI adoption metrics.

AI revenue · DeepSeek · GPT-5.5
0 likes · 4 min read
SuanNi
Apr 27, 2026 · Artificial Intelligence

How MIT’s RUBICON Cuts AI Agent Costs by 90% While Achieving 100% Accuracy

The paper shows that conventional LLM agents fail on real‑world enterprise data because of chaotic data sources, while the RUBICON architecture uses a minimal Agentic Query Language to let users direct data retrieval, achieving 100% accuracy with a much cheaper model and dramatically lower token and monetary costs.

Agentic Query Language · Benchmark · Data Integration
0 likes · 11 min read
AI Architecture Hub
Apr 23, 2026 · Artificial Intelligence

Why Prompt Caching Is Critical: Lessons from Building Claude Code

Prompt caching, a prefix‑matching technique that reuses prior LLM interactions, proved essential for Claude Code’s low latency and cost, and the article details counter‑intuitive practices such as arranging static prompts first, updating info via messages, avoiding mid‑session model or tool changes, and ensuring cache‑safe context forks.

AI Engineering · Cache Optimization · Claude Code
0 likes · 10 min read
AI Waka
Apr 22, 2026 · Artificial Intelligence

Hybrid MCP‑Skill Model: Keeping LLM Agent Skills Fresh

The article analyzes the trade‑offs between packaging new agent functionality as a static Skill versus a dynamic MCP server, proposes a hybrid thin‑CLI approach that combines the ease of Skills with the up‑to‑date guarantees of MCP, and illustrates the design with concrete code examples.

CLI wrapper · Hybrid Architecture · LLM Agents
0 likes · 7 min read
PaperAgent
Apr 22, 2026 · Artificial Intelligence

How SkillClaw Enables Collective Evolution of Agent Skills in Real-World Use

SkillClaw introduces a centralized evolution framework that transforms user interactions into structured evidence, allowing LLM agents to refine, create, or skip skills based on aggregated success and failure patterns, with nightly validation ensuring only proven improvements are deployed, resulting in consistent performance gains across diverse tasks.

AI workflow · Benchmark · LLM Agents
0 likes · 13 min read
AntTech
Apr 22, 2026 · Artificial Intelligence

How Multi‑Agent MCTS and Information‑Gain Rewards Are Transforming Mobile GUI and Search Agents

This article reviews two recent ICLR 2026 papers—M²‑Miner, a multi‑agent Monte‑Carlo Tree Search framework for low‑cost mobile GUI data mining, and IGPO, an information‑gain‑based reinforcement‑learning method that provides dense rewards for multi‑turn search agents—detailing their designs, experiments, and open‑source releases.

GUI Data Mining · Information Gain · LLM Agents
0 likes · 8 min read
Machine Heart
Apr 21, 2026 · Artificial Intelligence

How Externalization Drives the Evolution of LLM Agents – Insights from a 54‑Page SJTU Review

A recent 54‑page arXiv review by Shanghai Jiao Tong University and collaborators argues that the reliability gains of LLM agents stem more from externalizing memory, skills, protocols, and harness infrastructure than from scaling the underlying model, outlining three structural mismatches and a unified externalization framework.

Externalization · Harness · LLM Agents
0 likes · 13 min read
SuanNi
Apr 19, 2026 · Artificial Intelligence

Why External Cognition Is the New Engine Behind Reliable LLM Agents

The article analyzes how the success of large‑language‑model agents now hinges on external cognitive infrastructure—memory, skills, protocols, and a central Harness—rather than raw model parameters, outlining architectural evolution, practical challenges, and emerging industry trends.

AI industry trends · Harness framework · LLM Agents
0 likes · 15 min read
AI Architecture Hub
Apr 18, 2026 · Artificial Intelligence

Build a Dual‑Layer AI Knowledge Base in 20 Minutes and Supercharge Your LLM Agents

This article explains how to create a two‑layer AI knowledge system (a dynamic Knowledge Base Layer and a static Brand Foundation Layer) in about 20 minutes, detailing its architecture, advantages over traditional RAG, step‑by‑step deployment, and real‑world use cases for creators, teams, and personal productivity.

AI knowledge base · Git · Knowledge Management
0 likes · 16 min read
AI Waka
Apr 17, 2026 · Artificial Intelligence

From Generative to Agentic AI: Building Real‑World Agent Systems

The article explains how AI is shifting from reactive generative models to goal‑driven Agentic systems, outlines core framework components, common patterns, skill abstractions, a step‑by‑step implementation guide for backend engineers, and introduces Harness Engineering for production‑grade reliability and observability.

AI frameworks · LLM Agents · Observability
0 likes · 10 min read
Linyb Geek Road
Apr 16, 2026 · Artificial Intelligence

Does Conway's Law Apply to LLM Agent Systems? Design Insights and Best Practices

The article explores how Conway's Law—"organizations design systems that mirror their structure"—extends to large‑model agent architectures, offering concrete examples, role‑alignment strategies, concise communication patterns, and cautions against over‑engineering to improve multi‑agent collaboration.

AI Coordination · Agent System Design · Conway's Law
0 likes · 9 min read
Baidu Geek Talk
Apr 15, 2026 · Artificial Intelligence

Unveiling Claude Code: How Rules, MCP, and Skills Power the Coding Agent

This article dissects the leaked Claude Code v2.1.88 source to reveal how the three core concepts—Rules, MCP, and Skills—are implemented, where they are injected in the Anthropic LLM API request, and when developers should prefer each mechanism for reliable, secure, and token‑efficient coding agent workflows.

Claude Code · LLM Agents · MCP
0 likes · 25 min read
AI Engineer Programming
Apr 15, 2026 · Artificial Intelligence

Elephant Alpha: Free 100B‑Parameter Instant Model with 256K Context on OpenRouter

OpenRouter quietly launched Elephant Alpha, a free 100B‑parameter LLM with a 256K token window, positioned as an "instant model" that prioritises token efficiency and speed, supports function calling and prompt caching, and is compared against other Animal‑series models while community speculation surrounds its origin.

256K context · Elephant Alpha · Function Calling
0 likes · 6 min read
AI Tech Publishing
Apr 12, 2026 · Artificial Intelligence

How Hermes Agent’s Multi‑Layer Memory Beats OpenClaw’s Simple Markdown Store

The article dissects Hermes Agent’s four‑store memory architecture (declarative, procedural, situational, and persona) along with its deterministic routing, frozen snapshots, nudge‑driven persistence, security scanning, dual‑peer modeling, skill management, and three‑phase context compression, showing why it outperforms OpenClaw’s breadth‑first design.

Hermes Agent · LLM Agents · Memory Architecture
0 likes · 17 min read
Machine Learning Algorithms & Natural Language Processing
Apr 10, 2026 · Artificial Intelligence

One‑Click from Experiment Logs to Conference‑Ready LaTeX: Google’s PaperOrchestra Changes Paper Writing

PaperOrchestra, Google’s multi‑agent framework, turns raw experiment logs, brief ideas, LaTeX templates and conference guidelines into fully formatted CVPR/ICLR papers, using five coordinated agents, Semantic Scholar verification, PaperBanana figure generation, and a refinement loop that boosts simulated acceptance rates by up to 22% while running in under 40 minutes.

Benchmark · LLM Agents · PaperBanana
0 likes · 9 min read
AI Engineering
Apr 10, 2026 · Artificial Intelligence

Getting Started with Hermes Agent: A Complete Beginner’s Guide

Hermes Agent, the open‑source LLM‑driven framework from Nous Research, has attracted 43.7K GitHub stars, but its documentation leaves many developers stranded; a community‑curated ecosystem map and the “Orange Book” guide now provide step‑by‑step installation, skill development, multi‑agent orchestration, and deployment resources to bridge the gap.

Documentation guide · Ecosystem map · Hermes Agent
0 likes · 5 min read
AI Step-by-Step
Apr 8, 2026 · Operations

How to Light Up the Black Box of LLM Agents with Full‑Stack Observability

The article explains why traditional logs are insufficient for LLM agents, outlines five observability dimensions—tracing, metrics, behavioral governance, state & memory, and evaluation—and provides concrete, open‑source‑based steps to instrument, monitor, and act on agent workloads in production.

Behavioral Governance · Evaluation · LLM Agents
0 likes · 11 min read
AgentGuide
Apr 2, 2026 · Artificial Intelligence

Understanding ReAct: The Reason‑Act Loop Behind LLM Agents

The article explains ReAct—a Reason‑Act framework for large language model agents that observes, reasons, takes actions via tools, receives feedback, and iterates—highlighting its distinction from plain QA, its step‑by‑step workflow, practical importance, and a weather‑query example.

AI workflow · LLM Agents · ReAct
0 likes · 5 min read
AI Step-by-Step
Mar 30, 2026 · Artificial Intelligence

How to Keep LLM Agents in Check with Guardrails

The article explains why LLM agents can over‑promise or execute unauthorized actions, and outlines a three‑layer guardrail system—prompt review, output validation, and tool‑action interception—plus concrete rules, examples, and test cases to ensure safe deployment.

AI safety · Guardrails · LLM Agents
0 likes · 11 min read
DevOps Coach
Mar 27, 2026 · Operations

Can Four LLM‑Powered Agents Build a Real Kubernetes Cluster Without Human Help?

An experiment with four LLM‑driven autonomous agents—Architect, Builder, Security Sentinel, and QA Tester—attempted to provision a Proxmox‑based HA Kubernetes cluster using real hardware, revealing costly context drift, emergent coordination failures, and stark differences between Gemini and Claude in diagnosing infrastructure‑as‑code errors.

AI Ops · Ansible · Autonomous SRE
0 likes · 14 min read
Frontend AI Walk
Mar 25, 2026 · Artificial Intelligence

Slow Learning Agents: 7 Cognitive Shifts from Using ChatGPT to Truly Understanding Agents

The article outlines seven essential mindset transitions for building robust LLM agents—recognizing agents as autonomous decision loops, prioritizing harness over model size, layering context, designing tools for agent goals, structuring multi‑layer memory, coordinating multiple agents with isolation and protocols, and aligning evaluation with the real environment.

Context Management · Evaluation · Harness
0 likes · 16 min read
AI Architecture Hub
Mar 25, 2026 · Artificial Intelligence

How Memento-Skills Enables Continuous Learning for Frozen LLM Agents

The article analyzes the limitations of frozen LLM agents (fixed parameters, loss of state, and costly fine‑tuning) and introduces the Memento‑Skills framework, which adds an external, evolvable skill memory to achieve deployment‑time learning. It then details the architecture and optimization knobs and reports strong experimental gains.

AI research · Deployment-Time Learning · LLM Agents
0 likes · 14 min read
Tencent Cloud Developer
Mar 24, 2026 · Artificial Intelligence

Why AI Coding Agents Miss the Mark—and How to Make Them Work

The article analyzes the hype around AI coding tools like OpenClaw, exposing false demands, the pitfalls of building agents before real needs, the quality gaps in AI‑generated code, and practical strategies such as spec‑first coding, bottleneck identification, and multi‑model orchestration to improve productivity.

AI coding · LLM Agents · Spec Coding
0 likes · 15 min read
Machine Learning Algorithms & Natural Language Processing
Mar 19, 2026 · Artificial Intelligence

From Solving to Evolving: How RETROAGENT Gives AI Agents Real Retrospective Learning

The article analyzes the RETROAGENT framework, showing how its dual intrinsic feedback and memory‑buffer mechanisms enable LLM agents to move beyond solving tasks toward continual evolution, and presents benchmark results that demonstrate significant performance gains and strong test‑time adaptation across four challenging environments.

LLM Agents · RETROAGENT · dual intrinsic feedback
0 likes · 7 min read
Software Engineering 3.0 Era
Mar 15, 2026 · Artificial Intelligence

When AI ‘Crayfish’ Takes Over Testing, Where Do 80% of Testers Go?

The article demonstrates how an LLM‑powered agent (nicknamed “crayfish”) equipped with OpenClaw and Playwright MCP can autonomously perform web‑testing tasks—handling environment setup, visual OCR, error recovery and reporting—showing a shift from fragile scripted automation to intent‑driven testing and warning that traditional test engineers have little time left to adapt.

AI testing · LLM Agents · Playwright
0 likes · 11 min read
DeepHub IMBA
Mar 14, 2026 · Artificial Intelligence

Three Proven Multi‑Agent Orchestration Patterns: Supervisor, Pipeline, and Swarm

The article explains why single LLM agents often fail due to context overload, role confusion, and fault propagation, then details three reliable orchestration patterns—Supervisor, Pipeline, and Swarm—along with concrete code examples, communication schemas, error‑handling layers, cost and latency considerations, and best‑practice recommendations for production deployment.

Distributed Tracing · LLM Agents · Multi-Agent Systems
0 likes · 15 min read
Architect
Mar 11, 2026 · Artificial Intelligence

How OpenClaw Manages Context: Multi‑Layer Compression, Memory Persistence, and Overflow Recovery

This article explains OpenClaw's sophisticated context‑management system, detailing its three‑layer approach to pruning old turns, trimming tool results, and handling oversized outputs, while preserving critical state through memory flushing, structured compaction, and a robust overflow‑recovery pipeline.

LLM Agents · compression · memory persistence
0 likes · 29 min read
AI Explorer
Mar 6, 2026 · Artificial Intelligence

AReaL: Lightning‑Fast Asynchronous RL Engine for Building High‑Performance LLM Agents

AReaL, an open‑source, fully asynchronous reinforcement‑learning platform co‑developed by Tsinghua University and Ant Group, dramatically speeds up training of complex LLM agents, offering a simple, stable, and hardware‑flexible solution for developers seeking industrial‑grade AI agents.

AI Infrastructure · AReaL · Asynchronous Training
0 likes · 7 min read
Woodpecker Software Testing
Mar 5, 2026 · Artificial Intelligence

AI Agent Testing: An In-Depth Guide Every Test Expert Needs

The article explains why traditional assertion‑based testing fails for LLM‑driven AI agents and introduces a four‑dimensional GBRT framework—Goal, Behavior, Resilience, Traceability—detailing concrete examples, evaluation methods, toolchain integration, and practical steps to build measurable, robust test pipelines for autonomous agents.

AI testing · GBRT · LLM Agents
0 likes · 9 min read
PaperAgent
Mar 2, 2026 · Artificial Intelligence

SKILLRL: Boosting LLM Agents with Skill Distillation and Recursive Evolution

SKILLRL introduces a novel framework that transforms raw LLM agent trajectories into compact, reusable skills via experience‑driven distillation, hierarchical skill banks, and recursive skill evolution, achieving up to 90% success on ALFWorld and 73% on WebShop while reducing token usage by over 10% compared to memory‑based baselines.

LLM Agents · SKILLRL · hierarchical skill bank
0 likes · 10 min read
AI Tech Publishing
Mar 2, 2026 · Artificial Intelligence

Why pi-mono’s Agent Design Is an Anti‑Pattern (and What Works Better)

The author explains why Claude Code became too bloated, outlines the minimal, controllable requirements for a code‑assistant, details pi-mono’s four‑package architecture, shares design anti‑patterns, and presents benchmark results showing its simple approach outperforms heavier agents.

Agent Design · Benchmark · Claude Opus
0 likes · 13 min read
AI Waka
Feb 27, 2026 · Artificial Intelligence

How to Add Persistent Long‑Term Memory to LangGraph Agents with Trustcall

This article explains how to integrate durable long‑term memory into LangGraph agents, covering memory types, their coordination, limitations of native LangGraph storage, and a step‑by‑step implementation using Trustcall’s schema‑driven extractors for both user profiles and paper collections.

AI · LLM Agents · LangGraph
0 likes · 16 min read
Architect
Feb 13, 2026 · Artificial Intelligence

Cutting Agent Costs: Practical Tips from the ‘Toward Efficient Agents’ Survey

The article analyzes why autonomous LLM agents become expensive, breaks down their cost components, and presents concrete engineering strategies—memory management, tool‑call optimization, and planning constraints—to dramatically reduce token usage and improve reliability while maintaining performance.

LLM Agents · Planning · cost optimization
0 likes · 19 min read
PaperAgent
Jan 28, 2026 · Artificial Intelligence

How Clawdbot Achieves Persistent, Local Memory for LLM Agents

Clawdbot implements a fully local, persistent memory system for LLM agents by storing context and long‑term knowledge in editable Markdown files, indexing them with SQLite‑vec and FTS5, supporting multi‑agent isolation, compression, pruning, and configurable session lifecycles to maintain efficient, cost‑effective interactions.

LLM Agents · context compression · local storage
0 likes · 13 min read
High Availability Architecture
Jan 27, 2026 · Artificial Intelligence

How LLM Agents Are Redefining Programming: From Manual Coding to Autonomous Agents

The author reflects on a rapid shift in software development workflows driven by LLM agents, highlighting the move from manual coding to agent‑driven automation, the remaining need for IDE oversight, the strengths of tenacity and leverage, and the broader implications for engineers' future roles.

AI programming · Automation · LLM Agents
0 likes · 7 min read
Architecture and Beyond
Jan 17, 2026 · Artificial Intelligence

Progressive Disclosure & Dynamic Context: Making LLM Agents Reliable Execution Systems

This article explains how progressive disclosure and dynamic context management address the three core bottlenecks of complex LLM agents—context explosion, tool overload, and uncontrolled execution—by structuring context, tools, and SOPs into layered, token‑efficient, and verifiable workflows.

AI Engineering · LLM Agents · Progressive Disclosure
0 likes · 15 min read
Tencent Cloud Developer
Dec 23, 2025 · Artificial Intelligence

How ReAct (Reasoning + Acting) Empowers LLM Agents to Solve Real‑World Tasks

This article explains the ReAct paradigm—combining reasoning, action, and observation—to turn large language models into controllable agents, detailing its core concepts, architecture, workflow, code implementation, application scenarios, advantages over other methods, and future research directions.

AI automation · LLM Agents · reasoning and acting
0 likes · 29 min read
Bighead's Algorithm Notes
Dec 9, 2025 · Artificial Intelligence

How Do LLM Trading Agents Perform in a Competitive Market Arena?

The paper introduces Agent Market Arena (AMA), a lifelong, real‑time benchmark that evaluates diverse LLM‑based trading agents across crypto and equity markets, revealing that agent architecture, rather than the underlying LLM, drives performance differences and risk‑adjusted returns.

Benchmark · Financial Trading · LLM Agents
0 likes · 11 min read
PaperAgent
Dec 9, 2025 · Artificial Intelligence

Agentic AI Unveiled: Dual Paradigms, Architecture Battles, and Future Directions

This comprehensive survey dissects Agentic AI by contrasting symbolic/classical and neural/generative paradigms, mapping 90 peer‑reviewed papers (2018‑2025) through a PRISMA workflow, evaluating architectures, collaboration models, benchmarks, and ethical considerations, and highlighting the emerging need for hybrid systems and standardized evaluation.

Hybrid Architecture · LLM Agents · PRISMA review
0 likes · 8 min read
BirdNest Tech Talk
Dec 8, 2025 · Artificial Intelligence

How the New PEV Agent Pattern Boosts Reliable LLM Automation in Go

The article introduces the Plan‑Execute‑Verify (PEV) agent pattern added to langgraphgo, explains its three‑stage workflow, core features, configuration, concrete Go examples, implementation details, comparisons with ReAct and Reflection, and discusses best practices, limitations, and trade‑offs for high‑risk automation.

Go · LLM Agents · LangGraphGo
0 likes · 9 min read
PaperAgent
Dec 1, 2025 · Artificial Intelligence

How Deep Research Turns LLMs into Autonomous AI Scientists

This article surveys the emerging Deep Research (DR) paradigm that upgrades large language models into research agents capable of autonomous planning, multi‑source evidence gathering, memory management, and verifiable long‑form report generation, outlining its stages, core components, training pipeline, and evaluation benchmarks.

AI agents · AI research automation · Deep Research
0 likes · 6 min read
AI Frontier Lectures
Nov 13, 2025 · Artificial Intelligence

How Graphs Empower LLM Agents: A Deep Dive into GLA

This article reviews the IEEE Intelligent Systems survey that introduces Graph‑augmented LLM Agents (GLA), explains how representing plans, memory, tools and multi‑agent interactions as graphs improves reliability, efficiency, interpretability and flexibility, and outlines five key research directions for future development.

Agent Coordination · LLM Agents · Multimodal AI
0 likes · 8 min read
Network Intelligence Research Center (NIRC)
Nov 7, 2025 · Artificial Intelligence

Introducing LangGraph: A Low‑Level Framework for Building Stateful AI Agents

This article explains why modern LLM‑based applications need agent capabilities, introduces LangGraph’s core features such as stateful execution, graph‑based orchestration, tool integration, human‑in‑the‑loop and multi‑agent support, and provides a step‑by‑step Python example that builds a simple chat‑bot agent.

LLM Agents · LangGraph · Python example
0 likes · 11 min read
Bighead's Algorithm Notes
Oct 30, 2025 · Artificial Intelligence

FinSearchComp: ByteDance’s Expert‑Level Financial Search and Reasoning Benchmark for Real‑World Scenarios

FinSearchComp is the first fully open‑source benchmark that evaluates large‑language‑model agents' search and reasoning abilities in realistic financial workflows, featuring 635 expert‑annotated questions across three task types, built with 70 finance experts, and revealing that web‑enabled models with financial plugins markedly outperform API‑only models.

AI evaluation · Benchmark · FinSearchComp
0 likes · 12 min read
DataFunTalk
Oct 22, 2025 · Artificial Intelligence

Introducing VitaBench: A Real-World Benchmark for Complex LLM Agents

VitaBench is a newly released, highly realistic benchmark that evaluates large‑language‑model agents across three everyday scenarios—food ordering, restaurant dining, and travel planning—by quantifying reasoning, tool‑use, and interaction complexities, revealing a significant performance gap in current models.

AI evaluation · Benchmark · LLM Agents
0 likes · 13 min read
Data Thinking Notes
Oct 9, 2025 · Artificial Intelligence

Mastering Context Engineering: Boost LLM Agent Performance

Context Engineering, the evolution beyond Prompt Engineering, optimizes the selection and management of tokens within large language model windows, enabling high‑performance, autonomous AI agents through efficient system prompts, tool design, example selection, dynamic retrieval, compression, structured memory, and multi‑agent architectures.

LLM Agents · Multi-Agent Systems · ai-optimization
0 likes · 19 min read
xkx's Tech General Store
Sep 10, 2025 · Artificial Intelligence

Exploring WebDancer: Alibaba’s WebAgent that Solves Complex Queries Automatically

This article walks through installing Alibaba's WebDancer agent, explains its SFT‑plus‑RL training pipeline—including data construction, trajectory sampling, supervised fine‑tuning, and reinforcement learning—compares it with the earlier WebWalker, and demonstrates its multi‑step reasoning on a real‑world query.

AI Agent · Alibaba · LLM Agents
0 likes · 10 min read
DataFunTalk
Sep 10, 2025 · Artificial Intelligence

How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

The article presents Ant Group’s Ray‑based Ragent framework, detailing its background, motivation behind unified AI serving, and the four core modules—Profile, Memory, Planning, and Action—that together enable large‑language‑model agents for financial applications.

AI Framework · Ant Group · LLM Agents
0 likes · 4 min read
DataFunSummit
Sep 9, 2025 · Artificial Intelligence

How Ant Group’s Ragent Redefines Distributed LLM Agents with Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI agent framework, covering its background, motivation in the large‑model era, and a four‑module design (Profile, Memory, Planning, Action) that enables scalable LLM‑driven agents.

AI Framework · Ant Group · LLM Agents
0 likes · 4 min read
Smart Era Software Development
Jul 8, 2025 · Artificial Intelligence

12-Factor Agents – Core Principles to Bridge the Demo‑to‑Production Gap for Reliable LLM Apps

The article presents the 12‑Factor Agents framework, adapting the classic 12‑Factor App methodology to large‑language‑model agents and detailing twelve concrete engineering principles—ranging from prompt control and context engineering to human‑in‑the‑loop and stateless design—that together enable production‑grade, observable, and maintainable AI agents.

12-Factor · Context Management · LLM Agents
0 likes · 11 min read
BirdNest Tech Talk
Jun 30, 2025 · Artificial Intelligence

Build a Weather‑Query ReAct Agent with LangGraph: Step‑by‑Step Guide

This article walks through constructing a stateful ReAct‑style LLM agent using LangGraph, detailing the core components—State, Nodes, Edges—defining a weather‑lookup tool with Open‑Meteo, configuring the graph’s nodes and conditional edges, and executing the workflow with streaming to observe each step in real time.

LLM Agents · LangGraph · Python
0 likes · 16 min read
AI Large Model Application Practice
Jun 23, 2025 · Databases

How Google’s MCP Toolbox Simplifies Enterprise Database Access for LLM Agents

This guide explains Google’s open‑source MCP Toolbox for Databases, covering its core concepts, installation, configuration, two usage modes (native SDK and MCP), example LangGraph agent integration, security features, observability, and practical code snippets for building reliable LLM‑driven database tools.

Databases · LLM Agents · MCP Toolbox
0 likes · 11 min read
Instant Consumer Technology Team
May 29, 2025 · Artificial Intelligence

API vs GUI Agents: How to Choose the Right LLM Automation Approach

This article examines the evolution of large language model agents, contrasting API‑based agents that use predefined function calls with GUI‑based agents that interact with visual interfaces, and explores hybrid strategies, orchestration tools, RAG techniques, and practical guidelines for selecting the optimal paradigm.

API vs GUI · Hybrid automation · LLM Agents
0 likes · 34 min read