Tagged articles

LLM Agents

113 articles · Page 1 of 2

Jul 4, 2026 · Artificial Intelligence

Iterating Agent Skills with SkillRevise: Using Execution Traces for Continuous Improvement

SkillRevise tackles the overestimation of LLM‑authored agent skills by breaking down complex search tasks, attaching evidence to verifiable sources, and introducing trace‑conditioned revisions that let engineers pinpoint and fix failures across retrieval, reasoning, and presentation layers.

LLM Agents · RAG · SkillRevise

14 min read

Alibaba Cloud Infrastructure

Jul 2, 2026 · Artificial Intelligence

Turn Your AI Agent into a Memory Master with the Open‑Source mem0 Layer

mem0 is an open‑source AI memory layer that adds long‑term, cross‑session memory to LLM‑based agents, allowing them to retain user preferences, conversation history, and task progress, reducing token usage and latency while integrating with popular models via simple add/search APIs.

AI memory layer · Apache 2.0 · LLM Agents

10 min read
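
The add/search pattern in the mem0 summary can be illustrated with a toy memory layer. This is a minimal sketch, not mem0's API: the class name `ToyMemory`, the per-user dictionary, and the word-overlap scoring are hypothetical stand-ins for real embedding-based retrieval. Only the memories returned by `search` would be placed in the prompt, which is one way such a layer can cut token usage compared with replaying the full history.

```python
from collections import defaultdict


class ToyMemory:
    """Hypothetical long-term memory layer (illustrative only, not mem0's API)."""

    def __init__(self) -> None:
        # user_id -> list of stored memory strings, kept across sessions
        self._store = defaultdict(list)

    def add(self, text: str, user_id: str) -> None:
        # Skip exact duplicates so repeated facts do not bloat the store.
        if text not in self._store[user_id]:
            self._store[user_id].append(text)

    def search(self, query: str, user_id: str, limit: int = 3) -> list:
        # Score each memory by word overlap with the query (a stand-in for
        # embedding similarity) and return the best matches first.
        query_words = set(query.lower().split())
        scored = []
        for memory in self._store[user_id]:
            overlap = len(query_words & set(memory.lower().split()))
            if overlap:
                scored.append((overlap, memory))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [memory for _, memory in scored[:limit]]


memory = ToyMemory()
memory.add("User prefers vegetarian food", user_id="alice")
memory.add("User lives in Berlin", user_id="alice")
print(memory.search("what food does the user like", user_id="alice"))
# → ['User prefers vegetarian food', 'User lives in Berlin']
```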

Machine Heart

Jul 2, 2026 · Artificial Intelligence

How AReaL 2.0 Accelerates Self‑Evolving Agents

AReaL 2.0 introduces an online reinforcement‑learning infrastructure that turns real‑world agent interactions into a learning loop. It rests on three pillars (a trajectory data protocol, a data proxy, and an evolution control plane) that let agents not only execute tasks but also continuously improve from their own experience.

AReaL · Agentic RL · LLM Agents

16 min read

Machine Learning Algorithms & Natural Language Processing

Jun 25, 2026 · Artificial Intelligence

AutoResearch Advances: RUC & Microsoft Open‑Source Arbor Gives Agents Research Memory

Arbor, an open‑source autonomous research framework from RUC’s Gaoling AI Institute and Microsoft Research, structures the research loop with a growing hypothesis‑tree and insight back‑propagation, allowing agents to retain hypotheses, evidence, and failures, and achieves the best held‑out results on six real AO tasks, surpassing Codex and Claude Code.

AI research automation · Arbor framework · LLM Agents

18 min read

Architect

Jun 25, 2026 · Artificial Intelligence

Why a Concise CLAUDE.md Entry File Is Critical for LLM Agents in Your Repo

The article explains how a short, well‑structured CLAUDE.md file injects the minimal yet essential context an LLM coding agent needs before it scans a repository, preventing common wrong assumptions about the tech stack, commands, boundaries, and completion criteria.

AGENTS.md · AI Tooling · CLAUDE.md

16 min read

Machine Heart

Jun 22, 2026 · Artificial Intelligence

Beyond Single-Task Experts: Introducing EEVEE, the First Fully-Evolving Agent Framework

EEVEE is a test‑time prompt‑learning framework that lets LLM agents continuously improve across diverse tasks by co‑evolving a router and specialized prompts, achieving a cumulative +42 gain over many tasks while keeping token usage low and preserving single‑task performance.

EEVEE · LLM Agents · Multi-Task Learning

10 min read

PaperAgent

Jun 20, 2026 · Artificial Intelligence

Vertical Domain Agents Gain 88.5% Boost by Adapting the Runtime Interface, Not Retraining

The paper shows that many failures of LLM agents in deterministic environments stem from mismatched model‑environment interfaces. It introduces LIFE‑HARNESS, a four‑layer runtime harness that extracts reusable failure patterns from training trajectories without updating model weights, delivering an average 88.5% relative performance gain across 126 model‑environment settings.

Deterministic Agents · LLM Agents · Life-Harness

8 min read

Machine Heart

Jun 19, 2026 · Artificial Intelligence

Which Multi‑Agent Communication Protocol Wins? UIUC Introduces ProtocolBench at ICML 2026

The UIUC team presents ProtocolBench, a systematic benchmark that compares four multi‑agent communication protocols across four realistic scenarios, revealing distinct trade‑offs in latency, reliability, and security, and proposes ProtocolRouter to automatically select the most suitable protocol per workload.

Benchmark · LLM Agents · Multi-Agent Systems

14 min read

Code Mala Tang

Jun 19, 2026 · Artificial Intelligence

Five Skeptical Questions About RTK’s Token Compression Claims

The article critically examines RTK’s token‑compression promises, exposing misleading savings metrics, silent‑failure bugs, missing task‑success benchmarks, its status as a fragile feature rather than a product, and the brittleness of its output parser, before offering concrete guidance on when to use it.

CLI output parsing · LLM Agents · RTK

8 min read

HyperAI Super Neural

Jun 18, 2026 · Artificial Intelligence

Weekly AI Paper Digest: D4RT 300× Faster 4D Reconstruction, SAI Theory Challenges AGI, and More

This week’s AI paper roundup covers DeepMind’s D4RT framework, which accelerates dynamic 4D reconstruction by up to 300×; a Columbia‑NYU proposal for Superhuman Adaptable Intelligence (SAI) that challenges the AGI framing; MIT‑UW findings on delusional spiraling in chatbots; security risks of autonomous agents; a new ARA protocol for executable research artifacts; a vision of AI‑driven software engineering; and a memory‑caching approach that expands RNN capacity while reducing complexity.

AI safety · Artificial Intelligence · D4RT

11 min read

Machine Heart

Jun 17, 2026 · Artificial Intelligence

Why RL‑Trained Agents Still Fail to Reason Actively: The Information Self‑Locking Problem

The paper reveals that outcome‑based reinforcement learning often traps LLM agents in an information self‑locking regime where weak action selection and belief tracking prevent proper credit assignment, and introduces AREW, a lightweight advantage‑reweighting method that restores active reasoning across multiple tasks and models.

AREW · Agentic RL · LLM Agents

24 min read

PaperAgent

Jun 17, 2026 · Artificial Intelligence

Spatial-Agent: A New Concept‑Transformation Paradigm for Map Agents

The paper introduces Spatial‑Agent, which models geospatial question answering as a concept‑transformation process using a GeoFlow Graph intermediate representation, outlines a five‑step workflow, defines core concepts and functional roles, and demonstrates its effectiveness on MapEval‑API and MapQA benchmarks with detailed error and cost analyses.

Benchmark · GIS · GeoFlow Graph

13 min read

Machine Heart

Jun 15, 2026 · Artificial Intelligence

Breaking the SWE‑bench Score‑Only Myth: Open‑Source Benchmark that Independently Measures Harnesses

The article critiques the reliance on raw SWE‑bench scores for programming agents, introduces the Claw‑SWE‑Bench benchmark and a dedicated adapter that isolates harness effects, and presents extensive experiments showing how model choice, harness design, and cost impact real-world coding performance across multiple languages.

Benchmark · Harness · LLM Agents

14 min read

Java Tech Enthusiast

Jun 13, 2026 · Artificial Intelligence

Why Bigger 1M‑Token Windows Still Need Careful Context Engineering

Even though modern LLMs such as DeepSeek‑V4, GPT‑5.5, and Claude Opus 4.7 support 1‑million‑token context windows, simply stuffing in more data does not improve agent performance; effective Context Engineering (selecting, structuring, and managing the right information) remains essential for reliable results.

LLM Agents · Prompt engineering · RAG

32 min read
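
The "selecting" part of context engineering can be sketched as a greedy packer that fills a fixed token budget with the most relevant chunks instead of everything available. The relevance scores and the `select_context` name are hypothetical, and a whitespace word count is a crude stand-in for a real tokenizer.

```python
def select_context(chunks: list, budget_tokens: int) -> tuple:
    """Greedily keep the highest-relevance chunks that still fit the budget.

    chunks: (relevance, text) pairs; relevance would come from a retriever.
    Token cost is approximated by the number of whitespace-separated words.
    """
    chosen = []
    used = 0
    for _, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = len(text.split())
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen, used


chunks = [
    (0.9, "auth module uses JWT tokens"),
    (0.2, "unrelated changelog entry from 2019 release notes"),
    (0.7, "login handler calls verify_token"),
]
selected, used = select_context(chunks, budget_tokens=10)
print(selected, used)
# → ['auth module uses JWT tokens', 'login handler calls verify_token'] 9
```

A larger window only raises `budget_tokens`; the ranking and dropping of low-value chunks is still needed.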

Machine Learning Algorithms & Natural Language Processing

Jun 12, 2026 · Artificial Intelligence

The Next Frontier for Large‑Scale LLM Agents: 17 Must‑Read Papers on Self‑Evolving Harnesses

This article surveys 17 recent core papers that explore how the system‑level harness surrounding large‑model agents can be automatically generated, evolved, and audited, covering topics such as system boundaries, failure‑driven improvement, memory and skill optimization, source‑level rewriting, scaling laws, aging, and safety.

Agent Memory · Harness Engineering · LLM Agents

18 min read

Data Party THU

Jun 11, 2026 · Artificial Intelligence

Boost 18 LLM Agents Without Retraining Using LIFE‑HARNESS

The article introduces LIFE‑HARNESS, a runtime‑interface adaptation framework that keeps model weights unchanged, extracts reusable failure patterns from a single model's training trace, and achieves an average 88.5% relative performance gain across 18 LLM agents and 7 deterministic environments, with successful transfer to 17 other models.

LLM Agents · Runtime Harness · benchmark evaluation

8 min read

Network Intelligence Research Center (NIRC)

Jun 11, 2026 · Artificial Intelligence

Scaling Automated Formalization of Mathematics: Inside Meta’s AutoformBot and the ATLAS Lean 4 Library

Meta’s recent paper presents AutoformBot, a multi‑agent system that treats formalizing entire mathematics textbooks as a large‑scale software‑engineering project, generating the ATLAS Lean 4 library with over 45,000 declarations and demonstrating a 71% success rate across 26 open‑access books.

AutoformBot · LLM Agents · Lean 4

14 min read

Machine Learning Algorithms & Natural Language Processing

Jun 8, 2026 · Artificial Intelligence

Re‑evaluating the Token World of LLM Agents: A Dual‑View Economics Overview

The paper surveys the rapid growth of token consumption in LLM agents, proposes a dual‑view Token Economics framework that treats tokens as production factors, exchange media, and accounting units, and classifies optimization challenges from single‑agent efficiency to ecosystem‑level pricing, security, and future research directions.

AI Resource Management · LLM Agents · Multi-Agent Systems

10 min read

Alibaba Cloud Native

Jun 8, 2026 · Artificial Intelligence

Code Harness vs. Model-Driven Harness: Can Agent Control Be Expressed as Executable Natural Language?

The article reviews the "Natural-Language Agent Harnesses" paper, explains the distinction between code, middleware, and harness layers for LLM agents, introduces the NLAH (natural‑language agent harness) and IHR (Intelligent Harness Runtime) concepts, and details experimental evaluations showing that natural‑language harnesses can match code‑based control while exposing new trade‑offs and risks.

Intelligent Harness Runtime · LLM Agents · Module Ablation

13 min read

James' Growth Diary

Jun 1, 2026 · Artificial Intelligence

How Hermes Implements Bounded Memory: Character Limits, Compression, and Snapshots to Prevent Overflow

The article details Hermes' bounded memory system, which uses character limits for persistent files, a three‑stage context compression pipeline, boundary alignment to protect tool calls, snapshot caching, triple redaction, and anti‑thrashing mechanisms, ensuring agents never overflow or lose critical information.

Hermes · LLM Agents · Memory Management

16 min read
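
A character limit on a persistent memory file can be enforced at write time. The sketch below assumes an oldest-first eviction policy; `BoundedMemory` is a hypothetical name, and Hermes' actual limits, compression stages, and snapshot logic are not modeled.

```python
class BoundedMemory:
    """Hypothetical character-capped memory store (not Hermes' implementation)."""

    def __init__(self, char_limit: int) -> None:
        self.char_limit = char_limit
        self.entries = []

    def total_chars(self) -> int:
        return sum(len(entry) for entry in self.entries)

    def add(self, entry: str) -> None:
        # Reject a single entry that could never fit, instead of evicting everything.
        if len(entry) > self.char_limit:
            raise ValueError("entry exceeds the character limit on its own")
        self.entries.append(entry)
        # Evict the oldest entries until the store is back within its budget.
        while self.total_chars() > self.char_limit:
            self.entries.pop(0)


store = BoundedMemory(char_limit=20)
store.add("a" * 10)
store.add("b" * 10)
store.add("c" * 5)  # pushes the total to 25, so the oldest entry is evicted
print(store.entries, store.total_chars())
# → ['bbbbbbbbbb', 'ccccc'] 15
```

Raising on an oversized entry keeps a single bad write from silently wiping the whole store.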

DataFunTalk

Jun 1, 2026 · Artificial Intelligence

Rethinking Agent Harness: Toward State‑Aware Runtime for Reliable LLM Agents

The article argues that improving large‑model agents requires more than bigger models or longer context windows; it calls for a stable, auditable, and recoverable runtime that manages state transitions, prevents error propagation, and enables trace‑native evaluation of long‑running agents.

Agent Harness · LLM Agents · Reliability

13 min read

ITPUB

May 30, 2026 · Artificial Intelligence

Is RAG Dead? How Grep Is Making a Comeback in LLM‑Powered Code Search

This article investigates the claim that Retrieval‑Augmented Generation (RAG) is obsolete by dissecting Claude Code’s grep‑driven search architecture, benchmarking its performance against traditional vector‑based retrieval, comparing it with Cursor and OpenAI Codex, and analyzing the trade‑offs of multi‑round agentic search.

Claude Code · Code search · Cursor

36 min read

Linyb Geek Road

May 29, 2026 · Artificial Intelligence

Agent Harness Architecture Deep Dive: From ReAct Loop to Production‑Grade AI System Design

The article argues that the real performance bottleneck of AI agents lies in the Agent Harness infrastructure rather than the model itself, and it systematically explains how prompt, context, and infrastructure layers, tool handling, memory, verification, error handling, and design trade‑offs shape production‑ready LLM agents.

AI Infrastructure · Agent Harness · Context Management

24 min read

Code Mala Tang

May 28, 2026 · Artificial Intelligence

When Claude Skills Need Determinism, Use Skillflows

The article analyzes Claude's natural‑language SKILL.md approach, highlights its flexibility and nondeterminism, and explains how adding a declarative skillflow.json graph enforces deterministic execution, auditability, lower token cost, and better consistency for high‑frequency, compliance‑critical tasks.

Claude · LLM Agents · Skillflows

11 min read
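
The deterministic half of the Skillflows design, a declarative graph that fixes the order of steps, can be sketched with a tiny executor. The JSON schema (`start`, `nodes`, `action`, `next`) and the action names are hypothetical and are not the actual skillflow.json format; the point is that the graph, not the model, decides the path, and the returned trail doubles as an audit log.

```python
import json

# Hypothetical flow definition: each node names an action and the next node.
FLOW = json.loads("""
{
  "start": "extract",
  "nodes": {
    "extract":  {"action": "sum_items",      "next": "validate"},
    "validate": {"action": "check_positive", "next": "report"},
    "report":   {"action": "format_report",  "next": null}
  }
}
""")

# Plain functions that take and return a state dict; no LLM call chooses the path.
ACTIONS = {
    "sum_items": lambda state: {**state, "total": sum(state["items"])},
    "check_positive": lambda state: {**state, "valid": state["total"] > 0},
    "format_report": lambda state: {**state, "report": f"total={state['total']} valid={state['valid']}"},
}


def run_flow(flow: dict, actions: dict, state: dict, max_steps: int = 100) -> tuple:
    """Walk the graph from its start node and record every visited node."""
    trail = []
    node = flow["start"]
    while node is not None:
        if len(trail) >= max_steps:
            raise RuntimeError("step limit exceeded; the graph may contain a cycle")
        spec = flow["nodes"][node]
        state = actions[spec["action"]](state)
        trail.append(node)
        node = spec["next"]
    return state, trail


final_state, trail = run_flow(FLOW, ACTIONS, {"items": [3, 4]})
print(final_state["report"], trail)
# → total=7 valid=True ['extract', 'validate', 'report']
```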

ShiZhen AI

May 27, 2026 · Artificial Intelligence

Turning Click‑Based Web Agents into Repeatable Scripts with Microsoft’s Open‑Source Webwright

Microsoft’s open‑source Webwright framework redefines browser agents by replacing step‑by‑step click actions with generated Playwright scripts, enabling repeatable, debuggable web tasks; the article details its architecture, workflow, benchmark results on Online‑Mind2Web and Odysseys, and discusses practical benefits and limitations.

Benchmark · GPT-5.4 · LLM Agents

9 min read

AI Engineering

May 26, 2026 · Artificial Intelligence

Training Only the Skill Document While Keeping Model Weights Frozen (SkillOpt)

Microsoft Research introduces SkillOpt, a method that freezes large‑model weights and instead trains a natural‑language skill document as the sole learnable parameter using a rollout‑reflect‑edit‑gate loop. It achieves the best results across 52 benchmark‑model‑environment combinations and demonstrates strong transferability.

LLM Agents · SkillOpt · benchmark evaluation

9 min read
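
The gate step of a rollout-reflect-edit-gate loop can be sketched as follows: propose an edit to the skill document, score it, and keep it only if the score improves. The model weights never change; only the text does. In this hypothetical sketch, `propose_edit` stands in for the LLM reflection and editing steps and `evaluate` for rollouts on validation tasks; the toy scorer is invented for illustration and is not SkillOpt's objective.

```python
from typing import Callable


def gated_update(skill_doc: str, propose_edit: Callable[[str], str],
                 evaluate: Callable[[str], float], rounds: int) -> tuple:
    """Accept a proposed skill-document edit only if its validation score improves."""
    best_score = evaluate(skill_doc)
    for _ in range(rounds):
        candidate = propose_edit(skill_doc)
        score = evaluate(candidate)
        if score > best_score:  # gate: regressions and no-ops are discarded
            skill_doc, best_score = candidate, score
    return skill_doc, best_score


# Toy stand-ins: reward useful rules, penalize a harmful one.
RULES = ["cite sources", "check units"]


def evaluate(doc: str) -> float:
    return sum(rule in doc for rule in RULES) - 0.5 * doc.count("skip tests")


edits = iter(["\n- cite sources", "\n- skip tests", "\n- check units"])


def propose_edit(doc: str) -> str:
    return doc + next(edits)


doc, score = gated_update("# Skill", propose_edit, evaluate, rounds=3)
print(repr(doc), score)
```

The harmful "skip tests" edit lowers the score in round two and is rejected, so the final document keeps only the two useful rules.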

Code Mala Tang

May 25, 2026 · Artificial Intelligence

Behind 95K Stars: browser-use’s LLM Browser Automation vs Playwright

browser-use is an open‑source, MIT‑licensed LLM agent loop that compresses a page’s DOM into an indexed list of interactive elements so that large language models can plan and execute web tasks. The article compares it against Anthropic’s Computer Use, OpenAI’s Operator, and traditional Playwright/Selenium, highlighting its flexibility and lower cost as well as its trade‑offs in LLM usage and deployment.

Anthropic Computer Use · LLM Agents · MIT license

16 min read
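
The DOM-compression idea in the browser-use summary can be sketched with Python's standard `html.parser`: keep only interactive elements and number them, so the model can name an action target by index instead of reading raw HTML. The tag set, the label fallback order, and the output format are assumptions for illustration, not browser-use's actual serialization.

```python
from html.parser import HTMLParser
from typing import Optional

# Assumed set of interactive tags; real tools also inspect roles, handlers, and visibility.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveExtractor(HTMLParser):
    """Collects interactive elements and the text nested inside them."""

    def __init__(self) -> None:
        super().__init__()
        self.elements = []
        self._open: Optional[int] = None  # index of the element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append({"tag": tag, "attrs": dict(attrs), "text": ""})
            # <input> is a void element: it has no closing tag and no inner text.
            if tag != "input":
                self._open = len(self.elements) - 1

    def handle_endtag(self, tag):
        if self._open is not None and self.elements[self._open]["tag"] == tag:
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self.elements[self._open]["text"] += data


def compress_dom(html: str) -> str:
    """Render each interactive element as one indexed line, e.g. '[1] <button> Go'."""
    parser = InteractiveExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for index, element in enumerate(parser.elements):
        attrs = element["attrs"]
        label = element["text"].strip() or attrs.get("placeholder") or attrs.get("name") or ""
        lines.append(f"[{index}] <{element['tag']}> {label}".rstrip())
    return "\n".join(lines)


page = '<div><h1>Shop</h1><input name="q" placeholder="Search"><button>Go</button><a href="/cart">Cart</a></div>'
print(compress_dom(page))
# → [0] <input> Search
#   [1] <button> Go
#   [2] <a> Cart
```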

HyperAI Super Neural

May 25, 2026 · Artificial Intelligence

CVEvolve: Zero‑Code Autonomous Discovery of Scientific Image‑Processing Algorithms

CVEvolve, a no‑code autonomous agent framework from Argonne National Laboratory (ANL), uses large‑language‑model agents to discover, evaluate, and iteratively refine scientific image‑processing algorithms without any programming. It outperforms traditional baselines on X‑ray fluorescence registration, Bragg‑peak detection, and diffraction‑image segmentation.

CVEvolve · Image processing · LLM Agents

13 min read

PaperAgent

May 25, 2026 · Artificial Intelligence

DeepSeek’s Harness: How Agent Harness Engineering Is Shaping the Next LLM Agent Era

The article surveys DeepSeek’s Harness initiative, presenting the Binding‑Constraint Thesis, a three‑stage evolution from prompt engineering to harness engineering, the ETCLOVG seven‑layer architecture, and concrete benchmark evidence that harness‑only improvements far outweigh model upgrades, while also detailing security, observability, and governance considerations for reliable LLM agents.

AI Architecture · Agent Harness Engineering · Agent evaluation

12 min read

James' Growth Diary

May 24, 2026 · Artificial Intelligence

Execution → Observation → Reflection → Improvement: How Hermes Closes the Skill Loop

The article dissects Hermes’ background review mechanism, showing how a silent daemon thread performs post‑conversation reflection and writes valuable insights to a skill or memory store. It also covers the prompt designs, fork‑agent isolation, priority update rules, and common pitfalls involved in building continuously learning LLM agents.

Background Review · Daemon Thread · Hermes

14 min read

Old Zhang's AI Learning

May 21, 2026 · Artificial Intelligence

SkillOS: Enabling Agents to Self‑Manage Their Skills

SkillOS reframes skill management for LLM agents as a long‑horizon reinforcement‑learning problem, letting a trainable Skill Curator automatically insert, update, or delete markdown‑based skills, which the frozen Agent Executor then consumes, improving memory‑free performance and cross‑task transfer.

LLM Agents · Markdown · Self-Evolving Agents

6 min read

PaperAgent

May 20, 2026 · Artificial Intelligence

AutoTTS Shows How AI Agents Can Outperform Human‑Designed Test‑Time Scaling Strategies

The paper “LLMs Improving LLMs” introduces AutoTTS, an environment in which a Claude‑based explorer agent automatically searches for test‑time scaling policies, achieving up to 69.5% token savings and superior accuracy on unseen models, all for $39.90 and 160 minutes, without any LLM calls during evaluation.

AutoTTS · Claude · LLM Agents

7 min read

Data Party THU

May 18, 2026 · Artificial Intelligence

How VIGIL’s Verify‑Before‑Execute Paradigm Defeats LLM Agent Tool Hijacking

VIGIL introduces a verify‑before‑commit framework that isolates tool‑stream injection attacks on LLM agents, using intent anchoring, perception sanitization, speculative reasoning, grounding verification, and validated trajectory memory, reducing attack success rates to 8‑12% while preserving task utility.

AI safety · LLM Agents · SIREN benchmark

11 min read
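
The verify-before-execute idea can be sketched as a check that runs between the model proposing a tool call and the harness executing it. Here the user's original intent is anchored as an allowlist of tools and recipients. The `Intent` type, the `to` argument, and the `send_email` tool are hypothetical, and VIGIL's perception sanitization, speculative reasoning, and trajectory memory are not modeled.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Intent:
    """Constraints derived from the user's request before any tool output is read."""
    allowed_tools: frozenset
    allowed_recipients: frozenset


def verify(call: dict, intent: Intent) -> bool:
    """Reject calls that drift from the anchored intent, e.g. after a prompt injection."""
    if call["tool"] not in intent.allowed_tools:
        return False
    recipient = call.get("args", {}).get("to")
    return recipient is None or recipient in intent.allowed_recipients


def execute_if_verified(call: dict, intent: Intent, tools: Dict[str, Callable[..., Any]]) -> dict:
    # The tool only runs after verification passes; blocked calls never reach it.
    if not verify(call, intent):
        return {"status": "blocked", "tool": call["tool"]}
    return {"status": "ok", "result": tools[call["tool"]](**call.get("args", {}))}


intent = Intent(frozenset({"send_email"}), frozenset({"bob@corp.example"}))
tools = {"send_email": lambda to, body: f"sent to {to}"}

legit = {"tool": "send_email", "args": {"to": "bob@corp.example", "body": "Q3 summary"}}
injected = {"tool": "send_email", "args": {"to": "attacker@evil.example", "body": "secrets"}}
print(execute_if_verified(legit, intent, tools))
print(execute_if_verified(injected, intent, tools))
```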

LuTiao Programming

May 17, 2026 · Artificial Intelligence

Why Your AI Keeps Going Off‑Track: The 4 Essential CLAUDE.md Directives

The article analyzes why AI coding assistants often stray from intended requirements, exposing a core judgment deficit, and shows how a concise four‑line CLAUDE.md file—detailing assumptions, minimal code, scoped changes, and verifiable success criteria—can dramatically improve AI behavior, reduce over‑design, and lower review costs.

AI coding · CLAUDE.md · LLM Agents

11 min read

dbaplus Community

May 17, 2026 · Artificial Intelligence

Why Grep Is Replacing Vector Indexes: RAG Isn’t Dead, It’s Evolving

The article dissects Claude Code’s LLM‑driven Grep search, showing how multi‑round tool calls replace static vector‑based RAG, presents ripgrep performance benchmarks, compares Claude Code with Cursor and Codex, and argues that zero‑index search is optimal for local code bases while larger projects still need indexing.

Claude Code · Code search · LLM Agents

36 min read

AI Engineering

May 17, 2026 · Information Security

LiteLLM Agent Platform: K8s Sandbox Stops Agents Accessing Real API Keys

The open‑source LiteLLM Agent Platform isolates each coding agent in a fresh Kubernetes pod and swaps stub tokens for real credentials only on outbound TLS requests, preventing any agent from ever seeing or leaking actual API keys.

API Security · Kubernetes · LLM Agents

4 min read
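
The stub-token swap can be sketched as a function that an egress proxy would apply to each outbound request after terminating TLS. Everything here is hypothetical (the `VAULT` mapping, the host-pinning rule, the `OutboundRequest` type) and is not the LiteLLM Agent Platform's code; it only shows the property the summary describes: the agent's pod holds nothing but the stub, and the real secret is attached outside the sandbox.

```python
from dataclasses import dataclass, field

# Hypothetical vault held outside the agent sandbox: stub token -> (real secret, allowed host).
VAULT = {
    "stub-token-llm": ("real-secret-value", "api.example-llm.com"),
}


@dataclass
class OutboundRequest:
    host: str
    headers: dict = field(default_factory=dict)


def inject_credentials(request: OutboundRequest) -> OutboundRequest:
    """Swap a known stub bearer token for the real secret, but only for its allowed host."""
    scheme, _, token = request.headers.get("Authorization", "").partition(" ")
    if token not in VAULT:
        return request  # nothing to substitute
    secret, allowed_host = VAULT[token]
    if request.host != allowed_host:
        # Substituting here would hand the real secret to an arbitrary host, so refuse.
        raise PermissionError(f"stub token is not valid for host {request.host}")
    headers = {**request.headers, "Authorization": f"{scheme} {secret}"}
    return OutboundRequest(host=request.host, headers=headers)


req = OutboundRequest("api.example-llm.com", {"Authorization": "Bearer stub-token-llm"})
print(inject_credentials(req).headers["Authorization"])
# → Bearer real-secret-value
```

The function returns a new request rather than mutating the original, so the stub-bearing object seen inside the sandbox never contains the secret.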

Architect

May 12, 2026 · Artificial Intelligence

Why Does Past Information Influence Future Decisions? Analyzing Agent Memory Architecture

The article dissects Agent Memory, explaining how past observations are written, managed, and read to affect future tasks, highlighting challenges such as relevance, decay, conflict, security, and offering practical design guidelines and architectural options for production‑grade AI agents.

AI Architecture · Agent Memory · LLM Agents

31 min read

PaperAgent

May 11, 2026 · Artificial Intelligence

SkillOS: How Skill Governance Powers Self‑Evolving AI Agents

SkillOS addresses the one‑off nature of current LLM agents by introducing a closed‑loop system where a trainable Skill Curator continuously extracts, updates, and manages reusable skills from execution traces, leading to measurable gains in success rates, efficiency, and cross‑task generalization.

Grouped Task Streams · LLM Agents · Meta-Strategy Skills

10 min read

Linyb Geek Road

May 10, 2026 · Artificial Intelligence

Designing Progressive Large‑Model Agents: Architecture, Frameworks, and Real‑World Practices

This article examines the evolution of large‑model agents, outlines four development stages, compares workflow, collaborative, and evolutionary frameworks, details core components such as perception, memory, planning, tools, and reflection, and explains how a progressive, loop‑based architecture can be applied across verticals like research, code generation, and complex workflow automation.

AlphaEvolve · LLM Agents · LangGraph

9 min read

Machine Learning Algorithms & Natural Language Processing

May 8, 2026 · Artificial Intelligence

T²PO: Uncertainty‑Guided Exploration Control for Stable Multi‑Turn Agent RL

The paper identifies inefficient exploration, termed "hesitation," as the root cause of instability in multi‑turn reinforcement learning for LLM agents and introduces T²PO, an uncertainty‑driven token‑ and turn‑level control framework that markedly improves training stability and performance on benchmarks such as WebShop, ALFWorld, and Search QA.

LLM Agents · T2PO · Uncertainty

16 min read

PaperAgent

May 4, 2026 · Artificial Intelligence

A Comprehensive Survey of Self-Evolving Agents: From Model-Centric to Environment-Driven Co-Evolution

This survey systematically reviews self‑evolving agents, explains why autonomous agents are needed, proposes a unified taxonomy of three evolution paradigms, analyzes model‑centric, environment‑centric, and co‑evolution approaches, and outlines future challenges in designing adaptive environments.

AI Agent Taxonomy · Co-Evolution · Environment-Centric Evolution

14 min read

AI Engineer Programming

May 3, 2026 · Artificial Intelligence

From Single Retrieval to Autonomous Reasoning: Understanding Agentic RAG

The article analyzes why traditional Retrieval‑Augmented Generation fails on multi‑hop, vague, or multi‑source queries and explains how Agentic RAG uses an LLM‑driven agent loop to make dynamic retrieval decisions, outlining its architecture, suitable scenarios, and limitations.

AI reasoning · Agentic RAG · LLM Agents

7 min read

AI Tech Publishing

May 1, 2026 · Artificial Intelligence

5 Counterintuitive Design Principles for Prompt Caching in Claude Code

The article details five counterintuitive design principles for Claude Code’s prompt caching (optimizing prompt layout, using message‑based updates, never switching models or tools mid‑conversation, safely compressing context, and monitoring cache health), backed by concrete examples and cost savings of up to 90%.

AI Engineering · Cache Optimization · Claude Code

10 min read

AI Explorer

Apr 30, 2026 · Industry Insights

AI Tech Daily: Key AI Industry Highlights for April 30 2026

The AI Tech Daily roundup highlights Microsoft’s 123% AI revenue surge, new GPT‑5.5 restrictions, DeepSeek’s multimodal launch, Ant Group’s zkDTVM benchmark record, a 23‑year‑old Linux kernel bug, Stripe’s 288 AI‑focused features, and emerging trends in LLM agent orchestration and AI adoption metrics.

AI revenue · DeepSeek · GPT-5.5

4 min read

SuanNi

Apr 27, 2026 · Artificial Intelligence

How MIT’s RUBICON Cuts AI Agent Costs by 90% While Achieving 100% Accuracy

The paper shows that conventional LLM agents fail on real‑world enterprise data because of chaotic data sources, while the RUBICON architecture uses a minimal Agentic Query Language to let users direct data retrieval, achieving 100% accuracy with a much cheaper model and dramatically lower token and monetary costs.

Agentic Query Language · Benchmark · Data Integration

11 min read

AI Architecture Hub

Apr 23, 2026 · Artificial Intelligence

Why Prompt Caching Is Critical: Lessons from Building Claude Code

Prompt caching, a prefix‑matching technique that reuses prior LLM interactions, proved essential for Claude Code’s low latency and cost, and the article details counter‑intuitive practices such as arranging static prompts first, updating info via messages, avoiding mid‑session model or tool changes, and ensuring cache‑safe context forks.

AI Engineering · Cache Optimization · Claude Code

10 min read
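
Prefix matching is why static content goes first. The sketch below counts how many leading messages two consecutive requests share, a rough proxy for what a prefix cache could reuse. Message-level granularity is a simplification (real caches match on tokens and cache breakpoints), and the message strings are made up.

```python
def shared_prefix_len(previous: list, current: list) -> int:
    """Count leading messages that are identical in two consecutive requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


# Static-first layout: stable system prompt and tools, then the growing conversation.
static_turn1 = ["SYSTEM: rules + tool definitions", "USER: fix the failing test"]
static_turn2 = static_turn1 + ["ASSISTANT: patched test_parser.py", "USER: now run lint"]

# Dynamic-first layout: a timestamp at the top changes on every request.
dynamic_turn1 = ["SYSTEM: time=10:00", "SYSTEM: rules + tool definitions", "USER: fix the failing test"]
dynamic_turn2 = ["SYSTEM: time=10:01", "SYSTEM: rules + tool definitions", "USER: fix the failing test",
                 "ASSISTANT: patched test_parser.py", "USER: now run lint"]

print(shared_prefix_len(static_turn1, static_turn2))    # → 2: the whole previous request is reusable
print(shared_prefix_len(dynamic_turn1, dynamic_turn2))  # → 0: the first message already differs
```

Sending the new time as a later message instead of editing the top of the prompt keeps the prefix intact, which is the "updating info via messages" practice the summary mentions.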

AI Waka

Apr 22, 2026 · Artificial Intelligence

Hybrid MCP‑Skill Model: Keeping LLM Agent Skills Fresh

The article analyzes the trade‑offs between packaging new agent functionality as a static Skill versus a dynamic MCP server, proposes a hybrid thin‑CLI approach that combines the ease of Skills with the up‑to‑date guarantees of MCP, and illustrates the design with concrete code examples.

CLI wrapper · Hybrid Architecture · LLM Agents

7 min read

PaperAgent

Apr 22, 2026 · Artificial Intelligence

How SkillClaw Enables Collective Evolution of Agent Skills in Real-World Use

SkillClaw introduces a centralized evolution framework that transforms user interactions into structured evidence, allowing LLM agents to refine, create, or skip skills based on aggregated success and failure patterns, with nightly validation ensuring only proven improvements are deployed, resulting in consistent performance gains across diverse tasks.

AI workflow · Benchmark · LLM Agents

13 min read

AntTech

Apr 22, 2026 · Artificial Intelligence

How Multi‑Agent MCTS and Information‑Gain Rewards Are Transforming Mobile GUI and Search Agents

This article reviews two recent ICLR 2026 papers—M²‑Miner, a multi‑agent Monte‑Carlo Tree Search framework for low‑cost mobile GUI data mining, and IGPO, an information‑gain‑based reinforcement‑learning method that provides dense rewards for multi‑turn search agents—detailing their designs, experiments, and open‑source releases.

GUI Data Mining · Information Gain · LLM Agents

8 min read

Machine Heart

Apr 21, 2026 · Artificial Intelligence

How Externalization Drives the Evolution of LLM Agents – Insights from a 54‑Page SJTU Review

A recent 54‑page arXiv review by Shanghai Jiao Tong University and collaborators argues that the reliability gains of LLM agents stem more from externalizing memory, skills, protocols, and harness infrastructure than from scaling the underlying model, outlining three structural mismatches and a unified externalization framework.

Externalization · Harness · LLM Agents

13 min read

SuanNi

Apr 19, 2026 · Artificial Intelligence

Why External Cognition Is the New Engine Behind Reliable LLM Agents

The article analyzes how the success of large‑language‑model agents now hinges on external cognitive infrastructure—memory, skills, protocols, and a central Harness—rather than raw model parameters, outlining architectural evolution, practical challenges, and emerging industry trends.

AI industry trends · Harness framework · LLM Agents

15 min read

AI Architecture Hub

Apr 18, 2026 · Artificial Intelligence

Build a Dual‑Layer AI Knowledge Base in 20 Minutes and Supercharge Your LLM Agents

This article explains how to create a two‑layer AI knowledge system, consisting of a dynamic Knowledge Base Layer and a static Brand Foundation Layer, in about 20 minutes, detailing its architecture, its advantages over traditional RAG, step‑by‑step deployment, and real‑world use cases for creators, teams, and personal productivity.

AI knowledge base · Git · Knowledge Management

16 min read

AI Waka

Apr 17, 2026 · Artificial Intelligence

From Generative to Agentic AI: Building Real‑World Agent Systems

The article explains how AI is shifting from reactive generative models to goal‑driven Agentic systems, outlines core framework components, common patterns, skill abstractions, a step‑by‑step implementation guide for backend engineers, and introduces Harness Engineering for production‑grade reliability and observability.

AI frameworks · LLM Agents · Observability

10 min read

Linyb Geek Road

Apr 16, 2026 · Artificial Intelligence

Does Conway's Law Apply to LLM Agent Systems? Design Insights and Best Practices

The article explores how Conway's Law—"organizations design systems that mirror their structure"—extends to large‑model agent architectures, offering concrete examples, role‑alignment strategies, concise communication patterns, and cautions against over‑engineering to improve multi‑agent collaboration.

AI Coordination · Agent System Design · Conway's Law

9 min read

Baidu Geek Talk

Apr 15, 2026 · Artificial Intelligence

Unveiling Claude Code: How Rules, MCP, and Skills Power the Coding Agent

This article dissects the leaked Claude Code v2.1.88 source to reveal how its three core concepts (Rules, MCP, and Skills) are implemented, where each is injected into the Anthropic API request, and when developers should prefer each mechanism for reliable, secure, and token‑efficient coding‑agent workflows.

Claude Code · LLM Agents · MCP

25 min read

AI Engineer Programming

Apr 15, 2026 · Artificial Intelligence

Elephant Alpha: Free 100B‑Parameter Instant Model with 256K Context on OpenRouter

OpenRouter quietly launched Elephant Alpha, a free 100B‑parameter LLM with a 256K‑token context window. Positioned as an "instant model" that prioritises token efficiency and speed, it supports function calling and prompt caching. The article compares it with other Animal‑series models and notes the community speculation about its origin.

256K context · Elephant Alpha · Function Calling

6 min read

dbaplus Community

Apr 13, 2026 · Artificial Intelligence

Why OpenClaw’s Memory Fails and How to Fix It: 5 Root Causes & Practical Solutions

The article analyses OpenClaw’s memory architecture, identifies five fundamental reasons why the agent forgets or ignores rules, and presents four configuration tweaks plus a self‑improving‑agent approach that make memory writes reliable and rule adherence more likely.

AI memory · LLM Agents · OpenClaw

15 min read

AI Tech Publishing

Apr 12, 2026 · Artificial Intelligence

How Hermes Agent’s Multi‑Layer Memory Beats OpenClaw’s Simple Markdown Store

The article dissects Hermes Agent’s four‑store memory architecture (declarative, procedural, situational, and persona stores) along with its deterministic routing, frozen snapshots, nudge‑driven persistence, security scanning, dual‑peer modeling, skill management, and three‑phase context compression, and shows why it outperforms OpenClaw’s breadth‑first design.

Hermes Agent · LLM Agents · Memory Architecture

17 min read

Machine Learning Algorithms & Natural Language Processing

Apr 10, 2026 · Artificial Intelligence

One‑Click from Experiment Logs to Conference‑Ready LaTeX: Google’s PaperOrchestra Changes Paper Writing

PaperOrchestra, Google’s multi‑agent framework, turns raw experiment logs, brief ideas, LaTeX templates and conference guidelines into fully formatted CVPR/ICLR papers, using five coordinated agents, Semantic Scholar verification, PaperBanana figure generation, and a refinement loop that boosts simulated acceptance rates by up to 22% while running in under 40 minutes.

Benchmark · LLM Agents · PaperBanana

9 min read

AI Engineering

Apr 10, 2026 · Artificial Intelligence

Getting Started with Hermes Agent: A Complete Beginner’s Guide

Hermes Agent, the open‑source LLM‑driven framework from Nous Research, has attracted 43.7K GitHub stars, but its documentation leaves many developers stranded; a community‑curated ecosystem map and the “Orange Book” guide now provide step‑by‑step installation, skill development, multi‑agent orchestration, and deployment resources to bridge the gap.

Documentation guide · Ecosystem map · Hermes Agent

5 min read

AI Step-by-Step

Apr 8, 2026 · Operations

How to Light Up the Black Box of LLM Agents with Full‑Stack Observability

The article explains why traditional logs are insufficient for LLM agents, outlines five observability dimensions—tracing, metrics, behavioral governance, state & memory, and evaluation—and provides concrete, open‑source‑based steps to instrument, monitor, and act on agent workloads in production.

Behavioral Governance · Evaluation · LLM Agents

11 min read

AgentGuide

Apr 2, 2026 · Artificial Intelligence

Understanding ReAct: The Reason‑Act Loop Behind LLM Agents

The article explains ReAct, a Reason‑Act framework in which an LLM agent observes, reasons, acts through tools, receives feedback, and iterates. It highlights how ReAct differs from plain QA, walks through its step‑by‑step workflow and practical importance, and illustrates it with a weather‑query example.

AI workflow · LLM Agents · ReAct

5 min read

AI Step-by-Step

Mar 30, 2026 · Artificial Intelligence

How to Keep LLM Agents in Check with Guardrails

The article explains why LLM agents can over‑promise or execute unauthorized actions, and outlines a three‑layer guardrail system—prompt review, output validation, and tool‑action interception—plus concrete rules, examples, and test cases to ensure safe deployment.

AI safety · Guardrails · LLM Agents

11 min read

DevOps Coach

Mar 27, 2026 · Operations

Can Four LLM‑Powered Agents Build a Real Kubernetes Cluster Without Human Help?

In an experiment, four LLM‑driven autonomous agents (Architect, Builder, Security Sentinel, and QA Tester) attempted to provision a Proxmox‑based HA Kubernetes cluster on real hardware. The run revealed costly context drift, emergent coordination failures, and stark differences between Gemini and Claude in diagnosing infrastructure‑as‑code errors.

AI Ops · Ansible · Autonomous SRE

14 min read

DeepHub IMBA

Mar 26, 2026 · Artificial Intelligence

Information Access vs. Reasoning: Experimental Attribution Analysis of LLM Agent Performance

The study shows that LLM agents' apparent intelligence stems more from the amount and type of context they can access than from genuine reasoning ability, as demonstrated by the ContextEval framework’s controlled experiments across multiple hyper‑parameter optimization benchmarks.

AI evaluation · LLM Agents · agentic workflows

8 min read

Frontend AI Walk

Mar 25, 2026 · Artificial Intelligence

Slow Learning Agents: 7 Cognitive Shifts from Using ChatGPT to Truly Understanding Agents

The article outlines seven essential mindset transitions for building robust LLM agents—recognizing agents as autonomous decision loops, prioritizing harness over model size, layering context, designing tools for agent goals, structuring multi‑layer memory, coordinating multiple agents with isolation and protocols, and aligning evaluation with the real environment.

Context Management · Evaluation · Harness

16 min read

AI Architecture Hub

Mar 25, 2026 · Artificial Intelligence

How Memento-Skills Enables Continuous Learning for Frozen LLM Agents

The article analyzes the limitations of frozen LLM agents (fixed parameters, loss of state, and costly fine‑tuning) and introduces the Memento‑Skills framework, which adds an external, evolvable skill memory to enable deployment‑time learning. It covers the framework's architecture, its optimization knobs, and the strong experimental gains it reports.

AI research · Deployment-Time Learning · LLM Agents

14 min read

Tencent Cloud Developer

Mar 24, 2026 · Artificial Intelligence

Why AI Coding Agents Miss the Mark—and How to Make Them Work

The article analyzes the hype around AI coding tools like OpenClaw. It exposes false demand, the pitfalls of building agents before real needs are identified, and the quality gaps in AI‑generated code, then proposes practical strategies such as spec‑first coding, bottleneck identification, and multi‑model orchestration to improve productivity.

AI coding · LLM Agents · Spec Coding

15 min read

Machine Learning Algorithms & Natural Language Processing

Mar 19, 2026 · Artificial Intelligence

From Solving to Evolving: How RETROAGENT Gives AI Agents Real Retrospective Learning

The article analyzes the RETROAGENT framework, showing how its dual intrinsic feedback and memory‑buffer mechanisms enable LLM agents to move beyond solving tasks toward continual evolution, and presents benchmark results that demonstrate significant performance gains and strong test‑time adaptation across four challenging environments.

LLM Agents · RETROAGENT · dual intrinsic feedback

7 min read

Software Engineering 3.0 Era

Mar 15, 2026 · Artificial Intelligence

When AI ‘Crayfish’ Takes Over Testing, Where Do 80% of Testers Go?

The article demonstrates how an LLM‑powered agent (nicknamed “crayfish”) equipped with OpenClaw and Playwright MCP can autonomously perform web‑testing tasks—handling environment setup, visual OCR, error recovery and reporting—showing a shift from fragile scripted automation to intent‑driven testing and warning that traditional test engineers have little time left to adapt.

AI testing · LLM Agents · Playwright

11 min read

DeepHub IMBA

Mar 14, 2026 · Artificial Intelligence

Three Proven Multi‑Agent Orchestration Patterns: Supervisor, Pipeline, and Swarm

The article explains why single LLM agents often fail due to context overload, role confusion, and fault propagation, then details three reliable orchestration patterns—Supervisor, Pipeline, and Swarm—along with concrete code examples, communication schemas, error‑handling layers, cost and latency considerations, and best‑practice recommendations for production deployment.

Distributed Tracing · LLM Agents · Multi-Agent Systems

15 min read

Architect

Mar 11, 2026 · Artificial Intelligence

How OpenClaw Manages Context: Multi‑Layer Compression, Memory Persistence, and Overflow Recovery

This article explains OpenClaw's sophisticated context‑management system, detailing its three‑layer approach to pruning old turns, trimming tool results, and handling oversized outputs, while preserving critical state through memory flushing, structured compaction, and a robust overflow‑recovery pipeline.

LLM Agents · compression · memory persistence

29 min read

AI Explorer

Mar 6, 2026 · Artificial Intelligence

AReaL: Lightning‑Fast Asynchronous RL Engine for Building High‑Performance LLM Agents

AReaL, an open‑source, fully asynchronous reinforcement‑learning platform co‑developed by Tsinghua University and Ant Group, dramatically speeds up training of complex LLM agents, offering a simple, stable, and hardware‑flexible solution for developers seeking industrial‑grade AI agents.

AI Infrastructure · AReaL · Asynchronous Training

7 min read

Woodpecker Software Testing

Mar 5, 2026 · Artificial Intelligence

AI Agent Testing: An In-Depth Guide Every Test Expert Needs

The article explains why traditional assertion‑based testing fails for LLM‑driven AI agents and introduces a four‑dimensional GBRT framework—Goal, Behavior, Resilience, Traceability—detailing concrete examples, evaluation methods, toolchain integration, and practical steps to build measurable, robust test pipelines for autonomous agents.

AI testing · GBRT · LLM Agents

9 min read

PaperAgent

Mar 2, 2026 · Artificial Intelligence

SKILLRL: Boosting LLM Agents with Skill Distillation and Recursive Evolution

SKILLRL introduces a novel framework that transforms raw LLM agent trajectories into compact, reusable skills via experience‑driven distillation, hierarchical skill banks, and recursive skill evolution, achieving up to 90% success on ALFWorld and 73% on WebShop while reducing token usage by over 10% compared to memory‑based baselines.

LLM Agents · SKILLRL · hierarchical skill bank

10 min read

Baobao Algorithm Notes

Mar 2, 2026 · Artificial Intelligence

Why Agentic AI Is Winning Over Workflows: The 2025 Evolution of LLM Agents

The article reviews the rapid shift in 2025 from complex workflow‑based LLM orchestration to streamlined agentic systems that rely on simple prompt loops, sandboxed tool execution, file‑based memory, and modular skill files, culminating in the rise of Agent Harness runtimes.

AI trends · LLM Agents · Memory Management

8 min read

AI Tech Publishing

Mar 2, 2026 · Artificial Intelligence

Why pi-mono’s Agent Design Is an Anti‑Pattern (and What Works Better)

The author explains why Claude Code became too bloated, outlines the minimal, controllable feature set a coding assistant needs, details pi-mono’s four‑package architecture, shares design anti‑patterns, and presents benchmark results showing that its simple approach outperforms heavier agents.

Agent Design · Benchmark · Claude Opus

13 min read

AI Waka

Feb 27, 2026 · Artificial Intelligence

How to Add Persistent Long‑Term Memory to LangGraph Agents with Trustcall

This article explains how to integrate durable long‑term memory into LangGraph agents, covering memory types, their coordination, limitations of native LangGraph storage, and a step‑by‑step implementation using Trustcall’s schema‑driven extractors for both user profiles and paper collections.

AI · LLM Agents · LangGraph

16 min read

Architect

Feb 13, 2026 · Artificial Intelligence

Cutting Agent Costs: Practical Tips from the ‘Toward Efficient Agents’ Survey

The article analyzes why autonomous LLM agents become expensive, breaks down their cost components, and presents concrete engineering strategies—memory management, tool‑call optimization, and planning constraints—to dramatically reduce token usage and improve reliability while maintaining performance.

LLM Agents · Planning · cost optimization

19 min read

PaperAgent

Jan 28, 2026 · Artificial Intelligence

How Clawdbot Achieves Persistent, Local Memory for LLM Agents

Clawdbot implements a fully local, persistent memory system for LLM agents by storing context and long‑term knowledge in editable Markdown files, indexing them with SQLite‑vec and FTS5, supporting multi‑agent isolation, compression, pruning, and configurable session lifecycles to maintain efficient, cost‑effective interactions.

LLM Agents · context compression · local storage

13 min read

High Availability Architecture

Jan 27, 2026 · Artificial Intelligence

How LLM Agents Are Redefining Programming: From Manual Coding to Autonomous Agents

The author reflects on a rapid shift in software development workflows driven by LLM agents, highlighting the move from manual coding to agent‑driven automation, the continued need for oversight in the IDE, agents' strengths in tenacity and leverage, and the broader implications for engineers' future roles.

AI programming · Automation · LLM Agents

7 min read

Architecture and Beyond

Jan 17, 2026 · Artificial Intelligence

Progressive Disclosure & Dynamic Context: Making LLM Agents Reliable Execution Systems

This article explains how progressive disclosure and dynamic context management address the three core bottlenecks of complex LLM agents—context explosion, tool overload, and uncontrolled execution—by structuring context, tools, and SOPs into layered, token‑efficient, and verifiable workflows.

AI Engineering · LLM Agents · Progressive Disclosure

15 min read

Tencent Cloud Developer

Dec 23, 2025 · Artificial Intelligence

How ReAct (Reasoning + Acting) Empowers LLM Agents to Solve Real‑World Tasks

This article explains the ReAct paradigm—combining reasoning, action, and observation—to turn large language models into controllable agents, detailing its core concepts, architecture, workflow, code implementation, application scenarios, advantages over other methods, and future research directions.

AI automation · LLM Agents · reasoning and acting

29 min read

Bighead's Algorithm Notes

Dec 9, 2025 · Artificial Intelligence

How Do LLM Trading Agents Perform in a Competitive Market Arena?

The paper introduces Agent Market Arena (AMA), a lifelong, real‑time benchmark that evaluates diverse LLM‑based trading agents across crypto and equity markets, revealing that agent architecture, rather than the underlying LLM, drives performance differences and risk‑adjusted returns.

Benchmark · Financial Trading · LLM Agents

11 min read

PaperAgent

Dec 9, 2025 · Artificial Intelligence

Agentic AI Unveiled: Dual Paradigms, Architecture Battles, and Future Directions

This comprehensive survey dissects Agentic AI by contrasting symbolic/classical and neural/generative paradigms, mapping 90 peer‑reviewed papers (2018‑2025) through a PRISMA workflow, evaluating architectures, collaboration models, benchmarks, and ethical considerations, and highlighting the emerging need for hybrid systems and standardized evaluation.

Hybrid Architecture · LLM Agents · PRISMA review

8 min read

BirdNest Tech Talk

Dec 8, 2025 · Artificial Intelligence

How the New PEV Agent Pattern Boosts Reliable LLM Automation in Go

The article introduces the Plan‑Execute‑Verify (PEV) agent pattern added to langgraphgo, explains its three‑stage workflow, core features, configuration, concrete Go examples, implementation details, comparisons with ReAct and Reflection, and discusses best practices, limitations, and trade‑offs for high‑risk automation.

Go · LLM Agents · LangGraphGo

9 min read

PaperAgent

Dec 1, 2025 · Artificial Intelligence

How Deep Research Turns LLMs into Autonomous AI Scientists

This article surveys the emerging Deep Research (DR) paradigm that upgrades large language models into research agents capable of autonomous planning, multi‑source evidence gathering, memory management, and verifiable long‑form report generation, outlining its stages, core components, training pipeline, and evaluation benchmarks.

AI agents · AI research automation · Deep Research

6 min read

AI Frontier Lectures

Nov 13, 2025 · Artificial Intelligence

How Graphs Empower LLM Agents: A Deep Dive into GLA

This article reviews the IEEE Intelligent Systems survey that introduces Graph‑augmented LLM Agents (GLA), explains how representing plans, memory, tools and multi‑agent interactions as graphs improves reliability, efficiency, interpretability and flexibility, and outlines five key research directions for future development.

Agent Coordination · LLM Agents · Multimodal AI

8 min read

Network Intelligence Research Center (NIRC)

Nov 7, 2025 · Artificial Intelligence

Introducing LangGraph: A Low‑Level Framework for Building Stateful AI Agents

This article explains why modern LLM‑based applications need agent capabilities, introduces LangGraph’s core features such as stateful execution, graph‑based orchestration, tool integration, human‑in‑the‑loop and multi‑agent support, and provides a step‑by‑step Python example that builds a simple chat‑bot agent.

LLM Agents · LangGraph · Python example

11 min read

Bighead's Algorithm Notes

Oct 30, 2025 · Artificial Intelligence

FinSearchComp: ByteDance’s Expert‑Level Financial Search and Reasoning Benchmark for Real‑World Scenarios

FinSearchComp is the first fully open‑source benchmark that evaluates large‑language‑model agents' search and reasoning abilities in realistic financial workflows. Built with 70 finance experts, it features 635 expert‑annotated questions across three task types and shows that web‑enabled models with financial plugins markedly outperform API‑only models.

AI evaluation · Benchmark · FinSearchComp

12 min read

DataFunTalk

Oct 22, 2025 · Artificial Intelligence

Introducing VitaBench: A Real-World Benchmark for Complex LLM Agents

VitaBench is a newly released, highly realistic benchmark that evaluates large‑language‑model agents across three everyday scenarios—food ordering, restaurant dining, and travel planning—by quantifying reasoning, tool‑use, and interaction complexities, revealing a significant performance gap in current models.

AI evaluation · Benchmark · LLM Agents

13 min read

Data Thinking Notes

Oct 9, 2025 · Artificial Intelligence

Mastering Context Engineering: Boost LLM Agent Performance

Context Engineering, the evolution beyond Prompt Engineering, optimizes the selection and management of tokens within large language model windows, enabling high‑performance, autonomous AI agents through efficient system prompts, tool design, example selection, dynamic retrieval, compression, structured memory, and multi‑agent architectures.

LLM Agents · Multi-Agent Systems · ai-optimization

19 min read

xkx's Tech General Store

Sep 10, 2025 · Artificial Intelligence

Exploring WebDancer: Alibaba’s WebAgent that Solves Complex Queries Automatically

This article walks through installing Alibaba's WebDancer agent, explains its SFT‑plus‑RL training pipeline—including data construction, trajectory sampling, supervised fine‑tuning, and reinforcement learning—compares it with the earlier WebWalker, and demonstrates its multi‑step reasoning on a real‑world query.

AI Agent · Alibaba · LLM Agents

10 min read

DataFunTalk

Sep 10, 2025 · Artificial Intelligence

How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

The article presents Ant Group’s Ray‑based Ragent framework, detailing its background, the motivation behind unified AI serving, and the four core modules (Profile, Memory, Planning, and Action) that together enable large‑language‑model agents for financial applications.

AI Framework · Ant Group · LLM Agents

4 min read

DataFunSummit

Sep 9, 2025 · Artificial Intelligence

How Ant Group’s Ragent Redefines Distributed LLM Agents with Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI agent framework, covering its background, motivation in the large‑model era, and a four‑module design (Profile, Memory, Planning, Action) that enables scalable LLM‑driven agents.

AI Framework · Ant Group · LLM Agents

4 min read

xkx's Tech General Store

Sep 4, 2025 · Artificial Intelligence

First Hands‑On Exploration of OpenManus: Installation, Architecture, and Real‑World Tests

This article walks through installing OpenManus, explains its ReAct‑based architecture, and demonstrates two practical test cases—retrieving GitHub statistics and generating an animated HTML physics lesson—while highlighting strengths and current limitations of the agent framework.

LLM Agents · OpenManus · Playwright

7 min read

Smart Era Software Development

Jul 8, 2025 · Artificial Intelligence

12-Factor Agents – Core Principles to Bridge the Demo‑to‑Production Gap for Reliable LLM Apps

The article presents the 12‑Factor Agents framework, adapting the classic 12‑Factor App methodology to large‑language‑model agents and detailing twelve concrete engineering principles—ranging from prompt control and context engineering to human‑in‑the‑loop and stateless design—that together enable production‑grade, observable, and maintainable AI agents.

12-Factor · Context Management · LLM Agents

11 min read

BirdNest Tech Talk

Jun 30, 2025 · Artificial Intelligence

Build a Weather‑Query ReAct Agent with LangGraph: Step‑by‑Step Guide

This article walks through constructing a stateful ReAct‑style LLM agent using LangGraph, detailing the core components—State, Nodes, Edges—defining a weather‑lookup tool with Open‑Meteo, configuring the graph’s nodes and conditional edges, and executing the workflow with streaming to observe each step in real time.

LLM Agents · LangGraph · Python

16 min read

AI Large Model Application Practice

Jun 23, 2025 · Databases

How Google’s MCP Toolbox Simplifies Enterprise Database Access for LLM Agents

This guide explains Google’s open‑source MCP Toolbox for Databases, covering its core concepts, installation, configuration, two usage modes (native SDK and MCP), example LangGraph agent integration, security features, observability, and practical code snippets for building reliable LLM‑driven database tools.

Databases · LLM Agents · MCP Toolbox

11 min read

Instant Consumer Technology Team

May 29, 2025 · Artificial Intelligence

API vs GUI Agents: How to Choose the Right LLM Automation Approach

This article examines the evolution of large language model agents, contrasting API‑based agents that use predefined function calls with GUI‑based agents that interact with visual interfaces, and explores hybrid strategies, orchestration tools, RAG techniques, and practical guidelines for selecting the optimal paradigm.

API vs GUI · Hybrid automation · LLM Agents

34 min read