Tagged articles
72 articles
Data Party THU
May 18, 2026 · Artificial Intelligence

How VIGIL’s Verify‑Before‑Execute Paradigm Defeats LLM Agent Tool Hijacking

VIGIL introduces a verify‑before‑commit framework that isolates tool‑stream injection attacks on LLM agents, using intent anchoring, perception sanitization, speculative reasoning, grounding verification, and validated trajectory memory, reducing attack success rates to 8‑12% while preserving task utility.

AI Safety · LLM agents · SIREN benchmark
11 min read
dbaplus Community
May 17, 2026 · Artificial Intelligence

Why Grep Is Replacing Vector Indexes: RAG Isn’t Dead, It’s Evolving

The article dissects Claude Code’s LLM‑driven Grep search, showing how multi‑round tool calls replace static vector‑based RAG, presents ripgrep performance benchmarks, compares Claude Code with Cursor and Codex, and argues that zero‑index search is optimal for local code bases while larger projects still need indexing.

Claude Code · Grep · LLM agents
36 min read
PaperAgent
May 11, 2026 · Artificial Intelligence

SkillOS: How Skill Governance Powers Self‑Evolving AI Agents

SkillOS addresses the one‑off nature of current LLM agents by introducing a closed‑loop system where a trainable Skill Curator continuously extracts, updates, and manages reusable skills from execution traces, leading to measurable gains in success rates, efficiency, and cross‑task generalization.

Grouped Task Streams · LLM agents · Meta-Strategy Skills
10 min read
Machine Learning Algorithms & Natural Language Processing
May 8, 2026 · Artificial Intelligence

T²PO: Uncertainty‑Guided Exploration Control for Stable Multi‑Turn Agent RL

The paper identifies inefficient exploration, termed "hesitation," as the root cause of instability in multi‑turn reinforcement learning for LLM agents and introduces T²PO, an uncertainty‑driven token‑ and turn‑level control framework that markedly improves training stability and performance on benchmarks such as WebShop, ALFWorld, and Search QA.

LLM agents · T2PO · exploration control
16 min read
PaperAgent
May 4, 2026 · Artificial Intelligence

A Comprehensive Survey of Self-Evolving Agents: From Model-Centric to Environment-Driven Co-Evolution

This survey systematically reviews self‑evolving agents, explains why autonomous agents are needed, proposes a unified taxonomy of three evolution paradigms, analyzes model‑centric, environment‑centric, and co‑evolution approaches, and outlines future challenges in designing adaptive environments.

AI Agent Taxonomy · Co-Evolution · Environment-Centric Evolution
14 min read
AI Tech Publishing
May 1, 2026 · Artificial Intelligence

5 Counterintuitive Design Principles for Prompt Caching in Claude Code

The article details five counterintuitive design principles for Claude Code's prompt caching—optimizing prompt layout, using message‑based updates, never switching models or tools mid‑conversation, safely compressing context, and monitoring cache health—backed by concrete examples and up to 90% cost savings.

AI Engineering · Claude Code · LLM agents
10 min read
AI Explorer
Apr 30, 2026 · Industry Insights

AI Tech Daily: Key AI Industry Highlights for April 30 2026

The AI Tech Daily roundup highlights Microsoft's 123% AI revenue surge, groundbreaking GPT‑5.5 restrictions, DeepSeek's multimodal launch, Ant Group's zkDTVM benchmark record, a 23‑year‑old Linux kernel bug, Stripe's 288 AI‑focused features, and emerging trends in LLM agent orchestration and AI adoption metrics.

AI revenue · DeepSeek · GPT-5.5
4 min read
SuanNi
Apr 27, 2026 · Artificial Intelligence

How MIT’s RUBICON Cuts AI Agent Costs by 90% While Achieving 100% Accuracy

The paper shows that conventional LLM agents fail on real‑world enterprise data because of chaotic data sources, while the RUBICON architecture uses a minimal Agentic Query Language to let users direct data retrieval, achieving 100% accuracy with a much cheaper model and dramatically lower token and monetary costs.

Agentic Query Language · Benchmark · Data Integration
11 min read
AI Architecture Hub
Apr 23, 2026 · Artificial Intelligence

Why Prompt Caching Is Critical: Lessons from Building Claude Code

Prompt caching, a prefix‑matching technique that reuses prior LLM interactions, proved essential for Claude Code’s low latency and cost, and the article details counter‑intuitive practices such as arranging static prompts first, updating info via messages, avoiding mid‑session model or tool changes, and ensuring cache‑safe context forks.

AI Engineering · Claude Code · LLM agents
10 min read
AI Waka
Apr 22, 2026 · Artificial Intelligence

Hybrid MCP‑Skill Model: Keeping LLM Agent Skills Fresh

The article analyzes the trade‑offs between packaging new agent functionality as a static Skill versus a dynamic MCP server, proposes a hybrid thin‑CLI approach that combines the ease of Skills with the up‑to‑date guarantees of MCP, and illustrates the design with concrete code examples.

API Versioning · CLI wrapper · Hybrid Architecture
7 min read
PaperAgent
Apr 22, 2026 · Artificial Intelligence

How SkillClaw Enables Collective Evolution of Agent Skills in Real-World Use

SkillClaw introduces a centralized evolution framework that transforms user interactions into structured evidence, allowing LLM agents to refine, create, or skip skills based on aggregated success and failure patterns, with nightly validation ensuring only proven improvements are deployed, resulting in consistent performance gains across diverse tasks.

AI workflow · Benchmark · LLM agents
13 min read
AntTech
Apr 22, 2026 · Artificial Intelligence

How Multi‑Agent MCTS and Information‑Gain Rewards Are Transforming Mobile GUI and Search Agents

This article reviews two recent ICLR 2026 papers—M²‑Miner, a multi‑agent Monte‑Carlo Tree Search framework for low‑cost mobile GUI data mining, and IGPO, an information‑gain‑based reinforcement‑learning method that provides dense rewards for multi‑turn search agents—detailing their designs, experiments, and open‑source releases.

GUI Data Mining · Information Gain · LLM agents
8 min read
Machine Heart
Apr 21, 2026 · Artificial Intelligence

How Externalization Drives the Evolution of LLM Agents – Insights from a 54‑Page SJTU Review

A recent 54‑page arXiv review by Shanghai Jiao Tong University and collaborators argues that the reliability gains of LLM agents stem more from externalizing memory, skills, protocols, and harness infrastructure than from scaling the underlying model, outlining three structural mismatches and a unified externalization framework.

Externalization · Harness · LLM agents
13 min read
SuanNi
Apr 19, 2026 · Artificial Intelligence

Why External Cognition Is the New Engine Behind Reliable LLM Agents

The article analyzes how the success of large‑language‑model agents now hinges on external cognitive infrastructure—memory, skills, protocols, and a central Harness—rather than raw model parameters, outlining architectural evolution, practical challenges, and emerging industry trends.

AI industry trends · Harness framework · LLM agents
15 min read
AI Architecture Hub
Apr 18, 2026 · Artificial Intelligence

Build a Dual‑Layer AI Knowledge Base in 20 Minutes and Supercharge Your LLM Agents

This article explains how to create a two‑layer AI knowledge system (a dynamic Knowledge Base Layer and a static Brand Foundation Layer) in about 20 minutes, detailing its architecture, advantages over traditional RAG, step‑by‑step deployment, and real‑world use cases for creators, teams, and personal productivity.

AI knowledge base · Git · LLM agents
16 min read
AI Waka
Apr 17, 2026 · Artificial Intelligence

From Generative to Agentic AI: Building Real‑World Agent Systems

The article explains how AI is shifting from reactive generative models to goal‑driven Agentic systems, outlines core framework components, common patterns, skill abstractions, a step‑by‑step implementation guide for backend engineers, and introduces Harness Engineering for production‑grade reliability and observability.

AI frameworks · Agentic AI · LLM agents
10 min read
Baidu Geek Talk
Apr 15, 2026 · Artificial Intelligence

Unveiling Claude Code: How Rules, MCP, and Skills Power the Coding Agent

This article dissects the leaked Claude Code v2.1.88 source to reveal how the three core concepts—Rules, MCP, and Skills—are implemented, where they are injected in the Anthropic LLM API request, and when developers should prefer each mechanism for reliable, secure, and token‑efficient coding agent workflows.

Claude Code · LLM agents · MCP
25 min read
AI Engineer Programming
Apr 15, 2026 · Artificial Intelligence

Elephant Alpha: Free 100B‑Parameter Instant Model with 256K Context on OpenRouter

OpenRouter quietly launched Elephant Alpha, a free 100B‑parameter LLM with a 256K token window, positioned as an "instant model" that prioritises token efficiency and speed, supports function calling and prompt caching, and is compared against other Animal‑series models while community speculation surrounds its origin.

256K context · Elephant Alpha · Function Calling
6 min read
AI Tech Publishing
Apr 12, 2026 · Artificial Intelligence

How Hermes Agent’s Multi‑Layer Memory Beats OpenClaw’s Simple Markdown Store

The article dissects Hermes Agent’s four‑store memory architecture (declarative, procedural, situational, and persona) along with its deterministic routing, frozen snapshots, nudge‑driven persistence, security scanning, dual‑peer modeling, skill management, and three‑phase context compression, showing why it outperforms OpenClaw’s breadth‑first design.

Hermes Agent · LLM agents · Memory Architecture
17 min read
Machine Learning Algorithms & Natural Language Processing
Apr 10, 2026 · Artificial Intelligence

One‑Click from Experiment Logs to Conference‑Ready LaTeX: Google’s PaperOrchestra Changes Paper Writing

PaperOrchestra, Google’s multi‑agent framework, turns raw experiment logs, brief ideas, LaTeX templates and conference guidelines into fully formatted CVPR/ICLR papers, using five coordinated agents, Semantic Scholar verification, PaperBanana figure generation, and a refinement loop that boosts simulated acceptance rates by up to 22% while running in under 40 minutes.

Benchmark · LLM agents · PaperBanana
9 min read
AI Engineering
Apr 10, 2026 · Artificial Intelligence

Getting Started with Hermes Agent: A Complete Beginner’s Guide

Hermes Agent, the open‑source LLM‑driven framework from Nous Research, has attracted 43.7K GitHub stars, but its documentation leaves many developers stranded; a community‑curated ecosystem map and the “Orange Book” guide now provide step‑by‑step installation, skill development, multi‑agent orchestration, and deployment resources to bridge the gap.

Agent orchestration · Documentation guide · Ecosystem map
5 min read
AI Step-by-Step
Apr 8, 2026 · Operations

How to Light Up the Black Box of LLM Agents with Full‑Stack Observability

The article explains why traditional logs are insufficient for LLM agents, outlines five observability dimensions—tracing, metrics, behavioral governance, state & memory, and evaluation—and provides concrete, open‑source‑based steps to instrument, monitor, and act on agent workloads in production.

Behavioral Governance · LLM agents · Metrics
11 min read
AgentGuide
Apr 2, 2026 · Artificial Intelligence

Understanding ReAct: The Reason‑Act Loop Behind LLM Agents

The article explains ReAct—a Reason‑Act framework for large language model agents that observes, reasons, takes actions via tools, receives feedback, and iterates—highlighting its distinction from plain QA, its step‑by‑step workflow, practical importance, and a weather‑query example.

AI workflow · LLM agents · React
5 min read
AI Step-by-Step
Mar 30, 2026 · Artificial Intelligence

How to Keep LLM Agents in Check with Guardrails

The article explains why LLM agents can over‑promise or execute unauthorized actions, and outlines a three‑layer guardrail system—prompt review, output validation, and tool‑action interception—plus concrete rules, examples, and test cases to ensure safe deployment.

AI Safety · LLM agents · Prompt engineering
11 min read
DevOps Coach
Mar 27, 2026 · Operations

Can Four LLM‑Powered Agents Build a Real Kubernetes Cluster Without Human Help?

An experiment with four LLM‑driven autonomous agents—Architect, Builder, Security Sentinel, and QA Tester—attempted to provision a Proxmox‑based HA Kubernetes cluster using real hardware, revealing costly context drift, emergent coordination failures, and stark differences between Gemini and Claude in diagnosing infrastructure‑as‑code errors.

AI Ops · Ansible · Autonomous SRE
14 min read
Frontend AI Walk
Mar 25, 2026 · Artificial Intelligence

Slow Learning Agents: 7 Cognitive Shifts from Using ChatGPT to Truly Understanding Agents

The article outlines seven essential mindset transitions for building robust LLM agents—recognizing agents as autonomous decision loops, prioritizing harness over model size, layering context, designing tools for agent goals, structuring multi‑layer memory, coordinating multiple agents with isolation and protocols, and aligning evaluation with the real environment.

Context management · Harness · LLM agents
16 min read
AI Architecture Hub
Mar 25, 2026 · Artificial Intelligence

How Memento-Skills Enables Continuous Learning for Frozen LLM Agents

The article analyzes the limitations of frozen LLM agents (fixed parameters, loss of state, and costly fine‑tuning) and introduces the Memento‑Skills framework, which adds an external, evolvable skill memory to achieve deployment‑time learning; it also covers the framework's architecture, optimization knobs, and strong experimental gains.

AI research · Deployment-Time Learning · LLM agents
14 min read
Tencent Cloud Developer
Mar 24, 2026 · Artificial Intelligence

Why AI Coding Agents Miss the Mark—and How to Make Them Work

The article analyzes the hype around AI coding tools like OpenClaw, exposing false demands, the pitfalls of building agents before real needs, the quality gaps in AI‑generated code, and practical strategies such as spec‑first coding, bottleneck identification, and multi‑model orchestration to improve productivity.

AI Coding · LLM agents · Spec Coding
15 min read
Machine Learning Algorithms & Natural Language Processing
Mar 19, 2026 · Artificial Intelligence

From Solving to Evolving: How RETROAGENT Gives AI Agents Real Retrospective Learning

The article analyzes the RETROAGENT framework, showing how its dual intrinsic feedback and memory‑buffer mechanisms enable LLM agents to move beyond solving tasks toward continual evolution, and presents benchmark results that demonstrate significant performance gains and strong test‑time adaptation across four challenging environments.

LLM agents · RETROAGENT · dual intrinsic feedback
7 min read
DeepHub IMBA
Mar 14, 2026 · Artificial Intelligence

Three Proven Multi‑Agent Orchestration Patterns: Supervisor, Pipeline, and Swarm

The article explains why single LLM agents often fail due to context overload, role confusion, and fault propagation, then details three reliable orchestration patterns—Supervisor, Pipeline, and Swarm—along with concrete code examples, communication schemas, error‑handling layers, cost and latency considerations, and best‑practice recommendations for production deployment.

Cost Optimization · Distributed Tracing · LLM agents
15 min read
Architect
Mar 11, 2026 · Artificial Intelligence

How OpenClaw Manages Context: Multi‑Layer Compression, Memory Persistence, and Overflow Recovery

This article explains OpenClaw's sophisticated context‑management system, detailing its three‑layer approach to pruning old turns, trimming tool results, and handling oversized outputs, while preserving critical state through memory flushing, structured compaction, and a robust overflow‑recovery pipeline.

LLM agents · compression · memory persistence
29 min read
Woodpecker Software Testing
Mar 5, 2026 · Artificial Intelligence

AI Agent Testing: An In-Depth Guide Every Test Expert Needs

The article explains why traditional assertion‑based testing fails for LLM‑driven AI agents and introduces a four‑dimensional GBRT framework—Goal, Behavior, Resilience, Traceability—detailing concrete examples, evaluation methods, toolchain integration, and practical steps to build measurable, robust test pipelines for autonomous agents.

AI testing · GBRT · LLM agents
9 min read
PaperAgent
Mar 2, 2026 · Artificial Intelligence

SKILLRL: Boosting LLM Agents with Skill Distillation and Recursive Evolution

SKILLRL introduces a novel framework that transforms raw LLM agent trajectories into compact, reusable skills via experience‑driven distillation, hierarchical skill banks, and recursive skill evolution, achieving up to 90% success on ALFWorld and 73% on WebShop while reducing token usage by over 10% compared to memory‑based baselines.

LLM agents · SKILLRL · hierarchical skill bank
10 min read
AI Tech Publishing
Mar 2, 2026 · Artificial Intelligence

Why pi-mono’s Agent Design Is an Anti‑Pattern (and What Works Better)

The author explains why Claude Code became too bloated, outlines the minimal, controllable requirements for a code‑assistant, details pi-mono’s four‑package architecture, shares design anti‑patterns, and presents benchmark results showing its simple approach outperforms heavier agents.

Agent Design · Benchmark · Claude Opus
13 min read
AI Waka
Feb 27, 2026 · Artificial Intelligence

How to Add Persistent Long‑Term Memory to LangGraph Agents with Trustcall

This article explains how to integrate durable long‑term memory into LangGraph agents, covering memory types, their coordination, limitations of native LangGraph storage, and a step‑by‑step implementation using Trustcall’s schema‑driven extractors for both user profiles and paper collections.

AI · LLM agents · LangGraph
16 min read
Architect
Feb 13, 2026 · Artificial Intelligence

Cutting Agent Costs: Practical Tips from the ‘Toward Efficient Agents’ Survey

The article analyzes why autonomous LLM agents become expensive, breaks down their cost components, and presents concrete engineering strategies—memory management, tool‑call optimization, and planning constraints—to dramatically reduce token usage and improve reliability while maintaining performance.

Cost Optimization · LLM agents · Planning
19 min read
PaperAgent
Jan 28, 2026 · Artificial Intelligence

How Clawdbot Achieves Persistent, Local Memory for LLM Agents

Clawdbot implements a fully local, persistent memory system for LLM agents by storing context and long‑term knowledge in editable Markdown files, indexing them with SQLite‑vec and FTS5, supporting multi‑agent isolation, compression, pruning, and configurable session lifecycles to maintain efficient, cost‑effective interactions.

LLM agents · context compression · local storage
13 min read
High Availability Architecture
Jan 27, 2026 · Artificial Intelligence

How LLM Agents Are Redefining Programming: From Manual Coding to Autonomous Agents

The author reflects on a rapid shift in software development workflows driven by LLM agents, highlighting the move from manual coding to agent‑driven automation, the remaining need for IDE oversight, the strengths of tenacity and leverage, and the broader implications for engineers' future roles.

AI programming · Automation · LLM agents
7 min read
Architecture and Beyond
Jan 17, 2026 · Artificial Intelligence

Progressive Disclosure & Dynamic Context: Making LLM Agents Reliable Execution Systems

This article explains how progressive disclosure and dynamic context management address the three core bottlenecks of complex LLM agents—context explosion, tool overload, and uncontrolled execution—by structuring context, tools, and SOPs into layered, token‑efficient, and verifiable workflows.

AI Engineering · LLM agents · Progressive Disclosure
15 min read
Tencent Cloud Developer
Dec 23, 2025 · Artificial Intelligence

How ReAct (Reasoning + Acting) Empowers LLM Agents to Solve Real‑World Tasks

This article explains the ReAct paradigm—combining reasoning, action, and observation—to turn large language models into controllable agents, detailing its core concepts, architecture, workflow, code implementation, application scenarios, advantages over other methods, and future research directions.

AI automation · LLM agents · reasoning and acting
29 min read
Bighead's Algorithm Notes
Dec 9, 2025 · Artificial Intelligence

How Do LLM Trading Agents Perform in a Competitive Market Arena?

The paper introduces Agent Market Arena (AMA), a lifelong, real‑time benchmark that evaluates diverse LLM‑based trading agents across crypto and equity markets, revealing that agent architecture, rather than the underlying LLM, drives performance differences and risk‑adjusted returns.

Agent Architecture · Benchmark · Financial Trading
11 min read
PaperAgent
Dec 9, 2025 · Artificial Intelligence

Agentic AI Unveiled: Dual Paradigms, Architecture Battles, and Future Directions

This comprehensive survey dissects Agentic AI by contrasting symbolic/classical and neural/generative paradigms, mapping 90 peer‑reviewed papers (2018‑2025) through a PRISMA workflow, evaluating architectures, collaboration models, benchmarks, and ethical considerations, and highlighting the emerging need for hybrid systems and standardized evaluation.

Agentic AI · Hybrid Architecture · LLM agents
8 min read
BirdNest Tech Talk
Dec 8, 2025 · Artificial Intelligence

How the New PEV Agent Pattern Boosts Reliable LLM Automation in Go

The article introduces the Plan‑Execute‑Verify (PEV) agent pattern added to langgraphgo, explains its three‑stage workflow, core features, configuration, concrete Go examples, implementation details, comparisons with ReAct and Reflection, and discusses best practices, limitations, and trade‑offs for high‑risk automation.

Go · LLM agents · LangGraphGo
9 min read
PaperAgent
Dec 1, 2025 · Artificial Intelligence

How Deep Research Turns LLMs into Autonomous AI Scientists

This article surveys the emerging Deep Research (DR) paradigm that upgrades large language models into research agents capable of autonomous planning, multi‑source evidence gathering, memory management, and verifiable long‑form report generation, outlining its stages, core components, training pipeline, and evaluation benchmarks.

AI agents · AI research automation · Deep Research
6 min read
AI Frontier Lectures
Nov 13, 2025 · Artificial Intelligence

How Graphs Empower LLM Agents: A Deep Dive into GLA

This article reviews the IEEE Intelligent Systems survey that introduces Graph‑augmented LLM Agents (GLA), explains how representing plans, memory, tools and multi‑agent interactions as graphs improves reliability, efficiency, interpretability and flexibility, and outlines five key research directions for future development.

Agent Coordination · Knowledge Graphs · LLM agents
8 min read
Network Intelligence Research Center (NIRC)
Nov 7, 2025 · Artificial Intelligence

Introducing LangGraph: A Low‑Level Framework for Building Stateful AI Agents

This article explains why modern LLM‑based applications need agent capabilities, introduces LangGraph’s core features such as stateful execution, graph‑based orchestration, tool integration, human‑in‑the‑loop and multi‑agent support, and provides a step‑by‑step Python example that builds a simple chat‑bot agent.

Human-in-the-Loop · LLM agents · LangGraph
11 min read
Bighead's Algorithm Notes
Oct 30, 2025 · Artificial Intelligence

FinSearchComp: ByteDance’s Expert‑Level Financial Search and Reasoning Benchmark for Real‑World Scenarios

FinSearchComp is the first fully open‑source benchmark that evaluates large‑language‑model agents' search and reasoning abilities in realistic financial workflows, featuring 635 expert‑annotated questions across three task types, built with 70 finance experts, and revealing that web‑enabled models with financial plugins markedly outperform API‑only models.

AI Evaluation · Benchmark · FinSearchComp
12 min read
DataFunTalk
Oct 22, 2025 · Artificial Intelligence

Introducing VitaBench: A Real-World Benchmark for Complex LLM Agents

VitaBench is a newly released, highly realistic benchmark that evaluates large‑language‑model agents across three everyday scenarios—food ordering, restaurant dining, and travel planning—by quantifying reasoning, tool‑use, and interaction complexities, revealing a significant performance gap in current models.

AI Evaluation · Benchmark · LLM agents
13 min read
Data Thinking Notes
Oct 9, 2025 · Artificial Intelligence

Mastering Context Engineering: Boost LLM Agent Performance

Context Engineering, the evolution beyond Prompt Engineering, optimizes the selection and management of tokens within large language model windows, enabling high‑performance, autonomous AI agents through efficient system prompts, tool design, example selection, dynamic retrieval, compression, structured memory, and multi‑agent architectures.

AI Optimization · Context Engineering · LLM agents
19 min read
DataFunTalk
Sep 10, 2025 · Artificial Intelligence

How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

The article presents Ant Group’s Ray‑based Ragent framework, detailing its background, motivation behind unified AI serving, and the four core modules—Profile, Memory, Planning, and Action—that together enable large‑language‑model agents for financial applications.

AI Framework · Ant Group · Distributed Systems
4 min read
DataFunSummit
Sep 9, 2025 · Artificial Intelligence

How Ant Group’s Ragent Redefines Distributed LLM Agents with Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI agent framework, covering its background, motivation in the large‑model era, and a four‑module design (Profile, Memory, Planning, Action) that enables scalable LLM‑driven agents.

AI Framework · Ant Group · Distributed Systems
4 min read
BirdNest Tech Talk
Jun 30, 2025 · Artificial Intelligence

Build a Weather‑Query ReAct Agent with LangGraph: Step‑by‑Step Guide

This article walks through constructing a stateful ReAct‑style LLM agent using LangGraph, detailing the core components—State, Nodes, Edges—defining a weather‑lookup tool with Open‑Meteo, configuring the graph’s nodes and conditional edges, and executing the workflow with streaming to observe each step in real time.

LLM agents · LangGraph · Python
16 min read
AI Large Model Application Practice
Jun 23, 2025 · Databases

How Google’s MCP Toolbox Simplifies Enterprise Database Access for LLM Agents

This guide explains Google’s open‑source MCP Toolbox for Databases, covering its core concepts, installation, configuration, two usage modes (native SDK and MCP), example LangGraph agent integration, security features, observability, and practical code snippets for building reliable LLM‑driven database tools.

LLM agentsMCP ToolboxObservability
0 likes · 11 min read
Instant Consumer Technology Team
May 29, 2025 · Artificial Intelligence

API vs GUI Agents: How to Choose the Right LLM Automation Approach

This article examines the evolution of large language model agents, contrasting API‑based agents that use predefined function calls with GUI‑based agents that interact with visual interfaces, and explores hybrid strategies, orchestration tools, RAG techniques, and practical guidelines for selecting the optimal paradigm.

API vs GUI · Hybrid automation · LLM agents
0 likes · 34 min read
Fighter's World
Apr 12, 2025 · Artificial Intelligence

Google’s A2A Protocol: A New Era of Agent Interoperability

The article analyzes Google’s Agent‑to‑Agent (A2A) protocol: it explains how the protocol addresses the fragmentation of LLM‑driven agents, outlines its architecture, design principles, and core components, compares it with Anthropic’s MCP, and discusses strategic implications and remaining challenges for large‑scale multi‑agent ecosystems.

Agent interoperability · Agent marketplace · Enterprise AI
0 likes · 27 min read
AI2ML AI to Machine Learning
Mar 21, 2025 · Artificial Intelligence

Comparing Four Leading Open‑Source LLM Agent Frameworks: AutoGen, CrewAI, LangGraph, and Swarm

This article provides a detailed comparison of four prominent open‑source LLM agent frameworks (AutoGen, CrewAI, LangGraph, and Swarm), covering their core concepts, strengths, weaknesses, and ideal use cases, and how they differ in scalability, memory handling, tool integration, and community support.

AutoGen · CrewAI · Enterprise AI
0 likes · 14 min read
Fighter's World
Mar 8, 2025 · Artificial Intelligence

Why MCP Is Essential for Building LLM Agents – Anthropic’s Protocol Explained

The Model Context Protocol (MCP) introduced by Anthropic provides a standardized, TCP/IP‑like communication layer that unifies resources, tools, and prompts, enabling seamless integration of large language model agents with external systems, reducing fragmentation, and accelerating AI agent development.

AI Interoperability · Agent Architecture · Anthropic
0 likes · 16 min read
Infra Learning Club
Feb 7, 2025 · Artificial Intelligence

Understanding LLM Agents: Architecture, Capabilities, and Key Challenges

This article explains what LLM agents are and describes their core components (brain, memory, planning, and tool use). It illustrates how they handle complex queries through task decomposition, surveys notable frameworks, and discusses key challenges such as limited context, long‑term planning difficulties, output inconsistency, and prompt dependence.

AI Architecture · LLM agents · Memory
0 likes · 15 min read
DataFunSummit
Jun 6, 2024 · Artificial Intelligence

MetaGPT: Multi‑Agent Collaboration and Agent Capability Enhancement

This article introduces MetaGPT, an open‑source multi‑agent framework that leverages large language models to automate software development, data science, and simulation tasks, detailing its development, impact, experimental results, memory and reasoning enhancements, and comparisons with related systems.

AI research · Agent Memory · LLM agents
0 likes · 21 min read
NewBeeNLP
Apr 15, 2024 · Artificial Intelligence

Unlocking LLM‑Based Agents: Architecture, Challenges, and Future Directions

This article systematically outlines the architecture of large‑language‑model (LLM) agents, examines their key technical challenges such as role‑playing, memory design, reasoning and multi‑agent collaboration, and explores emerging research directions and practical case studies.

AI · Future Directions · LLM agents
0 likes · 11 min read
DataFunSummit
Sep 30, 2023 · Artificial Intelligence

Causal Inference from the Perspective of Large Models

This presentation by senior AI architect He Gang explores how large language models and LLM‑powered agents can enhance causal inference tasks, detailing model‑assisted analysis, agent‑based inference methods, and multi‑agent simulations to advance causal research.

AI · LLM agents · large language models
0 likes · 2 min read
Tencent Cloud Developer
Apr 17, 2023 · Artificial Intelligence

AutoGPT: An Overview of Autonomous AI Agents

AutoGPT is an open‑source autonomous AI agent that uses the GPT‑4 and GPT‑3.5 APIs to decompose user‑defined goals into sub‑tasks, execute them iteratively, store results in memory, and autonomously build complex outputs such as code, websites, research, or financial plans. Its drawbacks are high token costs and limited transparency.

AI automation · AutoGPT · Autonomous AI
0 likes · 8 min read