Tagged articles

agent engineering

36 articles · Page 1 of 1

Jul 17, 2026 · Artificial Intelligence

When Does Trajectory Review Boost Agent Success? Five Key Factors Explained

Recent research shows that letting agents review their own execution trajectories can improve task success rates, but only under clear conditions such as reliable external feedback, concrete planning outputs, appropriate timing, sufficient model capability, and manageable cost‑benefit trade‑offs.

LLM agents · agent engineering · dynamic replanning

0 likes · 31 min read


Data Party THU

Jul 13, 2026 · Artificial Intelligence

From QA to Task Completion: Survey of LLM Agent Systems and Harness Design

This survey argues that modern LLM agents should be viewed as a coupled system of a foundational model and an execution harness, analyzes the evolution from prompt engineering to harness engineering, defines six core harness responsibilities, examines task pressures, proposes richer evaluation metrics, and outlines future research directions.

Benchmarking · Execution Harness · LLM agents

0 likes · 16 min read


AI Large Model Application Practice

Jul 13, 2026 · Artificial Intelligence

20 Essential Agent Engineering Concepts for 2026: Making Agents Practical, Scalable, and Deployable

The article breaks down ten core engineering pillars for production‑grade AI agents—including tool calling with MCP, reusable skills, persistent memory, multi‑agent collaboration, workflow orchestration, hooks, observability, sandboxing, prompt‑injection defense, and the role of forward‑deployed engineers—to help turn demo agents into reliable, enterprise‑ready systems.

Memory System · Prompt Injection Defense · agent engineering

0 likes · 21 min read


DataFunSummit

Jul 7, 2026 · Artificial Intelligence

From Risk Control to Semantics: How Agents Self‑Evolve Without Degrading

In a July 2 live broadcast, three experts dissected the engineering of AI agents—covering architecture choices, the shift from heavyweight frameworks to modular skills, multi‑agent collaboration, evaluation beyond correctness, cost‑control strategies, and the crucial human‑in‑the‑loop responsibility—offering a pragmatic roadmap for stable, accountable agent deployment.

AI agents · Benchmarking · Cost Optimization

0 likes · 17 min read


DataFunSummit

Jul 6, 2026 · Artificial Intelligence

How Agents Evolve Without Degrading: From Risk Control to Semantic Engineering

A live discussion with experts from finance and data engineering explores how to build collaborative, cost‑effective, and responsibly governed AI agents, covering architecture choices, evaluation metrics, scaling challenges, and the balance between human oversight and autonomous decision‑making.

AI Governance · Cost Optimization · Scalable AI

0 likes · 19 min read


Architecture and Beyond

Jul 5, 2026 · Artificial Intelligence

Why Session‑First and Agent‑First Are Different, Not Better or Worse

The article analyses the Session‑First and Agent‑First paradigms for AI‑enabled software, outlining how Session‑First hides complexity in long chat sessions for rapid product launch, while Agent‑First restructures systems around autonomous agents with explicit APIs, documentation, security and observability, and advises when each approach is appropriate.

AI system design · Agent First · Session First

0 likes · 24 min read


DaTaobao Tech

Jul 1, 2026 · Artificial Intelligence

Designing AI Agent Skills: Behavior Programming, Token Economics, and Enforced Constraints

The article explains how to design AI Agent Skill systems as behavior‑programmed ability packages, covering structured YAML/Markdown definitions, token‑budget strategies, discovery mechanisms, constraint gates, TDD‑style testing, and iterative validation to achieve high compliance, low cost, and maintainable agent behavior.

AI Agent · Behavior Programming · Constraint Mechanisms

0 likes · 18 min read


Architect

Jun 19, 2026 · Artificial Intelligence

From Harness to Environment: The Next Engineering Layer for LLM Agents

The article argues that while Harness engineering still controls how agents run, the emerging focus on Environment engineering determines whether agents receive reliable, verifiable feedback, shaping their long‑term learning and safety in real‑world tasks.

AI systems · Environment Engineering · Harness Engineering

0 likes · 21 min read


ThinkingAgent

Jun 13, 2026 · Artificial Intelligence

Prompt Engineering Is Dead—Why Loop Engineering Is the New AI Work Unit

The article introduces Loop Engineering as the next paradigm in AI development, explaining how autonomous, self‑sustaining loops replace manual prompting, compares it with Prompt, Agent, and Harness engineering, outlines core loop structures, modes, goal design, and provides practical code‑first guidelines.

AI automation · Goal Design · Harness Engineering

0 likes · 20 min read


Design Hub

Jun 10, 2026 · Artificial Intelligence

Why Prompting Isn’t Enough: Designing Loops with Claude Fable 5

Lance Martin explains that the next stage of agent engineering shifts focus from clever prompts to designing self‑correction loops and cross‑session memory, using Claude Fable 5’s parameter‑golf experiment and continual‑learning benchmarks to show how robust loops turn powerful models into trustworthy work systems.

AI · Claude Fable 5 · Memory

0 likes · 17 min read


DataFunSummit

Jun 7, 2026 · Artificial Intelligence

Harness Engineering: Safety, Human‑Agent Collaboration, and Multi‑Agent Design

In a 90‑minute technical livestream, three experts dissect ten core challenges of bringing AI agents from demo to production, covering execution control, sandbox versus permission boundaries, checkpoint design, rollback strategies, tool‑call safety, human‑in‑the‑loop interaction, multi‑agent coordination, observability, and memory management.

Checkpoint · Rollback · Safety Boundaries

0 likes · 17 min read


DataFunSummit

Jun 5, 2026 · Artificial Intelligence

Harness Engineering: Making Multi‑Agent Systems Safe and Trustworthy from Demo to Production

In a 90‑minute live technical session, three experts dissect ten core challenges of Agent engineering—sandbox vs permission boundaries, checkpoints, rollback, tool‑call safety, human‑in‑the‑loop, multi‑agent coordination, observability, and memory—showing that moving agents from "usable" to "trustworthy" requires fine‑grained execution controls rather than broader permissions.

Checkpoint · Rollback · Sandbox

0 likes · 18 min read


Top Architect

Jun 5, 2026 · Artificial Intelligence

Why Generic AI Agents Fail in Real Estate and How a Home‑grown Agent Solved It

The article explains that generic large‑language‑model agents such as Claude CoWork stumble on real‑estate tasks because of extremely long decision chains, non‑standard data formats, heavy reliance on personal expertise, and zero tolerance for errors, and shows how DeepLinkRE‑LLM built a vertical‑focused agent with proprietary data, a knowledge graph, expert‑validated skills, and end‑to‑end execution to deliver accurate, traceable reports and reshape enterprise organization.

AI agents · Enterprise AI · Knowledge Graph

0 likes · 15 min read


DataFunTalk

Jun 4, 2026 · Artificial Intelligence

Harness Engineering: Execution Control, Safety Boundaries, Multi‑Agent Design

The live discussion explores how to move agents from demo to production by establishing execution controls, safety boundaries, checkpoints, rollback mechanisms, tool‑call auditing, human‑in‑the‑loop handling, multi‑agent coordination, observability, and memory management, forming a comprehensive harness engineering framework.

Checkpoint · Permission Boundary · Rollback

0 likes · 15 min read


Architect

May 30, 2026 · Artificial Intelligence

Claude Code Self‑Repair Explained: Writing Error Feedback into the Harness

The article shows how to turn Claude Code’s occasional mistakes into a reliable feedback loop by using a CLAUDE.md entry file, Hooks, Permissions and Skills, so errors become visible, verifiable and can be written back into the harness for future runs.

AI agents · CLAUDE.md · Claude Code

0 likes · 22 min read


DataFunTalk

May 29, 2026 · Artificial Intelligence

From Prompt to Context to Harness: Unpacking the Three Paradigm Shifts in Agent Engineering

The survey "Agent Harness Engineering: A Survey" reveals how agent systems have evolved from prompt engineering to context engineering and now to harness engineering, introduces the seven‑layer ETCLOVG framework, shows benchmark gains from better harnesses, and argues that observability, governance, and trace‑native evaluation are essential for production‑grade AI agents.

AI agents · Evaluation · Governance

0 likes · 14 min read


Tech Minimalism

May 7, 2026 · Artificial Intelligence

12 Reusable MCP Design Patterns for Production‑Grade Anthropic Agents

The article distills Anthropic’s production‑agent guidance into five groups of twelve concrete MCP patterns—covering tool surface design, interaction semantics, authentication, context economy, and packaging—explaining why each pattern matters, its trade‑offs, and how it helps build safe, stable, low‑cost agent integrations.

AI · Anthropic · MCP

0 likes · 22 min read


Top Architecture Tech Stack

Apr 27, 2026 · Artificial Intelligence

DeepSeek V4 Pro vs GPT‑5.3 Codex High: Direct Code‑Generation Test Reveals the Gap

A two‑stage evaluation compares DeepSeek V4 Pro and GPT‑5.3 Codex High on a TypeScript LRU‑Cache task and a markdown‑inspection CLI project, showing DeepSeek leads on basic code correctness while GPT‑5.3 delivers a more complete engineering solution, with detailed scores and analysis.

DeepSeek V4 Pro · GPT-5.3 Codex High · LLM code evaluation

0 likes · 13 min read


AI Architecture Hub

Apr 24, 2026 · Artificial Intelligence

How Claude Code Achieves a 92% Prompt Caching Hit Rate with Three Unbreakable Engineering Rules

Claude Code’s prompt‑caching delivers a 92% hit rate, slashing a 50‑round agent session cost from $6 to $1.15 by separating stable prefixes from dynamic tails, using a three‑layer cache architecture, exact token‑sequence matching, and three strict engineering rules that keep the cache hot and reliable.

Cache Hit Rate · Claude Code · LLM

0 likes · 13 min read


Architecture Musings

Apr 19, 2026 · Artificial Intelligence

My AI Adoption Journey: Lessons from the Terraform and Ghostty Creator

The author, Mitchell Hashimoto—co‑founder of HashiCorp and creator of Terraform and Ghostty—shares a step‑by‑step, candid account of adopting AI agents, detailing six phases from abandoning chatbots to continuously running agents, the concept of “harness engineering,” and practical insights on when and how to integrate AI into a developer workflow.

AI adoption · Ghostty · Harness Engineering

0 likes · 16 min read


Wu Shixiong's Large Model Academy

Apr 13, 2026 · Artificial Intelligence

Turning ReAct from Demo to Production: Handling Failures, Loops, and Token Budgets

This article explains how to upgrade a ReAct agent from a proof‑of‑concept to a production‑ready system by classifying tool failures, detecting repeated search loops, managing token budgets, and adding structured logging, complete with Python implementations and practical interview guidance.

LLM · Loop Detection · Tool Failure Handling

0 likes · 24 min read


MeowKitty Programming

Apr 11, 2026 · Industry Insights

Java’s New Frontier: Master AI Agents, Not Just Code, as Oracle, Spring, JetBrains Bet

The article explains how Oracle, Spring, and JetBrains are collectively pushing Java toward an agent‑centric ecosystem, shifting the developer’s role from writing code to orchestrating AI agents, and outlines the specific capabilities, engineering practices, and risks Java engineers must adopt to stay competitive in the coming years.

AI agents · Java · JetBrains

0 likes · 9 min read


Architect

Apr 9, 2026 · Industry Insights

Why Claude Managed Agents Are Redefining AI Workflows: A Deep Dive

Anthropic's Claude Managed Agents shift the focus from building demo loops to providing a fully hosted runtime base that handles sandboxing, state persistence, error recovery, and tool execution, enabling developers to concentrate on business logic and long‑running tasks while navigating new cost and compliance considerations.

AI Agent infrastructure · Claude Managed Agents · Enterprise AI

0 likes · 23 min read


AI Tech Publishing

Apr 4, 2026 · Artificial Intelligence

Become a World-Class Agent Engineer: Master Context, Rules, and Termination Conditions

This guide explains how to become a world‑class Agent engineer by managing context bloat, defining clear rules and skills, separating research from implementation, using neutral prompts, and writing explicit termination contracts, while emphasizing that the final results remain the developer’s responsibility.

Claude · Codex CLI · Context Bloat

0 likes · 17 min read


Smart Era Software Development

Apr 3, 2026 · Artificial Intelligence

Claude Code Deep Dive: Engineering an AI Programming Assistant and Agent Design Best Practices

This article provides a comprehensive technical analysis of Claude Code, explaining how it transforms AI programming assistants from simple code‑completion tools into autonomous agents that can read/write files, execute commands, manage context, and coordinate multiple agents, while detailing its eight core design principles, layered architecture, tool system, context engineering, state management, security model, extensibility mechanisms, and performance optimizations.

AI Agent · Claude Code · Tool System

0 likes · 26 min read


Radish, Keep Going!

Mar 31, 2026 · Artificial Intelligence

Why Agent‑First Systems Fail and How Harness Engineering Fixes Them

The article analyzes OpenAI’s Harness Engineering approach, explains four systemic failure modes of LLM‑driven agents, and details five modular components—readable environment, task state machine, verification loop, architectural constraints, and loop detection—that together enable reliable, large‑scale agent development.

AI · Harness · LLM

0 likes · 17 min read


Yunqi AI+

Mar 27, 2026 · Artificial Intelligence

From AI Assistants to Production Agents: How Harness Becomes Core Infrastructure

The article explains how AI‑driven software is shifting from simple functional tools to result‑oriented autonomous systems, and argues that building production‑grade agents requires a dedicated engineering layer—called Harness—that provides task orchestration, state management, tool integration, observability, security, and governance.

AI agents · Harness · Task orchestration

0 likes · 21 min read


Alibaba Cloud Native

Mar 26, 2026 · Artificial Intelligence

Why Harness Engineering Is the Next Frontier for AI Agents

The article examines the emerging paradigm of Harness Engineering, tracing its roots from the industrial and information revolutions to AI, and presents four real‑world case studies that demonstrate how prompt, context, and feedback engineering can dramatically improve large‑language‑model agents while highlighting open‑source tools for building scalable, collaborative AI systems.

AI · Harness Engineering · agent engineering

0 likes · 17 min read


Design Hub

Mar 26, 2026 · Artificial Intelligence

How Anthropic Advances Agent Development: From Code Writing to 4‑6 Hour Autonomy

Anthropic’s recent engineering paper shows that the next breakthrough in AI agents is not whether they can write code, but how to organize them into a planner‑generator‑evaluator harness that can work continuously for four to six hours, handle self‑evaluation, context anxiety, and deliver usable applications.

AI autonomy · Full‑Stack AI · agent engineering

0 likes · 16 min read


AI Waka

Mar 25, 2026 · Artificial Intelligence

Why Persistent Specs Matter: Building Reliable AI Agents with an Artifact Layer

The article explains how an artifact layer—comprising specs, guidance files, skills, tests, and logs—preserves intent across AI agent sessions, enabling reliable, secure, and maintainable agent‑driven software development through spec‑first practices, bounded loops, and robust verification stacks.

AI agents · Spec Driven Development · agent engineering

0 likes · 16 min read


Machine Learning Algorithms & Natural Language Processing

Feb 13, 2026 · Industry Insights

Meta and OpenAI Court OpenClaw: Zuckerberg Tests It, Altman Offers Compute Power

OpenClaw, the open‑source AI agent framework created by Peter Steinberger, has attracted acquisition overtures from Meta and OpenAI, amassed 189k GitHub stars in under a month, and sparked discussions about its rapid prototype development, agent‑driven engineering, and the future of app‑less AI services.

AI agents · Future of apps · GitHub stars

0 likes · 10 min read


Architect

Feb 10, 2026 · Artificial Intelligence

Why Pi’s Minimalist Architecture Powers OpenClaw’s AI Coding Agent

The article explains how the ultra‑minimal Pi engine—built around just four tools, a tree‑shaped session model, and an extensible plug‑in system—provides a clean, auditable, and secure foundation for OpenClaw’s AI‑driven code‑writing capabilities, while highlighting practical extensions, engineering constraints, and trade‑offs.

AI coding agent · Extensible architecture · OpenClaw

0 likes · 16 min read


大转转FE

Jan 26, 2026 · Artificial Intelligence

Exploring AI Agent Development: Tools, Case Studies, and the Future of Engineering

This newsletter curates five in‑depth articles on AI agents, covering a week‑long Vibe Coding desktop assistant project, a deep dive into Claude Agent SDK tools, Huolala’s Agent Skills implementation, the shift to “Agent Engineer” roles, and the evolving opportunities for engineers in the AI era.

AI tools · Claude SDK · Software Architecture

0 likes · 7 min read


Wu Shixiong's Large Model Academy

Nov 14, 2025 · Artificial Intelligence

How to Engineer Reliable Function Calls for LLM Agents: An End‑to‑End Framework

This article explains why function‑call accuracy is critical for LLM agents, identifies four common failure causes, and presents a systematic, five‑step engineering framework—including dynamic routing, chain‑of‑thought planning, result validation, memory injection, and log‑driven optimization—backed by concrete examples and quantitative improvements.

Function Calling · LLM · RAG

0 likes · 10 min read


Architecture and Beyond

Nov 2, 2025 · Artificial Intelligence

Why AI Agents Still Fall Short: Key Challenges and Real-World Solutions

The article examines why current AI agents fall short of expectations, highlighting weak business understanding, limited execution, controllability issues, high customization costs, and the gap between model capabilities and engineering. It also discusses the advantages SaaS firms hold, the case for focusing on vertical scenarios, security concerns, and future development trends.

AI agents · AI safety · Enterprise AI

0 likes · 11 min read


Alibaba Cloud Native

Jun 12, 2025 · Artificial Intelligence

Why AI Agent Engineering Matters: From Product Design to Technical Architecture

This article breaks down AI agent engineering into product and technical engineering, explains how demand modeling, UI/UX design, prompt engineering, multi‑agent coordination, and observability combine to make AI agents usable, scalable, and trustworthy, and shows concrete examples and implementation patterns.

AI · Product Design · agent engineering

0 likes · 23 min read
