Tagged articles

long-running tasks

13 articles

Jul 12, 2026 · Artificial Intelligence

10 Real GPT-5.6 Cases: From Voxel Manhattan to Google Earth Clone

The article presents ten publicly sourced GPT-5.6 demonstrations that reveal four emerging capabilities—long‑running autonomous workflows, complex tool orchestration, code‑to‑experience pipelines, and cheaper frontier effects—while analyzing token costs, comparative strengths, and the model’s shift from answering to completing work.

AI agents · GPT-5.6 · Tool Orchestration

0 likes · 20 min read

Programmer DD

Jun 23, 2026 · Artificial Intelligence

Beyond Code Generation: AI Agents Add Security Fixes, Cross‑Language Collaboration, and Long‑Running Task Support

Recent announcements from OpenAI, GitHub, Google, and Cloudflare show AI agents moving from simple code generation to enterprise-ready tools that incorporate closed-loop security fixes, protocol-defined cross-language cooperation, persistent context for long-running work, and transparent cost and debugging information.

AI agents · Cloud Computing · Security Automation

0 likes · 14 min read

phodal

May 31, 2026 · Artificial Intelligence

Long-Run Verification: Converging AI Agents from Continuous Execution to Engineering

The article analyses experiments with Claude Code dynamic workflows and a 50‑hour timetravel‑agent prototype, exposing how long‑running AI coding tasks drift without proper verification gates and proposing a four‑step gate framework to ensure convergence, evidence collection, and reliable engineering outcomes.

AI agents · agent orchestration · dynamic workflows

0 likes · 10 min read

Architect

May 15, 2026 · Artificial Intelligence

Why Codex, Claude Code, and Hermes All Adopt /goal: Turning Prompt Goals into Runtime Agent Interfaces

From late April to mid‑May, OpenAI Codex, Claude Code, and Hermes each introduced an explicit /goal capability that transforms a one‑sentence prompt into a managed runtime object, enabling long‑running agents to maintain state, validation, budget, and pause/resume control within the Agent Harness.

AI agents · Agent Harness · Claude Code

0 likes · 21 min read

Architect

May 14, 2026 · Artificial Intelligence

Why Codex /goal Goes Beyond Simple Looping for Long‑Running Agents

The article dissects Codex’s /goal feature, showing how it adds persistent goal objects, a runtime lifecycle, completion auditing and budget handling, turning long‑running agents from a simple repeat‑loop into a robust, state‑driven engineering workflow.

Agentic Engineering · Budget control · Codex

0 likes · 20 min read

phodal

May 10, 2026 · Artificial Intelligence

From /goal to Long‑Running Asynchronous Agents: Making AI Sustainably Deliver Complex Tasks

By experimenting with OpenAI’s /goal feature, the author shows how to turn ad‑hoc AI prompts into a structured, long‑running loop that records progress in Git, README and test artifacts, enabling agents to handle complex engineering tasks across multiple sessions with clear checkpoints and human‑in‑the‑loop control.

AI agents · Prompt engineering · Ralph Loop

0 likes · 12 min read

AI Waka

Apr 28, 2026 · Artificial Intelligence

Why Single-Agent AI Fails: Anthropic’s Multi-Agent Harness for Long-Running Tasks

The article explains that single-agent AI collapses on long-running tasks because error probabilities compound across steps, outlines four structural failure modes, and presents Anthropic’s three-agent, GAN-style harness (Planner, Generator, Evaluator), detailing sprint contracts, primitives, token economics, and three real-world case studies that the article says demonstrate dramatically higher reliability and productivity.

AI Harness · Agentic Ops · Anthropic

0 likes · 26 min read

AI Tech Publishing

Apr 15, 2026 · Artificial Intelligence

8 Critical Harness Design Issues That Threaten Long‑Running Agent Accuracy

The article systematically breaks down why autonomous agents lose control during long‑running engineering tasks—missing context, short‑sighted planning, context anxiety, and plan drift—and shows how a well‑designed harness layer can preempt these problems without changing the underlying model.

AI engineering · Autonomous Agents · Context Management

0 likes · 11 min read

AsiaInfo Technology: New Tech Exploration

Apr 1, 2026 · Industry Insights

How Harness Engineering Is Redefining Industrial AI Agents

This article analyzes the emergence of Harness Engineering as the third‑generation AI engineering paradigm, explains its three‑layer Industrial Harness architecture, identifies three failure modes of long‑running industrial agents, and validates the approach with quantitative case studies and a roadmap for Physical AI OS deployment.

AI engineering · Industrial Agents · harness engineering

0 likes · 28 min read

Black & White Path

Mar 29, 2026 · Industry Insights

GitHub’s Agent Legion Tops the 2026 Productivity Leaderboard

The 2026 GitHub Agent leaderboard showcases five standout multi‑agent frameworks—last30days‑skill, oh‑my‑claudecode, dexter, RuView, and deer‑flow—highlighting trends toward long‑running tasks, coordinated AI teams, and cross‑modal sensing beyond cameras.

AI agents · GitHub Projects · Multi-agent systems

0 likes · 7 min read

Architect

Mar 26, 2026 · Artificial Intelligence

How Anthropic’s Harness Keeps Long‑Running AI Agents on Track

The article analyzes Anthropic’s Harness design for long‑running applications, detailing how it mitigates context anxiety and self‑evaluation bias through sprint contracts, rubric scoring, and a planner‑generator‑evaluator architecture, and evaluates its effectiveness across multiple versions.

AI agents · Context Management · architectural design

0 likes · 13 min read

Architect

Feb 27, 2026 · Artificial Intelligence

Turning AI Agents into Deliverable Workflows: Skills, Shell, and Compaction Explained

The article explains why writing code alone does not guarantee delivery, outlines three core challenges for long‑running agents—process reuse, execution, and context continuity—and presents a practical framework of Skills, Shell, and Compaction together with ten actionable recommendations, security guidelines, and implementation steps for teams.

AI agents · Compaction · Shell

0 likes · 18 min read

AI Insight Log

Jan 18, 2026 · Artificial Intelligence

8 Actionable Practices from Cursor’s Week‑Long, Million‑Line Coding Experiment

Cursor ran a team of AI coding agents for a week to build a prototype browser, uncovering three major failure modes—drift, collaboration breakdown, and lack of quality signals—and proposing a planner/worker split plus eight concrete tactics that ordinary developers can adopt for long‑running autonomous coding tasks.

AI agents · Cursor · automation

0 likes · 10 min read
