Turning the Harness into a Distributed Context Management System for Long‑Task Agents
The article explains why the reliability of long‑task agents now hinges on harness design rather than model strength, and details four harness innovations (programmatic tool calls, sub‑agents as isolation boundaries, context compression, and search‑first skill discovery) that Glean uses to build a distributed context management system.
Why Harness Matters
Long‑task agent reliability is increasingly determined by how the harness manages context, not just by model capability. LangChain’s harness upgrade raised its Terminal‑Bench score by 13.7 points, and Vercel removed 80% of its agent’s tools yet achieved higher reliability and 3.5× lower latency. Both results suggest that the gains came from harness changes rather than from a stronger model.
Driving Harness Evolution
Agents are now expected to handle longer, multi‑step tasks, which exposes the limits of the classic pattern of packing more than 20k tokens of instructions and 20‑40 tool schemas into a single prompt. Glean migrated these instructions into skills that are loaded on demand, cutting the system prompt by over 45%. The benefit is most evident in complex tasks such as completing a 200‑question RFP, which requires extensive multi‑source research and verification.
Four Harness Design Changes
Programmatic Tool Calls (PTC) in a sandbox – Workflow logic is moved into sandboxed Python code, exposing tools as callable functions. This allows dozens of tool calls to execute in a single run, reducing latency, improving reliability (stable loops, filters, branches), ensuring consistency via reusable skill scripts, and keeping only summaries or final outputs in the orchestrator’s context.
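As a rough illustration of the PTC idea, the sketch below runs several tool calls inside one piece of Python and hands back only a compact summary. The tool `search_tickets`, its fake data, and `run_workflow` are hypothetical names invented for this example; they are not Glean's API.

```python
from typing import Callable

# Hypothetical tool: stands in for a real tool the harness would expose
# to sandboxed code as a plain callable function.
def search_tickets(customer: str) -> list[dict]:
    fake_db = {
        "acme": [{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}],
        "globex": [{"id": 3, "status": "open"}],
    }
    return fake_db.get(customer, [])

def run_workflow(customers: list[str], tool: Callable[[str], list[dict]]) -> dict:
    """Run many tool calls in one execution and return only a summary."""
    open_counts = {}
    for customer in customers:      # the loop lives in code, not in model turns
        tickets = tool(customer)    # one tool call per iteration
        # Filtering happens here, so raw ticket lists never reach the orchestrator.
        open_counts[customer] = sum(1 for t in tickets if t["status"] == "open")
    return {"customers_checked": len(customers), "open_tickets": open_counts}

summary = run_workflow(["acme", "globex", "initech"], search_tickets)
print(summary)
```

In this sketch, only `summary` would be placed in the orchestrator's context; the intermediate ticket lists stay inside the script.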
Sub‑agents as isolation boundaries – The orchestrator can launch many sub‑agents in parallel, each with its own context window and token budget. This isolates work on independent objects (e.g., hundreds of customers or thousands of tickets), deepening analysis without overloading a single context and echoing the idea from Recursive Language Models that high‑level agents delegate bounded tasks to lower‑level workers.
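The isolation boundary can be sketched as follows, assuming a thread pool for parallelism and a crude word count as the token estimate. `SubAgent`, `orchestrate`, and the budget value are illustrative assumptions, and the inner loop is a placeholder for a real model loop.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    task: str
    token_budget: int
    context: list[str] = field(default_factory=list)  # private to this sub-agent

    def run(self) -> str:
        used = 0
        for step in range(100):
            note = f"{self.task}: step {step}"
            cost = len(note.split())  # crude token estimate (assumption)
            if used + cost > self.token_budget:
                break                 # stop once the budget would be exceeded
            self.context.append(note)
            used += cost
        # Only this short result is returned; self.context stays with the sub-agent.
        return f"{self.task}: {len(self.context)} steps within {self.token_budget} tokens"

def orchestrate(tasks: list[str], token_budget: int = 12) -> list[str]:
    agents = [SubAgent(task=t, token_budget=token_budget) for t in tasks]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves input order, so results line up with tasks.
        return list(pool.map(lambda agent: agent.run(), agents))

results = orchestrate(["customer-1", "customer-2", "customer-3"])
print(results)
```

Each sub-agent accumulates its own `context` list, so the orchestrator receives one bounded result per object instead of every intermediate note.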
Context compression – A two‑layer approach preserves essential state while discarding excess data. The first layer compresses dialogue into user intent, attempted actions, successes/failures, and next steps. The second layer compresses large tool outputs by storing them in the sandbox file system and summarizing them with file‑path references. This maintains semantic completeness across many rounds.
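A minimal sketch of the two layers, under stated assumptions: the first layer is shown only as a data structure (in practice a model would presumably produce the summary), and the second layer writes oversized tool output to a temporary directory standing in for the sandbox file system. `CompressedState`, `offload_tool_output`, and the 200-character threshold are hypothetical.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path

@dataclass
class CompressedState:
    """Layer 1: the dialogue reduced to the four fields named above."""
    user_intent: str
    attempted_actions: list[str]
    outcomes: dict[str, str]
    next_steps: list[str]

def offload_tool_output(output: str, workdir: Path, name: str, max_chars: int = 200) -> str:
    """Layer 2: store large output in a file and return a short reference."""
    if len(output) <= max_chars:
        return output  # small outputs stay inline
    path = workdir / f"{name}.txt"
    path.write_text(output, encoding="utf-8")
    preview = output[:80].replace("\n", " ")
    return f"[{len(output)} chars stored at {path}] preview: {preview}..."

workdir = Path(tempfile.mkdtemp())  # stands in for the sandbox file system
big_output = "row\n" * 1000
reference = offload_tool_output(big_output, workdir, "crm_export")

state = CompressedState(
    user_intent="Answer RFP question 17",
    attempted_actions=["search_docs", "export_crm"],
    outcomes={"search_docs": "success", "export_crm": "success"},
    next_steps=["verify the answer against the stored export"],
)
compact_context = json.dumps(asdict(state)) + "\n" + reference
print(len(big_output), len(compact_context))
```

The full output remains retrievable by path, while the context carries only the structured state and a reference.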
Search‑first skill discovery – Skills are indexed and discovered in three steps: (1) the model queries the index when it recognizes a needed capability; (2) a short list with names, descriptions, and execution hints is presented; (3) the full schema or skill.md file is loaded only for the selected skill. This progressive disclosure limits context cost and improves selection quality.
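The three steps can be sketched with a toy in-memory index and keyword overlap as the ranking signal. The `Skill` fields, the sample skills, and the scoring rule are assumptions for illustration; the source does not describe how Glean's index ranks results.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str
    hint: str
    full_doc: str  # stands in for the full schema or skill.md contents

# Hypothetical skills, invented for this example.
INDEX = [
    Skill("rfp-answering", "Draft and verify answers to RFP questions",
          "run per question", "# rfp-answering\nFull instructions would go here."),
    Skill("ticket-triage", "Classify and route support tickets",
          "batch by queue", "# ticket-triage\nFull instructions would go here."),
    Skill("account-summary", "Summarize customer account activity",
          "one account per call", "# account-summary\nFull instructions would go here."),
]

def search_skills(query: str, limit: int = 3) -> list[dict]:
    """Steps 1 and 2: query the index and return a short list without full docs."""
    terms = set(query.lower().split())
    scored = []
    for skill in INDEX:
        text = f"{skill.name} {skill.description}".lower().replace("-", " ")
        score = len(terms & set(text.split()))
        if score:
            scored.append((score, skill))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"name": s.name, "description": s.description, "hint": s.hint}
            for _, s in scored[:limit]]

def load_skill(name: str) -> str:
    """Step 3: load the full document only for the selected skill."""
    for skill in INDEX:
        if skill.name == name:
            return skill.full_doc
    raise KeyError(name)

shortlist = search_skills("verify rfp answers")
selected_doc = load_skill(shortlist[0]["name"])
print(shortlist)
```

The short list carries only names, descriptions, and hints, so the cost of the full document is paid once, for the skill that is actually chosen.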
Resulting Distributed Context Management
By combining sandboxed PTC, isolated sub‑agents, two‑layer compression, and indexed skill discovery, Glean’s harness functions as a distributed context management system. It scales to larger, more reliable enterprise AI tasks, and each new generation of workplace AI tasks will continue to expose the harness’s limits and drive its further evolution.
References
LangChain blog on harness engineering: https://www.langchain.com/blog/improving-deep-agents-with-harness-engineering
Vercel blog on removing 80% of agent tools: https://vercel.com/blog/we-removed-80-percent-of-our-agents-tools
Recursive Language Models paper: https://arxiv.org/abs/2512.24601v1
Skill schema reference: http://skill.md/
AI Tech Publishing
