Progressive Disclosure: Making Multi‑Skill LLM Agents Efficient and Scalable
This article examines a core challenge of LLM‑driven agent systems: giving an agent many capabilities while keeping its context small. It compares three common context‑loading strategies, introduces a progressive‑disclosure skill mechanism with three loading layers, and details the mechanism's implementation in AgentScope‑Java, along with its benefits, limitations, and suitable use cases.
Background and challenge
In large‑language‑model (LLM) driven agent systems, developers want agents to possess many capabilities, yet at any moment only a small subset is needed, and the context window is limited.
Using a customer‑service example, the article shows that users typically discuss only one or two domains (order query, refund, product recommendation, technical support) while the agent must be ready for all, creating a tension between breadth of knowledge and context efficiency.
Three common context‑loading schemes
Full loading – preload all domain knowledge into the system prompt. Advantages: simple, no extra mechanisms. Limitations: consumes >15k tokens, wasteful, poor scalability.
Multi‑Agent architecture – each agent loads knowledge for a specific domain. Advantages: isolates contexts. Limitations: each agent still loads its full domain knowledge; the overall system remains effectively full‑load.
RAG (Retrieval‑Augmented Generation) – dynamically retrieve knowledge via vector search. Advantages: flexible, on‑demand. Limitations: retrieval distortion, fragmented context, an accuracy ceiling of roughly 70‑80%.
The root problem of all three approaches is the lack of a flexible context‑loading mechanism, which manifests in three dimensions:
Space dimension: cannot distinguish required from potential knowledge, leading to over‑loading.
Time dimension: only "load all" or "retrieve on demand" options exist.
Structure dimension: knowledge is either fully fragmented or fully loaded, with no middle ground.
An analogy likens the situation to an all‑round e‑commerce customer‑service representative who must memorize dozens of manuals at once, versus splitting into specialized reps or consulting a knowledge base.
Skill mechanism: progressive disclosure
Core idea: let the agent first know what skills exist, then learn how to use them only when needed, instead of stuffing everything into the context at start‑up.
A Skill is an independent, reusable knowledge and capability unit composed of three parts:
Structured instruction – a Markdown SOP that defines trigger conditions, step‑by‑step actions, and required tools/resources.
Resource files – reference documents, API specs, examples, templates.
Executable scripts – deterministic code for data processing, validation, or external system integration.
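Conceptually, the three parts above can be modeled as a single value object. The sketch below is illustrative only; the class and field names are hypothetical and are not the AgentScope‑Java API:

```java
import java.nio.file.Path;
import java.util.List;

// Hypothetical model of one skill: the three parts described above
// bundled into a single reusable unit.
public record SkillDefinition(
        String name,                // skill identifier from SKILL.md front matter
        String description,         // trigger hint shown to the model at startup
        String instructionMarkdown, // the structured SOP, loaded when the skill fires
        List<Path> resourceFiles,   // references/, assets/
        List<Path> scripts          // deterministic executables under scripts/
) {}
```

Keeping the description separate from the instruction body is what makes the layered loading below possible: only the cheap fields travel with the startup prompt.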
The directory layout is:
skill-name/
├── SKILL.md # required metadata + instructions
├── references/ # optional detailed docs
│ └── api-doc.md
├── scripts/ # optional executable scripts
│ └── process.py
└── assets/ # optional templates/resources
└── template.html

The minimal SKILL.md skeleton:
---
name: skill-name
description: when to use this skill
---
# Skill instruction content
...

Three‑layer progressive disclosure
Layer 1 – Metadata load (startup): only the lightweight metadata of every registered skill is injected into the system prompt (≈100 tokens per skill). Example for ten skills: ~1k tokens.
Layer 2 – Instruction load (trigger time): when the agent detects a user query that matches a skill (e.g., order status), it loads the full SOP for that skill (≈2k tokens) and executes the steps.
Layer 3 – Resource load (on‑demand): if the SOP references an error‑code table or a script, the agent loads the specific resource (≈1k tokens) or runs the script without consuming additional context.
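The three layers can be sketched as plain file-system reads over the directory layout shown earlier. This is a simplified illustration of the idea, not the AgentScope‑Java implementation; the class and method names are assumptions:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch of the three loading layers over a skills directory.
public class ProgressiveSkillLoader {
    private final Path skillRoot;

    public ProgressiveSkillLoader(Path skillRoot) { this.skillRoot = skillRoot; }

    // Layer 1 (startup): read only the front-matter description of each skill,
    // a few lines per skill rather than the whole SOP.
    public Map<String, String> loadMetadata() throws IOException {
        Map<String, String> metadata = new LinkedHashMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(skillRoot)) {
            for (Path dir : dirs) {
                Path skillMd = dir.resolve("SKILL.md");
                if (!Files.exists(skillMd)) continue;
                String description = Files.readAllLines(skillMd).stream()
                        .filter(line -> line.startsWith("description:"))
                        .map(line -> line.substring("description:".length()).trim())
                        .findFirst().orElse("");
                metadata.put(dir.getFileName().toString(), description);
            }
        }
        return metadata;
    }

    // Layer 2 (trigger time): the full SOP body for the one matched skill.
    public String loadInstruction(String skillName) throws IOException {
        return Files.readString(skillRoot.resolve(skillName).resolve("SKILL.md"));
    }

    // Layer 3 (on demand): one referenced resource, e.g. "references/api-doc.md".
    public String loadResource(String skillName, String relativePath) throws IOException {
        return Files.readString(skillRoot.resolve(skillName).resolve(relativePath));
    }
}
```

The key property is that each layer's cost is paid only when that layer is reached: metadata once at startup, one instruction per triggered skill, and individual resources only when the SOP actually references them.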
Token cost example for order processing:
# Context usage
metadata: ~100 tokens/skill × 10 = 1k
instruction: ~2k tokens (order_processing)
resource: ~1k tokens (error codes)
TOTAL ≈ 4k tokens

By these figures the progressive approach consumes ~4k tokens versus ~20k for a full load, a context saving of about 80%, while preserving the SOP's logical continuity.
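The budget above is simple enough to check mechanically. A tiny sketch, using the article's approximate per-layer figures (the class and method names are my own):

```java
// Worked check of the order-processing token budget from the example above.
public class TokenBudget {
    // Total tokens under progressive disclosure: all metadata, plus one
    // activated instruction and one loaded resource.
    static int progressiveTokens(int skills, int metadataPerSkill,
                                 int instruction, int resource) {
        return skills * metadataPerSkill + instruction + resource;
    }

    // Percentage of context saved relative to preloading everything.
    static long percentSaved(int progressive, int fullLoad) {
        return Math.round(100.0 * (fullLoad - progressive) / fullLoad);
    }

    public static void main(String[] args) {
        int progressive = progressiveTokens(10, 100, 2_000, 1_000); // 1k + 2k + 1k
        System.out.println(progressive + " tokens, "
                + percentSaved(progressive, 20_000) + "% saved vs full load");
    }
}
```

With 10 skills at ~100 tokens of metadata each, one ~2k instruction and one ~1k resource, the total is 4k tokens; against a ~20k full load that works out to roughly an 80% saving.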
Implementation in AgentScope‑Java
AgentScope‑Java provides a SkillBox to register skills, a SkillRepository abstraction to load skills from the file system (or future sources), and integrates skills with the ReAct agent via system‑prompt injection.
// Load skill from file system
AgentSkillRepository fileRepo = new FileSystemSkillRepository(Path.of("./skills"));
AgentSkill skill = fileRepo.getSkill("data_analysis");
skillBox.registerSkill(skill);
// Build agent with skill box
ReActAgent agent = ReActAgent.builder()
.name("Assistant")
.model(model)
.skillBox(skillBox)
.toolkit(toolkit)
build();

Skills can also bind Tool objects; a bound tool becomes visible to the model only after its skill is activated, achieving second‑ and third‑level disclosure.
// Register skill with bound tool
skillBox.registration()
.skill(dataSkill)
.tool(new DataAnalysisTool())
apply();

For deterministic operations, SkillBox.codeExecution() configures a working directory (often a Docker‑mounted volume) and a whitelist of executable extensions, enabling safe sandboxed execution.
// Enable code execution with Docker sandbox
skillBox.codeExecution()
.workDir("/path/to/workdir")
.includeFolders("scripts/", "assets/")
.includeExtensions(".py", ".js", ".sh")
.withShell(customShell)
enable();

Benefits
Keeps the startup context small: any number of skills can be registered without inflating it.
Gives the model autonomy to decide which skill to load.
Reduces maintenance cost: updating a skill file updates behavior without retraining.
Suitable scenarios
Multi‑domain knowledge‑intensive applications (customer service, code assistants, medical advice).
Frequently iterated SOPs.
Tasks requiring deterministic, script‑driven steps.
Limitations
Only isolates the startup context; runtime context still contains all loaded skills, which may cause interference.
Skills have equal priority; no built‑in weighting for more important skills.
Trigger conditions rely on the LLM's ability to recognize them, which varies across models.
Additional tool calls add ~100‑200 ms latency, making it unsuitable for ultra‑low‑latency use cases.
For simple single‑domain tasks or deep reasoning (e.g., mathematical proofs), a full prompt or long‑context approach may be more appropriate.
Future work
The authors plan to improve full lifecycle management, sharing, and distribution of skills to further lower creation and reuse costs.
References
Agent Skills guide: https://modelscope.github.io/agentscope-java/zh/task/agent-skill.html
Integrate Skills: https://agentskills.io/integrate-skills
Equipping Agents for the Real World with Agent Skills: https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.