Building LangChain Agent Skills from Scratch to Cut Token Usage and Boost Tool Accuracy
The article presents a step-by-step design and implementation of a Claude-style Skills mechanism for LangChain agents. Using a double-layer tool architecture, state-driven dynamic filtering, and middleware interception, the agent loads only the tools relevant to the current task, which sharply reduces token consumption while improving decision quality and response speed.
When an agent is equipped with dozens of tools—data analysis, PDF handling, image recognition, code generation, etc.—registering all of them at once causes token bloat, poorer decision quality, and slower responses. Inspired by Anthropic's Claude Skills, the author proposes a progressive loading strategy for LangChain that shows the model only the tools needed for the current task.
Core Design Idea
The goal is to ensure that, for each model invocation, the agent "sees" only the most relevant tools. For example, a request to analyze sales data should expose only data‑analysis tools, while a PDF summarization request should hide those and expose only PDF‑processing tools.
Double‑Layer Tool Architecture
All tools are split into two tiers:
Loader Tools (facade tools) that are always visible and whose sole purpose is to activate a Skill.
Content Tools that implement the actual functionality and become visible only after their Skill is activated.
Example tools:
calculate_statistics # compute statistics
generate_chart # create charts
pdf_to_csv # convert PDF to CSV
pdf_to_markdown # convert PDF to Markdown

These are grouped into two Skill categories: data_analysis and pdf_processing. Loader tools such as load_data_analysis_skill and load_pdf_processing_skill activate the corresponding Skill and update a runtime state variable, skills_loaded.
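The article does not show a loader tool's code, so the following is a minimal sketch of what one might look like. It assumes a LangGraph-backed agent whose state carries a skills_loaded list (registering that custom state key, e.g. via a custom state schema, is omitted here) and uses the documented pattern of returning a Command from a tool to update agent state; whether the update appends to or replaces the list depends on the state's reducer.

from typing import Annotated
from langchain_core.messages import ToolMessage
from langchain_core.tools import InjectedToolCallId, tool
from langgraph.types import Command

@tool
def load_data_analysis_skill(
    tool_call_id: Annotated[str, InjectedToolCallId],
) -> Command:
    """Activate the data_analysis Skill so its Content Tools become visible."""
    return Command(update={
        # Record the activated Skill in agent state; the middleware reads this key
        # before the next model call.
        "skills_loaded": ["data_analysis"],
        # The tool call still needs a ToolMessage so the model sees a result.
        "messages": [ToolMessage("data_analysis skill loaded", tool_call_id=tool_call_id)],
    })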
State‑Driven Dynamic Filtering
The middleware reads skills_loaded before each model call, filters the full tool list to keep only the Loader Tools plus the Content Tools belonging to the loaded Skills, and overrides the request's tool list.
def wrap_model_call(self, request, handler):
    # Step 1: read the Skills loaded so far from the agent's runtime state
    skills_loaded = request.state.get("skills_loaded", [])
    # Step 2: collect the relevant tools (every Loader Tool + Content Tools of loaded Skills)
    relevant_tools = self.registry.get_tools_for_skills(skills_loaded)
    # Step 3: replace the tool list on the model request
    filtered_request = request.override(tools=relevant_tools)
    # Step 4: continue the middleware chain with the filtered request
    return handler(filtered_request)

Middleware Hook Choice
LangChain's agent middleware provides hooks such as before_model and wrap_model_call. Because the implementation needs to replace the tool list and then let the rest of the chain run with the modified request, wrap_model_call is the appropriate hook.
Key Implementation Details
Skill Definition Layer: each Skill inherits from BaseSkill and implements get_loader_tool() (returns a Loader Tool that updates skills_loaded) and get_tools() (returns its Content Tools).
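The article does not include BaseSkill's code; the sketch below shows one plausible shape for the interface. The two methods come from the article, while the class body and the name attribute are assumptions.

from abc import ABC, abstractmethod
from langchain_core.tools import BaseTool

class BaseSkill(ABC):
    """One Skill bundles an always-visible Loader Tool with its Content Tools."""

    name: str  # e.g. "data_analysis" or "pdf_processing"

    @abstractmethod
    def get_loader_tool(self) -> BaseTool:
        """Return the Loader Tool that activates this Skill (updates skills_loaded)."""

    @abstractmethod
    def get_tools(self) -> list[BaseTool]:
        """Return the Content Tools exposed once this Skill is loaded."""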
Skill Registration Layer: SkillRegistry stores all Skills. Its get_tools_for_skills() method always returns every Loader Tool plus the Content Tools of the requested Skills, ensuring the agent can activate new Skills later.
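A sketch of how get_tools_for_skills() could implement that rule; the constructor and internal attribute names are assumptions, only the method name and behavior come from the article.

from langchain_core.tools import BaseTool

class SkillRegistry:
    def __init__(self, skills: list[BaseSkill]):
        self._skills = {skill.name: skill for skill in skills}

    def get_tools_for_skills(self, skills_loaded: list[str]) -> list[BaseTool]:
        # Loader Tools stay visible on every call so new Skills can still be activated.
        tools: list[BaseTool] = [s.get_loader_tool() for s in self._skills.values()]
        # Content Tools are added only for Skills that have already been loaded.
        for name in skills_loaded:
            if name in self._skills:
                tools.extend(self._skills[name].get_tools())
        return tools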
Middleware Layer: a custom SkillMiddleware subclass of AgentMiddleware overrides wrap_model_call to perform the state-driven filtering described above.
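Putting the pieces together, here is a sketch of the middleware class and of wiring it into an agent. It assumes a LangChain 1.x setup where AgentMiddleware is importable from langchain.agents.middleware and create_agent accepts a middleware list; registry.all_tools() is a hypothetical helper returning the full tool set, and the model string is only illustrative.

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware

class SkillMiddleware(AgentMiddleware):
    def __init__(self, registry: SkillRegistry):
        super().__init__()
        self.registry = registry

    def wrap_model_call(self, request, handler):
        # Same logic as the earlier snippet: narrow the tool list before each model call.
        skills_loaded = request.state.get("skills_loaded", [])
        relevant_tools = self.registry.get_tools_for_skills(skills_loaded)
        return handler(request.override(tools=relevant_tools))

agent = create_agent(
    model="openai:gpt-4o",               # illustrative; any tool-calling model works
    tools=registry.all_tools(),          # hypothetical helper: every Loader + Content Tool
    middleware=[SkillMiddleware(registry)],
)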
Execution Flow Example
1. Initial state: skills_loaded = []; the registry returns only the ten Loader Tools.
2. The model selects load_data_analysis_skill, which adds "data_analysis" to skills_loaded.
3. On the next call, the middleware fetches Loader Tools plus the Content Tools of the data_analysis Skill (e.g., calculate_statistics, generate_chart), giving the model a concise list of 14 tools.
4. The model now calls calculate_statistics to fulfill the user request. This two‑step process turns a "needle‑in‑a‑haystack" selection among 50 tools into two efficient, focused selections.
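For completeness, a hypothetical invocation that would trigger this flow, assuming the custom skills_loaded state key described above:

result = agent.invoke({
    "messages": [{"role": "user", "content": "Analyze the sales data and report summary statistics."}],
    "skills_loaded": [],  # no Skill active yet; only Loader Tools are visible on the first call
})
print(result["messages"][-1].content)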
Conclusion
The presented architecture demonstrates how to replicate Claude‑style Skills in LangChain using a double‑layer tool design, a central state variable, and a wrap‑model middleware. The approach cuts token usage, improves tool‑calling accuracy, and reduces latency while keeping the agent flexible enough to load new Skills on demand.
Fun with Large Models
A master's graduate of Beijing Institute of Technology with four papers published in top journals, formerly a developer at ByteDance and Alibaba, and currently researching large models at a major state-owned enterprise. Committed to sharing concise, practical experience in AI large-model development, in the belief that large AI models will become as essential as the PC. Let's start experimenting now!