Microsoft’s Open‑Source SkillOpt Supercharges AI Agent Skills, Surpasses 5K GitHub Stars

SkillOpt, an open‑source framework from Microsoft Research, treats skill markdown files as trainable parameters and applies neural‑network optimization techniques across six ReflACT stages, achieving up to 39‑point accuracy gains on 52 benchmark evaluations and demonstrating cross‑model transferability, all while requiring zero inference cost.

IT Services Circle
IT Services Circle
IT Services Circle
Microsoft’s Open‑Source SkillOpt Supercharges AI Agent Skills, Surpasses 5K GitHub Stars

SkillOpt is an open‑source project from Microsoft Research that treats the agent skill file Skill.md as trainable parameters and applies the disciplined training pipeline of neural networks to improve skill quality.

Mapping of Neural‑Network Concepts to Skill Optimization

Weights → Skill.md Gradients → task‑trajectory‑based reflective analysis

Learning rate → budget for each text edit (text learning rate)

Validation set → held‑out data scoring gate

Epoch → multiple rounds of iterative optimization

ReflACT Six‑Stage Pipeline (repeated for many epochs)

Rollout – Run the target model with the current skill on a batch of tasks, collecting execution trajectories and scores.

Reflective Analysis – An optimizer model examines the trajectories, identifying which parts of the skill cause errors and which succeed.

Patch Generation – The optimizer produces concrete text edits (add, delete, replace). Each edit’s magnitude is limited by the text learning rate to ensure incremental changes.

Merge – Combine multiple patches into a single candidate skill document.

Rank & Filter – If the total number of edits exceeds a predefined budget, rank them by importance and keep only the most critical.

Validation Gate – Evaluate the candidate skill on a held‑out validation set; accept it only if its score strictly exceeds that of the current skill, otherwise discard the changes.

Two epoch‑level global mechanisms further refine training:

Slow Update – After each epoch, review the entire training experience to extract high‑level improvement suggestions and inject them into the skill.

Meta Skill – Build on Slow Update to derive a higher‑level strategic guide that makes subsequent epochs more effective.

Experimental Evaluation

SkillOpt was evaluated on six benchmarks covering 52 evaluation units, using seven target models (including GPT‑5.5, GPT‑5.4, GPT‑5.4‑nano) and three execution modes (direct dialogue, Codex CLI loop, Claude Code CLI loop). Results:

Best or tied‑best score on every unit.

Average accuracy improvement on GPT‑5.5: +23.5 points (direct dialogue), +24.8 points (Codex loop), +19.1 points (Claude Code loop).

Maximum single‑scenario gain: +39.0 points.

Outperformed prompt‑optimization baselines (TextGrad, GEPA), skill‑evolution methods (Trace2Skill, EvoSkill), human‑written skills, and one‑shot strong‑model skills.

Optimized skills transfer across models and execution modes (e.g., a skill trained on GPT‑5.5 improves GPT‑5.4, and a skill optimized for Codex works on Claude Code).

Usage

Installation:

git clone https://github.com/microsoft/SkillOpt.git
cd SkillOpt
pip install -e .

Configure backend via environment variables (example for Azure OpenAI):

export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-key"

Start training (example for the SearchQA benchmark):

python scripts/train.py \
    --config configs/searchqa/default.yaml \
    --split_dir /path/to/your/searchqa_split \
    --azure_openai_endpoint https://your-resource.openai.azure.com/ \
    --optimizer_model gpt-5.5 \
    --target_model gpt-5.5

After training, best_skill.md (typically 300–2000 tokens) is produced and can be fed directly to the model at inference time with zero additional cost.

Evaluation‑only mode (no retraining):

python scripts/eval_only.py \
    --config configs/searchqa/default.yaml \
    --skill ckpt/searchqa/gpt5.5_skill.md \
    --split valid_unseen \
    --split_dir /path/to/searchqa_split \
    --azure_openai_endpoint https://your-resource.openai.azure.com/

Pre‑trained skill files for GPT‑5.5 are provided in the ckpt/ directory.

Extensibility

The architecture is modular:

Adding a new benchmark requires implementing a dataloader, a rollout function, and an initial skill seed.

Adding a new backend involves writing a backend module and registering it. Reference implementations for Azure OpenAI, Claude, Qwen, MiniMax, Codex CLI, and Claude Code CLI are included.

Resources

GitHub repository: https://github.com/microsoft/SkillOpt

Paper (arXiv): https://arxiv.org/abs/2605.23904

Project homepage: https://microsoft.github.io/SkillOpt/

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

PythonAI agentsopen sourceGitHubBenchmarkingSkillOptNeural optimization
IT Services Circle
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.