Microsoft’s Open‑Source SkillOpt Supercharges AI Agent Skills, Surpasses 5K GitHub Stars
SkillOpt, an open‑source framework from Microsoft Research, treats skill markdown files as trainable parameters and applies neural‑network optimization techniques across six ReflACT stages, achieving up to 39‑point accuracy gains on 52 benchmark evaluations and demonstrating cross‑model transferability, all while requiring zero inference cost.
SkillOpt is an open‑source project from Microsoft Research that treats the agent skill file Skill.md as trainable parameters and applies the disciplined training pipeline of neural networks to improve skill quality.
Mapping of Neural‑Network Concepts to Skill Optimization
Weights → Skill.md Gradients → task‑trajectory‑based reflective analysis
Learning rate → budget for each text edit (text learning rate)
Validation set → held‑out data scoring gate
Epoch → multiple rounds of iterative optimization
ReflACT Six‑Stage Pipeline (repeated for many epochs)
Rollout – Run the target model with the current skill on a batch of tasks, collecting execution trajectories and scores.
Reflective Analysis – An optimizer model examines the trajectories, identifying which parts of the skill cause errors and which succeed.
Patch Generation – The optimizer produces concrete text edits (add, delete, replace). Each edit’s magnitude is limited by the text learning rate to ensure incremental changes.
Merge – Combine multiple patches into a single candidate skill document.
Rank & Filter – If the total number of edits exceeds a predefined budget, rank them by importance and keep only the most critical.
Validation Gate – Evaluate the candidate skill on a held‑out validation set; accept it only if its score strictly exceeds that of the current skill, otherwise discard the changes.
Two epoch‑level global mechanisms further refine training:
Slow Update – After each epoch, review the entire training experience to extract high‑level improvement suggestions and inject them into the skill.
Meta Skill – Build on Slow Update to derive a higher‑level strategic guide that makes subsequent epochs more effective.
Experimental Evaluation
SkillOpt was evaluated on six benchmarks covering 52 evaluation units, using seven target models (including GPT‑5.5, GPT‑5.4, GPT‑5.4‑nano) and three execution modes (direct dialogue, Codex CLI loop, Claude Code CLI loop). Results:
Best or tied‑best score on every unit.
Average accuracy improvement on GPT‑5.5: +23.5 points (direct dialogue), +24.8 points (Codex loop), +19.1 points (Claude Code loop).
Maximum single‑scenario gain: +39.0 points.
Outperformed prompt‑optimization baselines (TextGrad, GEPA), skill‑evolution methods (Trace2Skill, EvoSkill), human‑written skills, and one‑shot strong‑model skills.
Optimized skills transfer across models and execution modes (e.g., a skill trained on GPT‑5.5 improves GPT‑5.4, and a skill optimized for Codex works on Claude Code).
Usage
Installation:
git clone https://github.com/microsoft/SkillOpt.git
cd SkillOpt
pip install -e .Configure backend via environment variables (example for Azure OpenAI):
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-key"Start training (example for the SearchQA benchmark):
python scripts/train.py \
--config configs/searchqa/default.yaml \
--split_dir /path/to/your/searchqa_split \
--azure_openai_endpoint https://your-resource.openai.azure.com/ \
--optimizer_model gpt-5.5 \
--target_model gpt-5.5After training, best_skill.md (typically 300–2000 tokens) is produced and can be fed directly to the model at inference time with zero additional cost.
Evaluation‑only mode (no retraining):
python scripts/eval_only.py \
--config configs/searchqa/default.yaml \
--skill ckpt/searchqa/gpt5.5_skill.md \
--split valid_unseen \
--split_dir /path/to/searchqa_split \
--azure_openai_endpoint https://your-resource.openai.azure.com/Pre‑trained skill files for GPT‑5.5 are provided in the ckpt/ directory.
Extensibility
The architecture is modular:
Adding a new benchmark requires implementing a dataloader, a rollout function, and an initial skill seed.
Adding a new backend involves writing a backend module and registering it. Reference implementations for Azure OpenAI, Claude, Qwen, MiniMax, Codex CLI, and Claude Code CLI are included.
Resources
GitHub repository: https://github.com/microsoft/SkillOpt
Paper (arXiv): https://arxiv.org/abs/2605.23904
Project homepage: https://microsoft.github.io/SkillOpt/
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
