Training Only the Skill Document While Keeping Model Weights Frozen (SkillOpt)
Microsoft Research introduces SkillOpt, a method that freezes large‑model weights and instead trains a natural‑language skill document as the sole learnable parameter, using a rollout‑reflect‑edit‑gate loop, achieving optimal results across 52 benchmark‑model‑environment combinations and demonstrating strong transferability.
Core Idea: Treat the Skill Document as Trainable Parameters
The large model and its agent remain frozen; the only mutable component is the skill.md file. SkillOpt translates the entire deep‑learning training pipeline into text space: rollout corresponds to forward pass, reflect to backward pass, edit budget to learning rate, and it includes mini‑batch, epoch, momentum, and slow‑update mechanisms.
Training Loop
Rollout : The target model executes tasks using the current skill, recording trajectories with scores.
Reflect : A separate optimizer model examines successful and failed batches to discover reusable patterns.
Edit : Candidate edits (add, delete, replace) are generated under an edit‑budget constraint.
Gate : Edits are accepted only if they improve performance on a held‑out validation set.
Stability Design
Key mechanisms include an edit budget to prevent a single rewrite from erasing good rules, a buffer that stores rejected edits as negative feedback, and slow updates plus a meta‑skill optimizer at the end of each epoch to provide long‑term signals. Deployment uses only the final skill document, incurring no extra inference overhead.
Results: 52/52 Combinations Win
Across six benchmarks (SearchQA, SpreadsheetBench, OfficeQA, DocVQA, LiveMath, ALFWorld), seven target models (GPT‑5.5/5.4/5.4‑mini/5.4‑nano/5.2, Qwen3.5‑4B, Qwen3.6‑35B‑A3B), and three execution environments (direct dialogue, Codex, Claude Code), SkillOpt achieved the best or tied‑best score in every one of the 52 settings, outperforming baselines such as Human skill, one‑shot LLM skill, Trace2Skill, TextGrad, GEPA, and EvoSkill.
Notable gains include: GPT‑5.5 improves by 23.5 points in direct dialogue, 21.8 points with Codex, and 18.6 points with Claude Code; the small GPT‑5.4‑nano model gains 35.1 points on ALFWorld.
Transferability
Skills trained on GPT‑5.4 for LiveMath increase GPT‑5.4‑nano performance by +15.2.
SpreadsheetBench skills learned with Codex raise Claude Code scores by +31.8.
When GPT‑5.4‑nano acts as its own optimizer, SpreadsheetBench improves by +10.4.
The exported best_skill.md is a reusable artifact that is not tied to any specific model or harness.
Additional Details
To avoid over‑fitting, training, validation, and test splits are disjoint; for SearchQA the split follows a 2:1:7 ratio, reserving 70 % of data for final evaluation, and cross‑model, cross‑harness, and cross‑benchmark transfer experiments further validate robustness.
A concurrent paper with the same name was submitted to the Agent Skills 2026 workshop; the authors note the original draft was titled “Skill as LoRA”, treating the skill as a LoRA‑style PEFT module.
Future work aims to package SkillOpt as an easy‑to‑use agent‑learning framework comparable to MMDetection or Detectron in computer vision.
Getting Started
The code is open‑source on GitHub and requires Python 3.10+. Quick start:
git clone https://github.com/microsoft/SkillOpt.git
cd SkillOpt
pip install -e .
# Optional ALFWorld benchmark support
pip install -e ".[alfworld]"
alfworld-downloadConfigure your API key (Azure OpenAI, native OpenAI, Anthropic Claude, or local vLLM deployments are supported, with Azure OpenAI recommended):
cp .env.example .env
# Edit .env to add your API key, then
source .envPrepare data under train/, val/, and test/ directories following the JSON schema defined in skillopt/envs/<benchmark>/dataloader.py. Supported benchmarks include SearchQA, ALFWorld, DocVQA, LiveMathematicianBench, SpreadsheetBench, and OfficeQA.
Example training command for SearchQA:
python scripts/train.py \
--config configs/searchqa/default.yaml \
--split_dir /path/to/your/searchqa_split \
--azure_openai_endpoint https://your-resource.openai.azure.com/ \
--optimizer_model gpt-5.5 \
--target_model gpt-5.5The system supports checkpoint resumption; re‑running the same command continues from the last completed step. After training, the best skill document appears as best_skill.md in the output directory.
outputs/<run_name>/
├── config.json
├── history.json
├── runtime_state.json
├── best_skill.md
├── skills/skill_vXXXX.md
├── steps/step_XXXX/
├── slow_update/epoch_XX/
└── meta_skill/epoch_XX/Evaluation‑only mode:
# Evaluate on test split only
python scripts/eval_only.py \
--config configs/searchqa/default.yaml \
--skill outputs/my_run/best_skill.md \
--split valid_unseen \
--split_dir /path/to/searchqa_split \
--azure_openai_endpoint https://your-resource.openai.azure.com/An optional WebUI can be launched for monitoring:
pip install -e ".[webui]"
python -m skillopt_webui.app --shareThe default port is 7860; it can be changed, and a public share link can be created.
Links
Project page: https://microsoft.github.io/SkillOpt/
Paper: https://arxiv.org/abs/2605.23904
Code: https://github.com/microsoft/SkillOpt
Demo video: https://youtu.be/JUBMDTCiM0M
As AI agents shift from assistant to worker roles, the bottleneck moves from knowledge to procedural capability—how to use tools, inspect intermediate states, and recover from failures. Explicitly writing these capabilities as trainable, inspectable, and transferable skill documents may be more engineering‑friendly than embedding them directly in model weights.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
AI Engineering
Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
