Automatically Evolve Claude Code Skills: Open‑Source System That Strengthens AI Tools Over Time
The darwin-skill project introduces a ratchet-based optimization loop: it scores each Skill on eight dimensions, generates improvement proposals, commits changes, re-scores, and retains only upgrades, pausing for human confirmation between phases. This makes maintaining dozens of AI agent Skills scalable.
Pain Points
Agent Skill ecosystems have grown rapidly, with tools such as Claude Code, Codex, OpenClaw, Trae, and CodeBuddy supporting the SKILL.md format. Maintaining a small number of Skills is feasible, but managing 60+ Skills becomes difficult. Traditional Skill review checks only format, step numbering, and path accessibility, yet a perfectly formatted Skill may still produce poor results.
How Darwin‑skill Addresses the Problem
Inspired by Andrej Karpathy’s autoresearch, the system moves the autonomous experiment loop from model training to Skill optimization. The core mechanism is a ratchet: scores can only increase; each iteration either improves the Skill or cleanly rolls back, preventing gradual degradation.
Process:
1. Identify the lowest-scoring dimension.
2. Generate an improvement plan for that dimension.
3. Edit SKILL.md and commit via git.
4. A sub-agent re-scores the updated Skill.
5. If the new score exceeds the old score, keep the change; otherwise, revert the commit.
After each Skill is optimized, the system pauses, shows the diff and score change, and waits for user confirmation before proceeding to the next Skill.
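The steps above can be sketched as pure decision logic. The helper names below (`score`, `propose`, `apply_edit`) are hypothetical stand-ins for the project's git-backed steps, not its actual API:

```python
def ratchet_step(skill_text, score, propose, apply_edit):
    """One iteration: target the weakest dimension, keep the edit only if the total rises.

    score(text)        -> dict of dimension name -> points (stand-in for the sub-agent)
    propose(text, dim) -> an improvement plan for that dimension
    apply_edit(t, p)   -> the edited SKILL.md text (stand-in for edit + git commit)
    """
    before = score(skill_text)                    # baseline from the scoring sub-agent
    weakest = min(before, key=before.get)         # lowest-scoring dimension
    candidate = apply_edit(skill_text, propose(skill_text, weakest))
    after = score(candidate)                      # independent re-score of the edit
    if sum(after.values()) > sum(before.values()):
        return candidate, after                   # ratchet clicks forward: keep it
    return skill_text, before                     # regression: roll back to the baseline
```

A real run would replace `apply_edit` with a git commit and the rollback branch with a revert; the keep-or-revert comparison is the whole ratchet.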
Eight‑Dimension Evaluation System
The total score of 100 is split into two major blocks:
Structure (60 points): assessed via static analysis, covering format compliance, path validity, and step completeness.
Effectiveness (40 points): requires empirical testing; a Skill that looks good but performs poorly receives zero here. The empirical-performance dimension carries the highest weight (25 points).
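A weight table for such a rubric might look like the sketch below. Only the 60/40 split and the 25-point empirical-performance weight come from the write-up; the remaining dimension names and their point splits are illustrative placeholders, not the project's actual rubric:

```python
# Hypothetical 100-point rubric. The 60/40 block split and the 25-point
# empirical-performance weight are from the article; everything else is a placeholder.
WEIGHTS = {
    "structure": {                     # static analysis, 60 points total
        "format_compliance": 20,       # placeholder split
        "path_validity": 20,
        "step_completeness": 20,
    },
    "effectiveness": {                 # empirical testing, 40 points total
        "empirical_performance": 25,   # highest-weighted single dimension
        "output_quality": 15,          # placeholder
    },
}

def total_score(scores):
    """Sum earned points, capping each dimension at its maximum weight."""
    return sum(
        min(scores.get(dim, 0), w)
        for block in WEIGHTS.values()
        for dim, w in block.items()
    )
```

Under these placeholder weights, a Skill that passes every static check but scores zero on empirical performance tops out at 75, which is the rubric's point: formatting alone cannot carry a Skill.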
Five Core Principles
Single Editable Asset: modify only one SKILL.md at a time, keeping variables controllable and improvements attributable.
Dual Evaluation: combine structural scoring (static analysis) with effect verification (run tests and check output).
Ratchet Mechanism: retain only improvements; automatically roll back regressions so scores never decrease.
Independent Scoring: use a sub-agent for scoring to avoid self-bias.
Human in the Loop: pause after each Skill optimization for user confirmation before continuing.
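The ratchet principle maps naturally onto plain git: commit the candidate edit, re-score, and reset if the score regressed. A minimal sketch under that assumption (the real tool's git usage may differ, and `scorer` is a hypothetical callback):

```python
import subprocess

def commit_and_score(repo, skill_path, scorer, best):
    """Commit the edited SKILL.md, re-score, and hard-reset if it regressed."""
    subprocess.run(["git", "-C", repo, "add", skill_path], check=True)
    subprocess.run(["git", "-C", repo, "commit", "-qm", "candidate edit"], check=True)
    new = scorer(repo)                   # independent re-score of the committed state
    if new > best:
        return new                       # keep the commit: the ratchet advances
    # Regression: drop the candidate commit so the baseline stays locked.
    subprocess.run(["git", "-C", repo, "reset", "--hard", "HEAD~1"], check=True)
    return best
```

`reset --hard HEAD~1` discards the bad commit entirely; `git revert` would be the history-preserving alternative. Either way the working tree returns to the best-known version.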
Five Stages of the Optimization Loop
The system runs autonomously within each stage but pauses between stages for human confirmation:
Phase 1: Assess current state and establish a baseline score.
Phase 2: Generate and execute an improvement plan.
Phase 3: Verify the effect of the improvement.
Phase 4: Ratchet decision – keep or revert the change.
Phase 5: User confirmation, then move to the next Skill.
Ratchet Mechanism Example
In a second round, a score of 75 fell below the current best of 78, triggering an automatic revert. The effective baseline remains locked at 78, and subsequent improvements build from that point. Scores can only ascend; regressions are fully eliminated.
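Stated as a one-line invariant (illustrative, not the project's code):

```python
def ratchet(best, candidate):
    """The locked baseline only moves up; a lower candidate is discarded."""
    return max(best, candidate)
```

Round two from the example: `ratchet(78, 75)` leaves the baseline at 78, while a later round scoring above 78 would move it up.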
How to Use
Installation command: `npx skills add alchaincyf/darwin-skill`. After installation, invoke any Skill-compatible agent tool with prompts such as “optimize all skills” or “optimize a specific skill.”
Conclusion
Design philosophy: create Skills like Nüwa (the creator goddess of Chinese myth), then let Darwin evolve them. By retaining only improvements, time works in your favor.
GitHub: https://github.com/alchaincyf/darwin-skill
Geek Labs
Daily shares of interesting GitHub open-source projects. AI tools, automation gems, technical tutorials, open-source inspiration.