Ornith-1.0: The New Open‑Source Agentic Coding King with MIT License

Ornith-1.0, an open‑source model family released under the MIT license, tops multiple Agentic Coding benchmarks (SWE‑Bench Verified 82.4, Terminal‑Bench 77.5, etc.), spans from 9B to 397B parameters, and introduces joint reinforcement‑learning optimization of scaffold and solution to reshape AI‑assisted programming.

IT Services Circle

Release and benchmark performance

Ornith‑1.0 was released under the MIT license and achieved the highest publicly reported scores on six Agentic Coding benchmarks:

SWE‑Bench Verified: 82.4

SWE‑Bench Pro: 62.2

Terminal‑Bench 2.1: 77.5

NL2Repo: 48.2

SWE Atlas QnA: 41.2

ClawEval: 77.1

The article presents these numbers as higher than the publicly available scores of most closed‑source agents, many of which have not disclosed verifiable results.

Model family and deployment options

Four variants cover the full parameter spectrum:

9B Dense – runs on consumer‑grade GPUs

31B Dense – fits on a single server for small teams

35B MoE – balances efficiency and throughput for medium projects

397B MoE – targets enterprise‑level private deployment

The variants are fine‑tuned on top of Gemma 4 and Qwen 3.5 base models, are released in GGUF format, and support local deployment.
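To make the deployment tiers above more concrete, the following is a back-of-the-envelope sketch of how much memory the weights alone might need once quantized. The 4.5 bits-per-weight figure is an assumption chosen for illustration (it is not stated in the article), and the function and dictionary names are hypothetical. KV cache and runtime overhead are not included.

```python
# Rough weight-memory estimate for a quantized model.
# Assumption: about 4.5 bits per weight on average (illustrative figure for a
# 4-bit style quantization, not taken from the article).
# KV cache, activations, and runtime overhead are NOT included.

VARIANTS_BILLION_PARAMS = {
    "9B Dense": 9,
    "31B Dense": 31,
    "35B MoE": 35,
    "397B MoE": 397,
}


def weight_memory_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for name, size in VARIANTS_BILLION_PARAMS.items():
        # For MoE variants the total parameter count is used, since all
        # experts must be stored even if only some are active per token.
        print(f"{name}: ~{weight_memory_gb(size):.1f} GB of weights")
```

Under this assumption the 9B variant needs roughly 5 GB for weights, which is consistent with the consumer-grade GPU claim, while the 397B variant lands in the hundreds of gigabytes.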

Joint scaffold‑solution optimization

Traditional coding agents separate two layers:

Task scaffold – planning, tool calling, context management.

Final solution – code generation, bug fixing, test execution.

In most pipelines the scaffold is handcrafted by engineers and only the solution layer is trained. Ornith‑1.0 instead uses a single reinforcement learning (RL) training loop that optimizes both scaffold and solution at the same time. This lets the model discover more effective execution frameworks rather than adapting to a fixed, human‑designed scaffold.

Empirically, the joint optimization is reported to improve performance across all six benchmarks, which the article takes as evidence that the model can redesign its own workflow and achieve higher task success rates.
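The idea of one RL loop updating both layers can be illustrated with a toy sketch. This is not Ornith‑1.0's training method, whose details the article does not give. It is a minimal REINFORCE example on two small softmax policies, where the option names, the reward function, and all hyperparameters are made up for illustration.

```python
import math
import random

# Hypothetical option sets; the real action spaces are not described in the article.
SCAFFOLDS = ["edit_directly", "plan_then_edit", "search_then_edit"]
SOLUTIONS = ["rewrite_file", "minimal_patch", "add_workaround"]


def toy_reward(scaffold: str, solution: str) -> float:
    """Made-up reward: the best outcome needs the right scaffold AND solution."""
    hits = int(scaffold == "plan_then_edit") + int(solution == "minimal_patch")
    return {0: 0.0, 1: 0.2, 2: 1.0}[hits]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_step(logits, chosen, advantage, lr):
    """REINFORCE update for a softmax policy: grad log p(chosen) = onehot - probs."""
    probs = softmax(logits)
    for i in range(len(logits)):
        indicator = 1.0 if i == chosen else 0.0
        logits[i] += lr * advantage * (indicator - probs[i])


def train(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    scaffold_logits = [0.0] * len(SCAFFOLDS)
    solution_logits = [0.0] * len(SOLUTIONS)
    baseline = 0.0  # running average of reward, used to reduce variance
    for _ in range(steps):
        s = sample(softmax(scaffold_logits), rng)
        a = sample(softmax(solution_logits), rng)
        reward = toy_reward(SCAFFOLDS[s], SOLUTIONS[a])
        advantage = reward - baseline
        # The same episode reward updates BOTH policies in one loop.
        reinforce_step(scaffold_logits, s, advantage, lr)
        reinforce_step(solution_logits, a, advantage, lr)
        baseline += 0.05 * (reward - baseline)
    return softmax(scaffold_logits), softmax(solution_logits)


if __name__ == "__main__":
    scaffold_probs, solution_probs = train()
    print(dict(zip(SCAFFOLDS, scaffold_probs)))
    print(dict(zip(SOLUTIONS, solution_probs)))
```

The design point is that neither policy is frozen: because the reward depends on the combination, the scaffold choice is learned alongside the solution choice instead of being fixed by hand beforehand.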

Interpretation of benchmark scores

SWE‑Bench Verified (82.4) evaluates a model on real GitHub issues: it must locate the relevant code, modify it, and pass the unit tests without human help. The article cites a typical human developer range of 70‑75% on this benchmark, which Ornith‑1.0 surpasses.

SWE‑Bench Pro (62.2) adds multi‑file changes, cross‑module refactoring, and complex dependencies, indicating capability beyond simple one‑line fixes.

Terminal‑Bench 2.1 (77.5) measures interaction with a real terminal (e.g., cd, ls, editing config files, debugging services). The score shows the model can operate in an actual shell environment rather than a simulated one.

NL2Repo (48.2) requires generating a complete GitHub repository from a natural‑language description (e.g., “build a task‑management app”). This is the highest open‑source score reported for this task.
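As a reading aid for the SWE‑Bench Verified figure, here is a minimal sketch of how a resolved-rate percentage can be computed. It assumes the public benchmark's convention that an instance counts as resolved only when the previously failing tests now pass and the previously passing tests still pass, and it assumes the 500-instance size of the Verified set. The class and field names are illustrative, and the data is synthetic.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """Outcome of one benchmark instance (field names are illustrative)."""
    instance_id: str
    fail_to_pass_ok: bool  # tests that failed before the fix now pass
    pass_to_pass_ok: bool  # tests that passed before the fix still pass


def is_resolved(result: InstanceResult) -> bool:
    # Both conditions must hold: the issue is fixed and nothing else broke.
    return result.fail_to_pass_ok and result.pass_to_pass_ok


def resolved_rate(results: list[InstanceResult]) -> float:
    """Return the percentage of resolved instances."""
    if not results:
        raise ValueError("no results to score")
    resolved = sum(is_resolved(r) for r in results)
    return 100.0 * resolved / len(results)


if __name__ == "__main__":
    # Synthetic data only: 412 resolved out of 500 instances.
    results = [InstanceResult(f"task-{i}", i < 412, True) for i in range(500)]
    print(f"{resolved_rate(results):.1f}")  # 82.4
```

Read this way, a score of 82.4 on a 500-instance set would correspond to 412 resolved issues.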

Open‑source vs. closed‑source landscape

Closed‑source agents such as Claude Code, GPT‑5.5 + Codex, and Gemini + Code Assist rely on proprietary models and often lack publicly verifiable benchmark results. Ornith‑1.0 provides fully disclosed scores, enabling independent replication.

The open‑source community now offers models that match or exceed the performance of these commercial agents while remaining free for commercial use under the MIT license.

Emerging signals

1 – Agentic coding models are becoming commoditized

A 9B model already delivers agent‑level coding ability, and GGUF builds allow it to run on a MacBook, so the model itself is no longer the scarce resource.

2 – Joint scaffold‑solution optimization may become the new paradigm

Ornith‑1.0 is among the few models that jointly optimize scaffold and solution at scale and validate the approach on multiple benchmarks. This suggests future AI‑coding tools may let models design their own execution frameworks, with RL feedback driving a self‑reinforcing iteration loop.

3 – Open‑source gains a structural advantage in the agent era

Agentic coding hinges on tool invocation, multi‑step planning, and environment interaction. These are core engineering problems that thrive in open‑source collaboration: a large developer community can iteratively improve agent frameworks beyond the capacity of any single closed‑source team.

Implications

When a fully open‑source, high‑performing coding agent is freely available, competition may shift from “who has the stronger model” to “who builds the larger ecosystem.”

[Figure: Benchmark comparison chart]