OpenMythos: 22‑Year‑Old Recreates Claude Mythos with Recurrent Depth Transformers

A 22‑year‑old developer has published OpenMythos, a reconstruction of Anthropic’s confidential Claude Mythos built from public research. The project uses a Recurrent Depth Transformer that loops a single set of weights up to 16 times. A 770 M‑parameter version is reported to match a 1.3 B‑parameter Transformer while allowing deeper inference and addressing the instability that looping introduces.

AI Architecture Path

Background

Anthropic’s Claude Mythos is a secret model: it has not appeared in any paper, has not been publicly released, and is tested only within a technical alliance of about 40 companies. It is claimed to autonomously execute a 32‑step enterprise network attack chain, achieve a 73% success rate on expert‑level CTF challenges, and complete in a single step tasks that would take human experts 20 hours.

Open‑source Reconstruction

At 22 years old, open‑source developer Kye Gomez used only publicly available research and first‑principles reasoning—without internal access or leaked documentation—to reverse‑engineer the black‑box architecture and release the OpenMythos project. The repository quickly attracted attention, gaining nearly 7 000 stars in four days and surpassing 12 000 stars later.

Recurrent Depth Transformer (RDT)

Gomez introduced a Recurrent Depth Transformer (RDT) that shifts the scaling paradigm from “more parameters, more layers, more compute” to “use time for depth, use recurrence for efficiency.” The core idea is to run the same weight set up to 16 times within a single forward pass.

This is not redundant repetition; each iteration deepens the reasoning based on the previous result, akin to silently rehearsing a sentence 16 times before speaking, thereby saving parameters while increasing depth.

The reported figures indicate that a 770 M‑parameter RDT matches the performance of a standard 1.3 B‑parameter Transformer. That is a reduction of roughly 40% in parameter count (not quite a halving) at comparable effectiveness, which would substantially reduce training and deployment costs.
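A quick check of the two quoted model sizes makes the saving concrete. The snippet below only does arithmetic on the figures given above; the variable names are illustrative.

```python
# Parameter counts quoted in the article.
rdt_params = 770e6        # Recurrent Depth Transformer
baseline_params = 1.3e9   # standard Transformer it is compared against

ratio = rdt_params / baseline_params   # RDT size relative to the baseline
reduction = 1.0 - ratio                # fraction of parameters saved

print(f"RDT size relative to baseline: {ratio:.1%}")
print(f"Parameter reduction: {reduction:.1%}")
```

The RDT is about three fifths the size of the baseline, so the saving is closer to 40% than to 50%.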

Three‑Stage Architecture

Prelude : a standard Transformer layer that encodes the raw input.

Recurrent Block : the same weights iterate for up to 16 rounds, each round updating the hidden state as hₜ₊₁ = A·hₜ + B·e + Transformer(hₜ, e), where hₜ is the hidden state after round t and e is the encoded input from the Prelude, re‑injected at every step to prevent drift.

Coda : a final standard Transformer layer that produces the output.
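The three stages can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the project's code: A and B are taken to be scalar multiples of the identity (`a` and `b`), the hidden state is assumed to start at zero, and `prelude`, `transformer_block`, and `coda` are hypothetical stand-ins for real Transformer layers.

```python
import math

def prelude(x):
    # Stand-in for the encoding layer: squash each input value into (-1, 1).
    return [math.tanh(v) for v in x]

def transformer_block(h, e):
    # Stand-in for Transformer(h, e): a small bounded nonlinearity.
    return [0.1 * math.tanh(hi + ei) for hi, ei in zip(h, e)]

def coda(h):
    # Stand-in for the output layer: reduce the state to one number.
    return sum(h)

def forward(x, loops=16, a=0.5, b=0.5):
    """Prelude -> looped shared block -> Coda, with A = a*I and B = b*I."""
    e = prelude(x)
    h = [0.0] * len(e)  # assumed initial state; the article does not specify one
    for _ in range(loops):
        update = transformer_block(h, e)  # same function (same weights) every round
        # h_{t+1} = A*h_t + B*e + Transformer(h_t, e), with e re-injected each round
        h = [a * hi + b * ei + ui for hi, ei, ui in zip(h, e, update)]
    return coda(h)

x = [0.3, -1.2, 2.0]
for loops in (1, 4, 16):
    print(loops, round(forward(x, loops), 4))
```

Changing `loops` changes the effective depth without adding a single parameter, which is the point of the design.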

Mixture‑of‑Experts + Recurrence

The recurrent block is combined with a Mixture‑of‑Experts (MoE) routing scheme inspired by DeepSeek‑MoE. Each token activates only a small subset of the routed experts, while shared experts hold general knowledge; this sparse activation keeps the cost per token low. Because the router can select a different expert subset in each round, the same weights can follow diverse paths.
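A minimal sketch of this kind of routing follows, assuming a generic top‑k softmax router plus one always‑on shared expert. The function names, the renormalized gates, and the toy experts are illustrative assumptions, not the OpenMythos or DeepSeek‑MoE implementation.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

def route(h, router_weights, top_k=2):
    """Return (expert_index, gate) pairs for the top_k routed experts."""
    scores = [sum(w * x for w, x in zip(row, h)) for row in router_weights]
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)  # renormalize gates over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_layer(h, router_weights, experts, shared_expert, top_k=2):
    out = shared_expert(h)  # the shared expert sees every token
    for idx, gate in route(h, router_weights, top_k):
        y = experts[idx](h)  # only the selected experts are evaluated
        out = [o + gate * v for o, v in zip(out, y)]
    return out

def make_expert(scale):
    # Toy expert that scales its input; real experts are small feed-forward networks.
    return lambda h: [scale * v for v in h]

router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
shared_expert = make_expert(0.5)

early_state = [2.0, 0.5]    # hidden state in an early loop
later_state = [-1.0, -3.0]  # hidden state after further loops
print([i for i, _ in route(early_state, router_weights)])
print([i for i, _ in route(later_state, router_weights)])
print(moe_layer(early_state, router_weights, experts, shared_expert))
```

Because the router scores depend on the current hidden state, the two states above are sent to different expert pairs even though the router and expert weights never change between loops.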

Stability Mechanisms

LTI Constraint Injection : constrains the linear time‑invariant (LTI) part of the update so that the matrix A has a spectral radius strictly less than 1, which keeps the repeated linear update from diverging.

Adaptive Computation Time (ACT) : lets the model decide when to stop thinking.

Depth‑wise LoRA Adapters : give each iteration its own low‑rank adapter on top of the shared weights, so individual loops can specialize at a small parameter cost.
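The first two mechanisms can be illustrated with a scalar toy model. The sketch assumes a one‑dimensional state, so the "spectral radius" is simply |a|, and it uses a simplified early‑exit form of ACT (the full method also blends intermediate states by their halting weights). `stable_decay`, `run_with_act`, and the constant halting probability are hypothetical names and values chosen for illustration.

```python
import math

def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def stable_decay(raw, max_radius=0.99):
    """Map any real parameter into [0, max_radius], so |a| < 1 by construction."""
    return max_radius * sigmoid(raw)

def run_with_act(e, a, b, halt_prob, max_loops=16, threshold=0.99):
    """Loop h <- a*h + b*e until the accumulated halting probability crosses threshold."""
    h, cumulative, steps = 0.0, 0.0, 0
    for steps in range(1, max_loops + 1):
        h = a * h + b * e
        cumulative += halt_prob(h)  # the model's own estimate that it is done
        if cumulative >= threshold:
            break
    return h, steps

a = stable_decay(0.0)  # 0.495
h, steps = run_with_act(e=1.0, a=a, b=1.0, halt_prob=lambda h: 0.25)
print(steps, h)

# With halting disabled, the state settles at the fixed point b*e / (1 - a).
h_long, _ = run_with_act(e=1.0, a=a, b=1.0, halt_prob=lambda h: 0.0, max_loops=200)
print(abs(h_long - 1.0 / (1.0 - a)))
```

Whatever value the raw parameter takes during training, the decay stays below 1, so the linear part of the loop contracts toward a fixed point instead of blowing up, and the halting rule bounds how many loops are actually spent.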

Capability Breakthroughs

Deep Extrapolation : A model trained on 20‑step reasoning chains can be tested on 30‑step chains. Standard Transformers collapse on the longer chains, whereas the recurrent model stays stable when it is given a few extra loops.

Systematic Generalization : Even when presented with unseen knowledge combinations, the model answers accurately, indicating that the bottleneck is not “how much it knows” but “how well it can recombine known facts.” Recurrence unlocks this combinatorial ability without extra parameters.

Controversies and Reality Check

The code runs and the architecture is logically consistent.

The project provides seven model scales ranging from 1 B to 1 T parameters.

It can be installed with a single pip command for immediate use.

However, the project currently lacks trained weights, official benchmark data, and concrete inference demos. Gomez acknowledges that the work is a theoretical reconstruction meant for reference; it cannot be confirmed that it perfectly reproduces Claude Mythos, but it offers a falsifiable, implementable direction.

Industry Implications

AI Competition Rule Shift : The focus moves from “more parameters, more GPUs, more money” to inference depth, trading compute time for parameter count. This suggests that future top models may be those that think longer rather than those that are larger.

Closed‑Source Barrier Erosion : A single 22‑year‑old developer can rebuild a top‑tier closed‑source architecture, indicating that information monopoly is giving way to innovative engineering.

Open‑Source Community Acceleration : Community‑driven projects can break through high‑cost barriers, opening low‑cost R&D avenues for smaller teams.

OpenMythos makes the case that replacing stacked layers with recurrence, and trading parameter count for compute time, is feasible and efficient, which would lower the entry barrier for advanced AI development.

https://github.com/kyegomez/OpenMythos