How a 22‑Year‑Old Reverse‑Engineered Mythos into OpenMythos Using MoE and DeepSeek‑Inspired Attention
OpenMythos re‑creates the Claude Mythos architecture as a Recurrent‑Depth Transformer (RDT) with MoE routing. By running looped inference in latent space, it reportedly matches the performance of larger standard Transformers with substantially fewer parameters, while also demonstrating systematic generalization and depth extrapolation.
Core Design of the Recurrent‑Depth Transformer (RDT)
A single shared weight set is executed up to 16 times per forward pass.
Each pass can follow a different expert path through the MoE layers.
Inference runs entirely in hidden‑state latent space; an answer is decoded only after the final loop.
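The loop structure above can be sketched in a few lines. This is a framework‑agnostic toy (the `toy_step` affine map stands in for one shared transformer block; the real OpenMythos code is not reproduced here), showing the key property: one weight set, applied repeatedly to a latent state, decoded only once at the end.

```python
def rdt_forward(h, step_fn, num_loops=16):
    """Recurrent-depth forward pass: the SAME step function (one shared
    weight set) is applied up to num_loops times to the latent state.
    Only the final latent state would be decoded into an answer."""
    for _ in range(num_loops):
        h = step_fn(h)  # identical weights each pass; routing may differ
    return h            # decode(h) happens only after the last loop

# Toy stand-in for a shared transformer block: a fixed affine map.
def toy_step(h):
    return [0.5 * x + 0.1 for x in h]

latent = [1.0, -2.0, 0.5]
out = rdt_forward(latent, toy_step, num_loops=16)
```

Because the same contraction is applied at every step, the latent state here converges toward a fixed point (0.2 per coordinate); in the real model each pass refines the representation rather than merely converging.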
Mixture‑of‑Experts routing
The MoE component follows the DeepSeekMoE design: a large number of fine‑grained routing experts combined with a small pool of always‑online shared experts, providing breadth of domain knowledge across loops.
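A minimal scalar sketch of that routing pattern follows. The expert functions, scores, and top‑k value are illustrative, not taken from the repository; the point is the split between always‑online shared experts and score‑gated fine‑grained experts.

```python
def moe_layer(x, shared_experts, routed_experts, router_scores, top_k=2):
    """DeepSeekMoE-style routing sketch (scalar toy, not a real kernel):
    every token passes through a small pool of always-online shared
    experts, plus its top-k picks among many fine-grained routed experts."""
    # Shared experts contribute unconditionally.
    out = sum(e(x) for e in shared_experts)
    # Select the top-k routed experts by router score; renormalize weights.
    top = sorted(range(len(routed_experts)),
                 key=lambda i: router_scores[i], reverse=True)[:top_k]
    total = sum(router_scores[i] for i in top)
    for i in top:
        out += (router_scores[i] / total) * routed_experts[i](x)
    return out

shared = [lambda x: 0.5 * x]                 # always-online expert
routed = [lambda x: x + 1, lambda x: 2 * x,  # fine-grained experts
          lambda x: -x, lambda x: x * x]
scores = [0.10, 0.60, 0.05, 0.25]
y = moe_layer(3.0, shared, routed, scores)   # experts 1 and 3 fire
```

Only the selected routed experts are evaluated per token, which is what lets the total expert pool be large while per‑token compute stays small.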
LTI stable loop injection
Stability is ensured by the Linear Time‑Invariant (LTI) stable loop injection described in the Parcae paper (UCSD & Together AI), which prevents the recurrent iterations from diverging.
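The Parcae paper's exact formulation is not reproduced in this article, so the following is only a scalar illustration of the general LTI‑stability idea it invokes: if the linear part of the recurrence is a contraction (gain below 1), the loop stays bounded no matter how many iterations run.

```python
def lti_loop(x, a=0.8, b=0.2, num_loops=100, h=0.0):
    """Scalar sketch of an LTI loop (assumed form h' = a*h + b*x).
    Time-invariant: the same (a, b) apply at every step. For |a| < 1
    the map is a contraction, so arbitrarily many recurrent iterations
    stay bounded and converge toward b*x / (1 - a)."""
    for _ in range(num_loops):
        h = a * h + b * x
    return h

stable = lti_loop(1.0, a=0.8, b=0.2)    # converges to 0.2 / (1 - 0.8) = 1.0
unstable = lti_loop(1.0, a=1.1, b=0.2)  # |a| > 1: grows without bound
```

This is why a stability constraint matters specifically for recurrent depth: a gain even slightly above 1 is harmless at depth 2 but catastrophic at depth 100.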
Empirical results
Experiments show a 770 M‑parameter RDT matching the performance of a standard 1.3 B‑parameter Transformer, a reduction of roughly 40 % in parameter count with no loss of accuracy.
Systematic generalization
Reproducing a study from Ohio State University, the RDT correctly answered queries involving knowledge combinations never seen during training, whereas a conventional Transformer failed, demonstrating systematic generalization.
Depth extrapolation
When trained on 20‑step reasoning chains and tested on 30‑step chains, the RDT handled the longer chains by adding extra loops, while the standard Transformer collapsed, indicating effective depth extrapolation.
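The benchmark task itself is not specified in this article, but the mechanism is easy to illustrate with a toy pointer‑chasing problem, where each recurrent loop resolves one hop of a reasoning chain. The weights never change; only the loop count does.

```python
def follow_chain(links, start, num_loops):
    """Depth-extrapolation sketch: each recurrent loop resolves one hop.
    A model trained with 20 loops can simply be unrolled for 30 loops at
    inference time to handle 30-step chains -- identical 'weights'
    (here, the step body), different depth."""
    node = start
    for _ in range(num_loops):
        node = links[node]
    return node

# A 30-hop chain: node i points to node i + 1.
links = {i: i + 1 for i in range(30)}
train_depth_answer = follow_chain(links, 0, 20)  # depth seen in training
test_depth_answer = follow_chain(links, 0, 30)   # longer chain, more loops
```

A fixed‑depth network has no analogous knob: its maximum chain length is baked into its layer count, which is the collapse mode the comparison Transformer exhibits.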
Implications
These findings suggest that the bottleneck for large language models lies in knowledge composition rather than raw parameter count, and that future scaling may prioritize deeper inference loops over ever‑larger models.
Resources
GitHub: https://github.com/kyegomez/OpenMythos#the-central-hypothesis
Reference 1: https://x.com/KyeGomezB/status/2045660378844024994
Reference 2: https://arxiv.org/abs/2604.07822
Reference 3: https://arxiv.org/abs/2604.12946
