What Fable 5’s Leaked Chain‑of‑Thought Reveals About AI’s “Neuralese”
A Reddit post exposing Claude Fable 5’s raw chain‑of‑thought shows garbled internal monologue, sparking debate about illegible reasoning, its prevalence across large language models, and the safety implications of AI potentially developing a private, unreadable language.
Yesterday a Reddit post in r/ClaudeAI went viral, sharing a screenshot of Claude Fable 5’s unfiltered chain‑of‑thought while it struggled with a Codeforces problem (2239D). Instead of a clean answer, the model emitted a stream of fragmented utterances such as “DATA DATA DATA GO”, “GRR”, “GAAAH”, “PHEW”, and “I'M DROWNING — EMPIRICS!!!”. The poster described the output as “not human‑like but oddly cute”.
Large language models typically perform an internal chain‑of‑thought before producing a polished answer; this reasoning is hidden from users, who only see the final, well‑crafted response. The leaked trace therefore offers a rare glimpse into the model’s raw, often incoherent reasoning process.
Anthropic’s system card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf) explicitly discusses “illegible reasoning”, presenting an extreme case in which a model solving a card puzzle produces a dense mix of playing‑card symbols, arrows, skull emojis, and even the German profanity “verdammt”. The card notes that such illegible traces appear most frequently and most severely in that task.
Similar phenomena have been reported for other models. DeepSeek’s R1‑Zero technical report (arXiv:2501.12948) describes mixed‑language, fragmented reasoning and notes that adding a supervised fine‑tuning “cold‑start” stage restored readability. OpenAI’s o3 model, according to third‑party safety assessments (arXiv:2509.15541), also inserts nonsensical word fragments into its chain‑of‑thought. A systematic evaluation of 14 leading reasoning models (METR, 2025) found that almost all models trained with outcome‑based RL, including the Claude series, exhibit increasingly unreadable reasoning as model size and task difficulty grow.
This trend raises AI‑safety concerns: if chain‑of‑thought becomes opaque, monitoring tools that rely on model explanations may fail. The idea of models developing a private “Neuralese” language was first proposed by UC Berkeley researchers in 2017 (arXiv:1704.06960) and is now revisited as a potential risk.
However, a detailed analysis on LessWrong argues that the so‑called illegible trace is actually a highly compressed form of English combined with game notation rather than a brand‑new language. Claude Haiku 4.5, which uses a different tokenizer, was able to reconstruct the underlying logic, supporting the hypothesis that dense reasoning does not amount to a truly secret language.
In summary, the Fable 5 incident is not an isolated quirk but a symptom of a broader side‑effect of outcome‑based reinforcement learning: models trade readability for token efficiency, producing “opaque reasoning” that challenges interpretability and safety research.
