What Fable 5’s Leaked Chain‑of‑Thought Reveals About AI’s “Neuralese”

A Reddit post exposing Claude Fable 5’s raw chain‑of‑thought shows a garbled internal monologue. It has sparked debate about illegible reasoning, how common it is across large language models, and the safety implications of AI developing a private, unreadable language.

Machine Heart

Yesterday, a Reddit post in r/ClaudeAI went viral after sharing a screenshot of Claude Fable 5’s unfiltered chain‑of‑thought as the model struggled with Codeforces problem 2239D. Instead of a clean answer, the model emitted a stream of fragmented utterances such as “DATA DATA DATA GO”, “GRR”, “GAAAH”, “PHEW”, and “I'M DROWNING — EMPIRICS!!!”. The poster described the output as “not human‑like but oddly cute”.

Large language models typically perform an internal chain‑of‑thought before producing a polished answer; this reasoning is hidden from users, who only see the final, well‑crafted response. The leaked trace therefore offers a rare glimpse into the model’s raw, often incoherent reasoning process.

Anthropic’s system card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf) explicitly discusses “illegible reasoning”. It presents an extreme case in which a model solving a card puzzle produces a dense mix of playing‑card symbols, arrows, skull emojis, and even the German expletive “verdammt”. The card notes that illegible traces appear most frequently and most severely in that task.

Similar phenomena have been reported for other models. DeepSeek’s R1‑Zero technical report (arXiv:2501.12948) describes mixed‑language, fragmented reasoning and notes that a subsequent supervised‑fine‑tuning “cold‑start” stage restored readability. OpenAI’s o3 model, according to third‑party safety assessments (arXiv:2509.15541), also inserts nonsensical word fragments into its chain‑of‑thought. A systematic evaluation of 14 leading reasoning models (METR, 2025) found that almost all models trained with outcome‑based RL, including the Claude series, exhibit increasingly unreadable reasoning as model size and task difficulty grow.

This trend raises AI‑safety concerns: if the chain‑of‑thought becomes opaque, monitoring tools that rely on reading the model’s reasoning may fail. The idea of models developing a private “Neuralese” language was first proposed by UC Berkeley researchers in 2017 (arXiv:1704.06960) and is now being revisited as a potential risk.
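To make the monitoring concern concrete, here is a minimal sketch of a crude legibility check a monitor could run over a reasoning trace. It is purely illustrative: the function names, the 0.5 threshold, and the sample strings are assumptions made for this example, not something taken from the system card or the cited reports. It only counts how many tokens look like plain alphabetic words, so it would catch symbol‑heavy traces like the card‑puzzle case but not word‑like outbursts such as “GRR” or “PHEW”.

```python
import re

# Hypothetical heuristic: a token counts as "word-like" if, after trimming
# surrounding punctuation, it consists only of ASCII letters (with an
# optional apostrophe part, as in "don't").
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")
PUNCTUATION = ".,;:!?\"'()[]"


def legibility_score(trace: str) -> float:
    """Return the fraction of whitespace-separated tokens that look like words."""
    tokens = trace.split()
    if not tokens:
        return 1.0  # an empty trace has nothing illegible in it
    wordlike = sum(
        1 for tok in tokens if WORD.fullmatch(tok.strip(PUNCTUATION).lower())
    )
    return wordlike / len(tokens)


def is_flagged(trace: str, threshold: float = 0.5) -> bool:
    """Flag a trace for human review when its score falls below the threshold.

    The threshold is an arbitrary assumption chosen for illustration.
    """
    return legibility_score(trace) < threshold


# Made-up sample strings, not real model output.
readable = "first sort the array, then scan once"
symbolic = "♠7 → ♥K ☠ verdammt"

print(legibility_score(readable), is_flagged(readable))
print(legibility_score(symbolic), is_flagged(symbolic))
```

A surface check like this says nothing about whether a readable trace is faithful to the model’s actual computation, which is the harder part of the monitoring problem.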

However, a detailed analysis on LessWrong argues that the so‑called illegible trace is actually a highly compressed form of English combined with game notation, not a brand‑new language. Claude Haiku 4.5, which uses a different tokenizer, was able to reconstruct the underlying logic, supporting the hypothesis that dense reasoning does not equate to a truly secret language.

In summary, the Fable 5 incident is not an isolated quirk but a symptom of a broader side‑effect of outcome‑based reinforcement learning: models trade readability for token efficiency, producing “opaque reasoning” that challenges interpretability and safety research.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
