14 min read

Reproducing Claude Fable 5 with Opus 4.8 and a Prompt: 90% Performance on Consumer GPUs

The article analyzes Claude Fable 5’s capabilities, dissects Anthropic’s official prompt guide, compares leaked system prompts, and demonstrates how to achieve roughly 90% of Fable 5’s performance on a consumer‑grade GPU using Opus 4.8 plus a custom prompt, while also presenting a local Gemma 4 12B coder alternative.

Old Zhang's AI Learning

Jun 15, 2026

Reproducing Claude Fable 5 with Opus 4.8 and a Prompt: 90% Performance on Consumer GPUs

Model capabilities

Claude Fable 5 (API model ID claude-fable-5) provides a 1 000 000‑token context window, a maximum output of 128 k tokens per request, and pricing of $10 per million input tokens and $50 per million output tokens. Compared with its predecessor Opus 4.8, Fable 5 shows measurable gains in:

Long‑term autonomy: maintains goal‑driven tasks across days and preserves instruction memory over long dialogues.

One‑shot problem solving: complex system implementations that previously required multiple iterations now succeed on the first pass.

Visual understanding: higher accuracy on dense technical images and web screenshots, with built‑in bash and cropping tools for blurry or rotated images.

Enterprise‑grade workflow output: improved quality for financial analysis, spreadsheets, presentations, and documents.

Code review and debugging: higher bug‑recall rates and ability to search across codebases and Git history.

Fuzzy navigation: autonomous decision‑making when handling complex, multi‑threaded requests.

Multi‑agent coordination: stronger scheduling and management of parallel sub‑agents.

Prompt engineering guide

The official guide introduces an effort parameter that balances intelligence, latency, and cost. Levels are: max: unlimited capability for frontier reasoning problems. xhigh: extended capability for tasks longer than 30 minutes. high (default): high capability for complex reasoning, difficult coding, and agent tasks. medium: balanced mode for agent tasks that need speed and performance. low: most efficient for simple tasks, sub‑agents, or lowest latency.

Key behaviors highlighted:

Instruction following : a short directive suffices; an example prompt forces the model to state the result first and defer details.

Progress audit : a prompt segment that requires the model to verify each claim before reporting dramatically reduces fabricated status updates.

Memory system : a simple Markdown file stores validated knowledge, with one file per experience, a top‑line summary, and explicit updates or deletions.

Adaptive thinking : thinking is always on; the display flag controls whether the reasoning chain is omitted ( display: "omitted") or summarized ( display: "summarized"). The raw chain never returns.

Scaffolding changes : start with the hardest task, make self‑validation explicit, refactor existing prompts (over‑formalized skills reduce quality), and avoid requesting full reasoning extraction (triggers reasoning_extraction refusal).

Safety classifiers : the model refuses content related to exploit code, bio‑science methods, or attempts to extract the reasoning chain.

Search behavior : detailed rules dictate when to search (e.g., “who is …”, unknown entities) and when not to (e.g., deceased persons).

Leaked system prompt vs. official blog

A 1 600‑line system prompt leaked on GitHub reveals additional constraints:

Strict copyright compliance via a CRITICAL_COPYRIGHT_COMPLIANCE block that limits source citations to 15 words, allows a single citation per source, and bans copying lyrics, poems, or reconstructing article structure.

Full product lineup embedded in the prompt (Claude Code, Cowork, Chrome, Excel, PowerPoint).

Formatting preferences that deliberately limit bold, headings, and lists, resulting in more restrained output.

Reproducing Fable 5 behavior with Opus 4.8

Download the leaked system prompt from

https://github.com/elder-plinius/CL4R1T4S/blob/main/ANTHROPIC/CLAUDE-FABLE-5.md

Place CLAUDE-FABLE-5.md in the Claude Code project folder.

Launch the model with the special command:

claude --dangerously-skip-permissions --system-prompt-file CLAUDE-FABLE-5.md

Switch the underlying model to Opus 4.8 Max.

This injection reproduces roughly 90 % of Fable 5’s performance; the remaining gap stems from Opus 4.8’s raw reasoning, vision, and context capabilities.

Local alternative: Gemma 4 12B Coder distillation model

Model name: gemma-4-12B-coder-fable5-composer2.5-v1-GGUF (available on HuggingFace). It is a dual‑source distillation model trained on:

Main dataset : Composer 2.5 real‑world chain‑of‑thought Python solutions that passed tests.

Supplementary dataset : Fable 5‑resolved problems, providing coverage for harder cases.

Training recipe: “real CoT for primary coverage + synthetic CoT for failure cases, all verified by execution.”

Quantization options (example Q4_K_M): file size 6.87 GB, runs on most GPUs. Context length scales with VRAM (e.g., 12 GB ≈ 30 k tokens, 24 GB ≈ 128 k tokens). Apple Silicon unified memory works but is slower.

Running with llama.cpp (requires recent gemma4_unified branch):

llama-server \
  -m gemma4-coding-Q4_K_M.gguf \
  --ctx-size 16384 \
  --n-gpu-layers 99 \
  --no-mmap \
  -fa on \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --temp 1.0 --top-p 0.95 --top-k 64 \
  --host 0.0.0.0 --port 18080

The model also loads in LM Studio, Jan, Ollama, etc., by importing the GGUF file.

Important notes:

Requires a recent llama.cpp build supporting gemma4_unified.

Optimized for Python/algorithm coding; inference quality is strongest in this domain.

Safety refusals are reduced because the training data lacks safety hedging; production use should add custom guardrails.

Thinking mode is supported; enable with enable_thinking=true.

Version 2 update

Due to the revocation of Fable 5 API access, the author plans to shift the primary dataset to Composer 2.5, keep Fable 5 data as supplemental, and potentially add GLM‑5.2 as an auxiliary teacher model. BridgeMind benchmarks indicate that GLM‑5.2 can slightly surpass Fable 5 on certain reasoning benchmarks.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

prompt engineering Local Deployment Opus 4.8 Gemma 4 12B Claude Fable 5 Qwen 3.7

Written by

Old Zhang's AI Learning

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Model capabilities

Prompt engineering guide

Leaked system prompt vs. official blog

Reproducing Fable 5 behavior with Opus 4.8

Local alternative: Gemma 4 12B Coder distillation model

Version 2 update

Old Zhang's AI Learning

How this landed with the community

Was this worth your time?

0 Comments

Reproducing Fable 5 behavior with Opus 4.8

Local alternative: Gemma 4 12B Coder distillation model

Version 2 update