Analyzing LLM Failure Cases: Tokenization, Next‑Token Prediction, and Chain‑of‑Thought Prompting
The article explains how subword tokenization and pattern-driven next‑token prediction lead LLMs to miscount the letters in “Strawberry” and to judge 9.11 as larger than 9.9, and shows that step‑by‑step Chain‑of‑Thought prompting, in which the model reasons before stating its final answer, dramatically improves accuracy.
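To see why letter counting is hard, it helps to look at what the model actually receives. Below is a minimal sketch, assuming the `tiktoken` library and the `cl100k_base` vocabulary (neither is named in the article), that prints the subword pieces for “Strawberry”: the model operates on these pieces, never on individual characters, so counting occurrences of “r” is not a lookup it can do directly.

```python
# Sketch: inspect how a BPE tokenizer splits a word into subword tokens.
# Assumes tiktoken and the cl100k_base vocabulary; other tokenizers will
# produce different splits, but the point is the same.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["Strawberry", "strawberry"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]
    # The exact split depends on the vocabulary; typically no token
    # corresponds to a single letter "r", so the model cannot simply
    # count tokens to count letters.
    print(f"{word!r} -> {pieces}")
```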
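The 9.9 versus 9.11 comparison fails for a related reason: next‑token prediction is biased toward surface patterns like version numbers and section headings, where “9.11” follows “9.9”. A hedged illustration of the reason‑first prompting the article describes, with hypothetical prompt strings not quoted from the source:

```python
# Sketch of direct vs. Chain-of-Thought prompting for the 9.9 vs 9.11
# comparison. The prompt wording is illustrative, not from the article.

# Direct prompt: invites a pattern-matched answer (9.11 "looks" bigger,
# as in version numbers or section headings).
direct_prompt = "Which is larger, 9.9 or 9.11? Answer with the number only."

# Reason-first prompt: forces the comparison steps into the output
# before the answer, so the final token is conditioned on the reasoning.
cot_prompt = (
    "Which is larger, 9.9 or 9.11?\n"
    "Think step by step: compare the integer parts, then compare the "
    "fractional parts as decimals (0.9 = 0.90 > 0.11), and only then "
    "state the answer."
)
```

The design point is ordering: because generation is left to right, asking for the reasoning first means the final answer token is predicted conditional on the worked comparison rather than on the raw pattern.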