Jul 16, 2024 · Artificial Intelligence

LLMs Misjudge Simple Number Comparison: 9.11 vs 9.9

Recent tests show that popular large language models, including GPT‑4o, Gemini Advanced, and Claude 3.5, often claim that 9.11 is larger than 9.9 because their tokenizers split the numbers. Rephrasing the question, using zero‑shot chain‑of‑thought prompts, or treating the values as floating‑point numbers can correct the mistake. Chinese models show the same pattern to varying degrees.
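To make the two readings concrete, here is a minimal Python sketch. It is only an analogy, not a reproduction of how any model's tokenizer works: the function names `compare_as_numbers` and `compare_as_split_parts` are hypothetical, and the "split" version simply cuts each string at the decimal point and compares the pieces as integers, the way version numbers are compared.

```python
from decimal import Decimal


def compare_as_numbers(a: str, b: str) -> str:
    """Return the larger of two decimal strings by numeric value."""
    return a if Decimal(a) > Decimal(b) else b


def compare_as_split_parts(a: str, b: str) -> str:
    """Return the 'larger' string when each is split at the dot and the
    pieces are compared as integers (version-number style).

    This is an illustrative assumption about the failure mode, not the
    actual mechanism inside any specific model.
    """
    parts_a = tuple(int(p) for p in a.split("."))
    parts_b = tuple(int(p) for p in b.split("."))
    return a if parts_a > parts_b else b


print(compare_as_numbers("9.11", "9.9"))      # 9.9
print(compare_as_split_parts("9.11", "9.9"))  # 9.11
```

Read as numbers, 9.9 wins because 0.9 is greater than 0.11. Read as split pieces, 9.11 wins because 11 is greater than 9, which matches the wrong answer the models give.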

AI evaluation · LLM · Prompt Engineering

7 min read
