Java Tech Enthusiast
Jul 16, 2024 · Artificial Intelligence
LLMs Misjudge Simple Number Comparison: 9.11 vs 9.9
Recent tests show that popular large language models, including GPT‑4o, Gemini Advanced, and Claude 3.5, often claim that 9.11 is larger than 9.9, a mistake attributed to their tokenizers splitting the numbers into separate pieces. Rephrasing the question, using a zero‑shot chain‑of‑thought prompt, or asking the model to treat the values as floating‑point numbers can correct the error. Chinese models show the same pattern to varying degrees.
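To make the two readings concrete, here is a minimal Python sketch. It is only an illustration of how comparing the pieces on either side of the dot separately (as with version numbers) gives a different answer from comparing the values numerically. It does not model how any real tokenizer or LLM works, and the helper names `larger_as_number` and `larger_as_version` are hypothetical.

```python
from decimal import Decimal


def larger_as_number(a: str, b: str) -> str:
    """Return whichever decimal string is numerically larger."""
    return a if Decimal(a) > Decimal(b) else b


def larger_as_version(a: str, b: str) -> str:
    """Return whichever string is larger when each dot-separated
    part is compared as its own integer (version-number style)."""
    def key(s: str) -> tuple:
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


print(larger_as_number("9.11", "9.9"))   # 9.9  (9.11 < 9.90)
print(larger_as_version("9.11", "9.9"))  # 9.11 (11 > 9 after the dot)
```

The second function reproduces the wrong answer because it treats "11" and "9" as whole integers, which is roughly the kind of piecewise reading the summary attributes to the models.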
AI evaluation · LLM · Prompt Engineering
7 min read