Jun 10, 2026 · Artificial Intelligence

MiniAppBench Reveals Only 1 in 6 AI‑Generated Apps Meet Real User Needs

MiniAppBench, the first benchmark to evaluate how well large language models generate fully functional interactive HTML applications, finds an average pass rate of just 17% across 16 leading models. Even the strongest model, GPT‑5.2, passes only 45% of tasks, which points to a substantial gap between current capabilities and real‑world user requirements.

AI evaluation · LLM · MiniAppBench
