Machine Learning Algorithms & Natural Language Processing
Mar 13, 2026 · Artificial Intelligence
Can Multimodal LLMs Beat Humans in Real Web Search? GPT‑5.2 Scores Only 36% on New BrowseComp‑V3 Benchmark
A new multimodal browsing benchmark, BrowseComp‑V3, finds that human experts achieve a 68.03% success rate while the strongest closed‑source model, GPT‑5.2, reaches only 36.17%. The gap highlights current limits in deep, web‑scale visual‑text reasoning and the critical role of tool‑augmented agents.
GPT-5.2 · OmniSeeker · human performance
12 min read
