Tagged articles
2 articles
SuanNi
Jun 11, 2026 · Artificial Intelligence

Why the Human Turing Test Is No Longer Enough: Agents’ Last Exam Benchmark

The article introduces Agents’ Last Exam (ALE), a comprehensive benchmark built by Berkeley and more than 250 experts to evaluate generalist computer-use agents on real-world, multi-step workflows spanning 55 sub-fields. Even the strongest models achieve only single-digit pass rates.

AI agents · Claude · GPT-5.5
13 min read
HyperAI Super Neural
Nov 28, 2025 · Artificial Intelligence

Weekly AI paper roundup: protein design, open‑source agent, HunyuanOCR, Olmo 3

This weekly roundup highlights five recent AI papers: HumanSense for multimodal LLM evaluation, JAM-2 for de novo antibody design, the open-source Olmo 3 language models, the Lumine generalist 3D agent, and the lightweight HunyuanOCR vision-language model. For each paper, it summarizes the core contributions and results and provides links.

OCR · generalist agents · multimodal LLM
6 min read