Alibaba Cloud Big Data AI Platform
Apr 3, 2026 · Artificial Intelligence

How Alibaba Cloud’s Ops‑Agentic‑Search Reached Human‑Level Performance on the GAIA Benchmark

Alibaba Cloud’s AI Search team introduces Ops‑Agentic‑Search, an enterprise‑grade AI agent framework that tackles the core challenges of hallucination, task failure, and long‑term consistency. On the GAIA benchmark it reaches 92.36% accuracy, matching human experts. The article also outlines the framework’s technical architecture, key mechanisms, use cases, and future open‑source contributions.

Dynamic Planning · Enterprise AI · GAIA benchmark
11 min read

Alibaba Cloud Big Data AI Platform
Apr 2, 2026 · Artificial Intelligence

How Alibaba Cloud’s Ops‑Agentic‑Search Reached Human‑Level Performance on the GAIA Benchmark

The article explains the shift of AI agents from passive responders to proactive executors, outlines the challenges of hallucination, task failure, and consistency, and introduces the GAIA benchmark. It then details how Alibaba Cloud's Ops‑Agentic‑Search achieved 92.36% accuracy, matching human experts, through global planning, reflection, dynamic context management, and a self‑evolving skills system.

AI Agent · Dynamic Planning · Enterprise AI
12 min read

BirdNest Tech Talk
Apr 3, 2025 · Artificial Intelligence

How Genspark’s Super Agent Outperforms OpenAI and Manus in GAIA Benchmarks

Genspark’s newly released Super Agent is built on a Mixture‑of‑Agents architecture that combines eight specialized LLMs with more than 80 tools. Genspark claims it can autonomously plan, execute, and integrate external services for tasks such as travel planning and video summarization, and reports that it surpasses OpenAI and Manus on the GAIA benchmark. It is available immediately, without an invitation code.

AI Agent · Automation · GAIA benchmark
4 min read

Baobao Algorithm Notes
Mar 15, 2025 · Industry Insights

Why Some AI Agents Are Gaming the GAIA Benchmark – A Deep Dive

The article explains how the publicly available validation set of the GAIA agent benchmark lets participants cheat by reporting scores on questions whose answers are already known. It calls out unprofessional practices by teams such as Manus and OpenAI and urges the community to rely only on hidden test data for fair evaluation.

GAIA benchmark · leaderboard integrity · validation set
4 min read

AI Algorithm Path
Mar 6, 2025 · Artificial Intelligence

How Manus’s General AI Agent Could Redefine Future Workflows

Manus, billed as the world’s first true general AI agent, combines a multi‑agent architecture with tool integration to automate complex tasks and claims superior performance on the GAIA benchmark. Its invitation‑only rollout and the ethical concerns it raises illustrate the tension between hype‑driven marketing and sustainable AI adoption.

AI Agent · GAIA benchmark · Manus
6 min read