Autogenesis: A Self‑Evolving Agent OS That Drives Near‑Perfect C++ LeetCode Scores

The paper introduces the Autogenesis Protocol (AGP), a two‑layer, resource‑governed framework that lets agents safely modify their own prompts, tools, memory, and environment. It demonstrates the approach with the AGS system, which achieves 93.33% accuracy on the GAIA validation set and near‑full scores on C++ LeetCode problems.

AGP · Autogenesis · GAIA benchmark

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Apr 3, 2026 · Artificial Intelligence

How Alibaba Cloud’s Ops‑Agentic‑Search Reached Human‑Level Performance on the GAIA Benchmark

Alibaba Cloud’s AI Search team introduces Ops‑Agentic‑Search, an enterprise‑grade AI agent framework that targets three core challenges: hallucination, task failure, and long‑term consistency. The article reports 92.36% accuracy on the GAIA benchmark, which it describes as matching human experts, and outlines the framework’s technical architecture, key mechanisms, use cases, and planned open‑source contributions.

Dynamic Planning · GAIA benchmark · OpenSearch

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Apr 2, 2026 · Artificial Intelligence

How Alibaba Cloud’s Ops‑Agentic‑Search Reached Human‑Level Performance on the GAIA Benchmark

The article explains the shift of AI agents from passive responders to proactive executors, outlines the challenges of hallucination, task failure, and consistency, and introduces the GAIA benchmark. It then details how Alibaba Cloud’s Ops‑Agentic‑Search achieved 92.36% accuracy, on par with human experts, through global planning, reflection, dynamic context management, and a self‑evolving skills system.

AI Agent · Dynamic Planning · GAIA benchmark

0 likes · 12 min read

BirdNest Tech Talk

Apr 3, 2025 · Artificial Intelligence

How Genspark’s Super Agent Outperforms OpenAI and Manus in GAIA Benchmarks

Genspark’s newly released Super Agent, built on a Mixture‑of‑Agents architecture that combines eight specialized LLMs and over 80 tools, claims to autonomously plan, execute, and integrate external services across tasks such as travel planning and video summarization, and reportedly surpasses OpenAI and Manus in the GAIA benchmark while offering instant access without an invitation code.

AI Agent · Automation · GAIA benchmark

0 likes · 4 min read

Baobao Algorithm Notes

Mar 15, 2025 · Industry Insights

Why Some AI Agents Are Gaming the GAIA Benchmark – A Deep Dive

The article reveals how the GAIA agent benchmark’s publicly available validation set enables participants to cheat by submitting scores derived from known answers, exposing unprofessional practices by teams like Manus and OpenAI and urging the community to rely only on hidden test data for fair evaluation.

GAIA benchmark · leaderboard integrity · validation set

0 likes · 4 min read

AI Algorithm Path

Mar 6, 2025 · Artificial Intelligence

How Manus’s General AI Agent Could Redefine Future Workflows

Manus, billed as the world’s first true general AI agent, combines multi‑agent architecture, tool integration, and superior GAIA benchmark performance to automate complex tasks, while its invitation‑only rollout and ethical concerns illustrate the tension between hype‑driven marketing and sustainable AI adoption.

AI Agent · GAIA benchmark · Manus

0 likes · 6 min read
