Weekly Tech Digest (May 11‑17): Ilya Sutskever’s Court Testimony, Anthropic’s 4‑Day AI Sprint, and AI Agent Market Shifts
This week’s roundup covers Ilya Sutskever’s courtroom testimony on OpenAI’s internal power struggle, Anthropic’s claim that AI can finish ten weeks of work in four days, and the launch of OpenAI Codex in the ChatGPT mobile app. It also looks at how AI agents are reshaping software business models, a leaked Google Gemini Omni demo, Baidu’s DuMate super-assistant, the 1.3B-parameter MiniCPM-V 4.6 multimodal model that runs on a single RTX 4090, an eight-year-old who built an app with AI, and MiniMax’s new Mavis multi-agent framework for reliable long-running tasks.
OpenAI Boardroom Drama
Ilya Sutskever, former chief scientist of OpenAI, testified in the Elon Musk‑OpenAI lawsuit, presenting a 52‑page memo that documents Sam Altman’s alleged lies, the systematic weakening of senior leadership, and internal conflicts that led to a failed attempt to remove Altman. He disclosed his own $70 billion stake and Altman’s $35 billion stake, as well as a brief meeting between the remaining board and Anthropic about a possible merger. Microsoft’s Satya Nadella also testified that Microsoft had prepared a 14‑person takeover list, effectively giving it a veto over board decisions.
Anthropic’s “10 Weeks of Work in 4 Days” Claim
At the “Code with Claude” developer conference, Anthropic argued that AI model capabilities are growing exponentially while enterprise development remains linear. The company highlighted three pillars: a stronger base model, the Claude Platform’s new agent-orchestration features, and the Claude Code desktop client, which lets developers offload coding, PR fixing, CI error handling, and result verification to autonomous agents. Demonstrations showed an AI-controlled drone landing on the moon and an automated grading agent that evaluates results and self-optimises via “model dreaming”.
OpenAI Codex on Mobile
OpenAI integrated Codex into the ChatGPT mobile app for both Android and iOS, making the AI coding assistant available to all users, including the free tier. The mobile version lets developers connect remotely to a host machine running Codex, view terminal output, screenshots, and test results, and approve commands, while all sensitive data stays on the host machine behind a secure relay.
Industry Insight: AI Agents Redefining Software
Analysts describe a “SaaSpocalypse” in which traditional software companies risk being bypassed by AI agents that can invoke services directly. Companies that embed themselves as agent-orchestration platforms (Microsoft, Salesforce, ServiceNow) are positioned to lock in data and workflows. Skills (task-specific instructions) and Plugins (packaged, versioned capabilities) become the primary way software exposes functionality to agents. Successful Skills must encode domain-specific execution knowledge, include quality standards, and be auditable; generic Skills for writing or meeting notes are being absorbed into base model abilities.
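To make the Skill and Plugin distinction concrete, here is a minimal Python sketch. Every name in it (`Skill`, `Plugin`, `audit`, the claims-triage example) is hypothetical and chosen for illustration; it does not reflect any vendor's actual format. It only shows how a task-specific Skill could carry its own quality standards and produce an auditable record, while a Plugin packages Skills under a version.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Skill:
    name: str
    instructions: str  # domain-specific execution knowledge
    quality_checks: dict[str, Callable[[str], bool]]  # named quality standards

@dataclass(frozen=True)
class Plugin:
    name: str
    version: str  # plugins are packaged and versioned
    skills: tuple[Skill, ...]

def audit(plugin: Plugin, skill: Skill, output: str) -> dict:
    """Return an audit record listing which quality checks the output passed."""
    return {
        "plugin": f"{plugin.name}@{plugin.version}",
        "skill": skill.name,
        "checks": {name: check(output) for name, check in skill.quality_checks.items()},
    }

# Hypothetical insurance example: a triage Skill with two quality checks.
triage = Skill(
    name="claims_triage",
    instructions="Classify the claim and cite the policy clause that applies.",
    quality_checks={
        "cites_clause": lambda out: "clause" in out.lower(),
        "non_empty": lambda out: bool(out.strip()),
    },
)
claims_plugin = Plugin(name="claims", version="1.2.0", skills=(triage,))

record = audit(claims_plugin, triage, "Approved under Clause 4.1")
print(record)
```

The design point is that the quality standards travel with the Skill, so any agent that invokes it can be checked against the same criteria after the fact.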
Google Gemini Omni Leak
A leaked Gemini Omni demo generated a 10-second video of a professor writing a trigonometric identity on a blackboard, with speech and visuals perfectly synchronized. The result looked more realistic than output from Google’s Veo 3.1, despite minor logical flaws. The model also demonstrated video-editing abilities such as replacing pasta with soup and removing watermarks, though generating two short videos consumed about 86% of an AI Pro subscription’s daily quota.
Baidu’s DuMate Super‑Assistant
At Baidu Create 2026, CEO Robin Li (Li Yanhong) introduced the Daily Active Agents (DAA) metric, arguing that the number of active AI agents reflects AI’s value better than DAU or token consumption. DuMate, Baidu’s general-purpose agent, can orchestrate multiple parallel tasks (content creation, visual design, and competitor analysis) in ten minutes. It scored 93.3% on the PinchBench benchmark, outperforming Anthropic and OpenAI models on the same test.
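The article does not give Baidu's exact definition of DAA. As a rough sketch, assuming DAA means the number of distinct agents with at least one logged action on a given day (by analogy with DAU), it could be computed as below. The event format and the function name `daily_active_agents` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

def daily_active_agents(events):
    """Count distinct agent IDs per day.

    `events` is an iterable of (day, agent_id) pairs, one per logged agent action.
    Repeated actions by the same agent on the same day count once.
    """
    agents_by_day = defaultdict(set)
    for day, agent_id in events:
        agents_by_day[day].add(agent_id)
    return {day: len(agents) for day, agents in agents_by_day.items()}

# Hypothetical event log: agent "a" acts twice on May 11 but is counted once.
events = [
    (date(2026, 5, 11), "a"),
    (date(2026, 5, 11), "a"),
    (date(2026, 5, 11), "b"),
    (date(2026, 5, 12), "a"),
]
daa = daily_active_agents(events)
print(daa)
```

Unlike token consumption, this count does not grow when a single agent simply produces more output, which is the contrast the metric is meant to draw.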
MiniCPM‑V 4.6 Open‑Source Breakthrough
The 1.3B-parameter MiniCPM-V 4.6 model, co-developed by ModelBest, Tsinghua University, and OpenBMB, surpasses Qwen 3.5-0.8B and Gemma 4-E2B-it on the Artificial Analysis leaderboard while consuming only 5.4M tokens, about 1/19 of the tokens consumed by Qwen’s non-reasoning version. Two architectural innovations, LLaVA-UHD v4 visual token pruning and a hybrid 4×/16× visual token compression scheme, reduce visual FLOPs by over 50% and double throughput on an RTX 4090. The model integrates with LLaMA-Factory, ms-swift, vLLM, SGLang, llama.cpp, and Ollama, and ships source code for iOS, Android, and HarmonyOS.
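The figures above imply some simple arithmetic, sketched here in Python. The 1,024-token image is a hypothetical input chosen for illustration, and treating 4× and 16× as plain division of the visual token count is an assumption; the article does not describe how the hybrid scheme chooses between the two rates.

```python
import math

def implied_baseline_tokens(tokens_used: int, ratio_denominator: int) -> int:
    """If tokens_used is 1/ratio_denominator of a baseline, return that baseline."""
    return tokens_used * ratio_denominator

def tokens_after_compression(visual_tokens: int, factor: int) -> int:
    """Token count after merging every `factor` visual tokens into one (rounded up)."""
    return math.ceil(visual_tokens / factor)

# 5.4M tokens at about 1/19 of the comparison model's usage.
baseline = implied_baseline_tokens(5_400_000, 19)
print(f"Implied comparison usage: about {baseline / 1e6:.1f}M tokens")

# Hypothetical image encoded as 1,024 visual tokens before compression.
visual_tokens = 1024
for factor in (4, 16):
    remaining = tokens_after_compression(visual_tokens, factor)
    print(f"{factor}x compression: {visual_tokens} -> {remaining} visual tokens")
```

Under these assumptions the comparison model would have used roughly 102.6M tokens, and a 1,024-token image would shrink to 256 or 64 visual tokens depending on the rate applied.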
Eight‑Year‑Old Builds an OS via AI
An eight-year-old demonstrated “chat-generated” software creation: after the child sketched an OS on paper and described it in a few spoken sentences, the AI produced an interactive native app. The demo illustrated the ability of Baidu’s Miaoda 3.0 (秒哒) to turn natural-language prompts into fully functional software.
MiniMax Mavis Multi‑Agent Framework
MiniMax released Mavis, a “Jarvis‑style” agent system that introduces a Team Engine with three explicit roles: Leader, Worker, and Verifier. In complex tasks split among multiple Workers, the Verifier rigorously checks outputs, forcing Workers to correct specific errors. Tests showed Mavis handling up to nine concurrent subtasks, generating diverse deliverables (xls, ppt, html) while maintaining context isolation. The system incurs higher “consensus cost” (information transfer, summarisation, aggregation) and is therefore suited for high‑value, complex workflows.
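The Leader, Worker, and Verifier roles can be sketched as a simple retry loop. This is not MiniMax's implementation: the function names, the feedback format, and the toy worker and verifier are all hypothetical, and the sketch runs subtasks one at a time rather than concurrently. It only illustrates the pattern in which a verifier rejects output with specific feedback and the worker must correct it.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Result:
    subtask: str
    output: str
    attempts: int

def run_team(
    subtasks: list[str],
    worker: Callable[[str, Optional[str]], str],
    verifier: Callable[[str, str], Optional[str]],
    max_attempts: int = 3,
) -> list[Result]:
    """Leader loop: hand each subtask to a worker and retry until the verifier accepts.

    The verifier returns None to accept, or a feedback string naming the error.
    Each subtask keeps its own feedback, so workers never see each other's context.
    """
    results = []
    for subtask in subtasks:
        feedback = None
        for attempt in range(1, max_attempts + 1):
            output = worker(subtask, feedback)
            feedback = verifier(subtask, output)
            if feedback is None:
                break
        else:
            raise RuntimeError(f"{subtask!r} failed verification: {feedback}")
        results.append(Result(subtask, output, attempt))
    return results

# Toy worker: the first draft omits citations; a retry applies the feedback.
def toy_worker(subtask: str, feedback: Optional[str]) -> str:
    draft = f"report on {subtask}"
    return draft if feedback is None else draft + " [sources cited]"

# Toy verifier: rejects any output that lacks citations.
def toy_verifier(subtask: str, output: str) -> Optional[str]:
    return None if "[sources cited]" in output else "add source citations"

results = run_team(["pricing", "competitors"], toy_worker, toy_verifier)
for r in results:
    print(r.subtask, r.attempts, r.output)
```

Each extra verification round is part of the "consensus cost" the article mentions: the loop trades additional worker calls for a checked result, which is why the pattern fits high-value tasks better than cheap ones.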
ZhongAn Tech Team
China's first online insurer. Through tech innovation we make insurance simpler, warmer, and more valuable. Powered by technology, we support 50 billion RMB of policies and serve 600 million users with smart, personalized solutions. ZhongAn's hardcore tech and article shares are here.
