Mano‑P 1.0: The First GUI Agent to Top 13 Benchmarks and Move from Claw to Hand

Mano‑P 1.0 is a pure‑vision GUI agent that runs locally on Apple M4 devices, achieves SOTA on 13 multimodal benchmarks, offers zero‑cloud data handling, and introduces a three‑stage open‑source roadmap that reshapes personalized AI and end‑to‑end GUI automation.

Machine Heart

Apr 13, 2026

Since the rise of "lobster" AI agents, users have accepted that agents can control a computer, but early models like Claw were clumsy and required many Skills to perform simple tasks. Mano‑P 1.0, released by Mininglamp‑AI, represents a leap from "claw" to "hand" by providing a pure‑vision GUI operation model that runs entirely on the device without any API integration.

Mano‑P 1.0 is a 72‑billion‑parameter visual GUI model that supports three deployment forms for all developer groups. It runs locally on an M4 Mac (or Mac mini) with no cloud communication, so no screenshots or task data ever leave the device.

Benchmark performance: The model achieves state‑of‑the‑art results on 13 multimodal leaderboards, including ScreenSpot‑V2 (93.5), MMBench (87.5), UI‑Vision (4/6), and OSWorld (58.2% success rate). On OSWorld it ranks first among dedicated GUI agents, 13.2 percentage points ahead of the runner‑up opencua‑72b (45.0%). On the overall model list it places fifth, behind only trillion‑parameter general models such as Claude Sonnet 4.6 (72.1%) and Gemini 2.5 Pro (66.9%).

Technical advantages:

Extreme edge performance: the 4B w4a16 quantized version runs at 476 tokens/s pre‑fill and 76 tokens/s decode on an M4 Pro, using only 4.3 GB of RAM.

Full‑scene visual understanding: a pure‑vision pipeline breaks the browser‑only limitation and works across desktop software.

Offline planning and self‑correction: the agent plans actions locally and validates results after each step, enabling secure operation without network access.

Hardware‑software integration: plug‑and‑play deployment eliminates complex environment setup.
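
The plan, act, validate cycle described under offline planning and self‑correction can be sketched as a small loop. This is an illustrative sketch only, not Mano‑P's implementation: `run_task`, `Action`, and every callback below are hypothetical names, and the toy "screen" is a plain dictionary standing in for a real screenshot.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str    # hypothetical action name, e.g. "click" or "type"
    target: str  # label of the UI element the action is aimed at


def run_task(
    goal: str,
    observe: Callable[[], str],
    plan: Callable[[str, str], Action],
    act: Callable[[Action], None],
    verify: Callable[[str, str, Action], bool],
    is_done: Callable[[str, str], bool],
    max_steps: int = 10,
    max_retries: int = 2,
) -> List[Action]:
    """Plan one action at a time and check the screen after each one."""
    history: List[Action] = []
    retries = 0
    for _ in range(max_steps):
        before = observe()
        if is_done(goal, before):
            break
        action = plan(goal, before)
        act(action)
        after = observe()
        if verify(before, after, action):
            history.append(action)  # step confirmed, move on
            retries = 0
        else:
            retries += 1            # step had no effect, re-plan from the new screen
            if retries > max_retries:
                raise RuntimeError(f"step failed after {max_retries} retries: {action}")
    return history


# Toy stand-in for a real screen: a text field that "misses" the first click.
state = {"focused": False, "text": "", "clicks": 0}

def observe() -> str:
    return f"focused={state['focused']} text={state['text']!r}"

def plan(goal: str, screen: str) -> Action:
    kind = "type" if state["focused"] else "click"
    return Action(kind, "search box")

def act(action: Action) -> None:
    if action.kind == "click":
        state["clicks"] += 1
        state["focused"] = state["clicks"] >= 2
    elif action.kind == "type" and state["focused"]:
        state["text"] = "hello"

def verify(before: str, after: str, action: Action) -> bool:
    return before != after  # simplest possible check: the screen must change

def is_done(goal: str, screen: str) -> bool:
    return state["text"] == "hello"

steps = run_task("type hello", observe, plan, act, verify, is_done)
print([s.kind for s in steps])
```

Here the first click has no visible effect, so the loop re‑plans instead of typing into an unfocused field. Because every callback runs locally, nothing in the loop requires network access. A real agent would replace the string comparison in `verify` with a model‑based check of the new screenshot.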

The model’s training pipeline consists of three stages: (1) supervised fine‑tuning (SFT) to build basic GUI comprehension, (2) offline reinforcement learning on historical data for policy optimization, and (3) online reinforcement learning with real‑time interaction for continual self‑evolution. A bidirectional Text↔Action consistency loop (Text→Action and Action→Text) reinforces robustness, while the GSPruning visual token pruning technique retains only 12.57% of visual tokens and boosts throughput by 2–3×.
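
To make the 12.57% retention figure concrete, the sketch below prunes a list of visual tokens by keeping only the highest‑scoring ones. The article does not describe how GSPruning selects tokens, so this uses generic top‑k selection on an importance score as a stand‑in. `prune_tokens` and the synthetic scores are assumptions for illustration, not the published method.

```python
import math
from typing import List, Sequence


def prune_tokens(scores: Sequence[float], keep_ratio: float = 0.1257) -> List[int]:
    """Return indices of the highest-scoring tokens, in their original order.

    Generic top-k pruning, used here only as a stand-in for GSPruning.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if len(scores) == 0:
        return []
    n_keep = max(1, math.ceil(len(scores) * keep_ratio))
    # Rank token indices by score, highest first, then restore spatial order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Synthetic importance scores for 1000 visual tokens (a permutation of 0..999).
scores = [(i * 37) % 1000 for i in range(1000)]
kept = prune_tokens(scores)
print(len(kept), "of", len(scores), "tokens kept")
```

With 1000 input tokens and a keep ratio of 0.1257, 126 tokens survive, so the language model processes roughly one eighth of the original visual sequence. The 2–3× throughput gain is the article's reported figure and is not derived from this sketch.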

Mano‑P’s open‑source strategy unfolds in three phases: first the Mano‑CUA Skill (CLI tool), then the full model (including the 72B and 4B versions), and finally the training methodology (including pruning and quantization). The GitHub repository https://github.com/Mininglamp-AI/Mano-P/tree/main provides all code under Apache 2.0.

Three access modes are offered: mano‑cua (CLI for developers), mano‑skill (Agent Skill plugin for Claude Code/OpenClaw), and the upcoming mano‑client Python SDK for deep integration. All share the same core capabilities.

In an interview, the team explained that the "P" in Mano‑P can stand for Power, Private, Personal, or Party, reflecting a shift from generic AGI toward "Personalized AI"—AI that leverages private assets of individuals or organizations to produce the most valuable solutions for them.

Looking ahead, the authors envision fully automated GUI testing (Mano‑afk), in which a natural‑language request triggers an end‑to‑end workflow: requirement clarification, architecture design, code generation, local deployment, API testing, visual verification, and iterative bug fixing, all without human intervention.

Overall, Mano‑P 1.0 demonstrates that a locally‑run, pure‑vision GUI agent can achieve world‑leading performance, preserve user privacy, and open a path toward personalized AI on edge devices.


