
PhoneClaw and PokeClaw: Turning Your Phone into a Private AI Agent

PhoneClaw and PokeClaw are open‑source, on‑device AI agents for iOS and Android that run Gemma 4 locally, offering offline privacy, zero‑cost operation, and native tool calling through iOS APIs or Android Accessibility services.

Geek Labs

PhoneClaw – On‑device AI Agent for iPhone

GitHub: https://github.com/kellyvv/PhoneClaw (530★, Swift)

PhoneClaw is the first AI agent that runs entirely on an iPhone, powered by Google Gemma 4. No network connection or API key is required, and all data stays on the device.

Typical Use Cases

Driving and messaging: "Schedule a meeting with Zhang tomorrow at 2 PM" creates a calendar event without the driver taking hands off the wheel.

Offline privacy‑sensitive tasks: Query health data, translate documents, or organize contacts while on a plane or in a basement, with no server involvement.

Photo and document understanding: Take a picture of a product or a data table and ask the model to identify the product type or describe trends in the data; processing stays local.

Core Value

Complete offline operation: Chat logs, health data, contacts, calendar, and photos are processed locally, eliminating the privacy risk of cloud‑based assistants.

Skill system: Each capability is defined in a Markdown file; adding or modifying a skill does not require recompiling the app. Built‑in skills cover calendar, reminders, contacts, clipboard, translation, and health queries.

Flexible model management: Supports Gemma 4 E2B (lightweight, single‑turn chat/translation) and E4B (full‑featured multi‑turn tool‑calling). Models can be downloaded on the device or bundled with the app.
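The article does not reproduce any of the built-in skill files, so the sketch below is hypothetical; the frontmatter fields, skill name, and tool identifier are assumptions chosen to illustrate the idea of a Markdown-defined capability that can be added without recompiling:

```markdown
---
name: calendar_create
description: Create a calendar event from a natural-language request.
tools: [EventKit.createEvent]
---

When the user asks to schedule something, extract the title, date,
and time from the request, then emit a tool call that invokes the
registered EventKit tool with those arguments.
```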

Technical Principle

PhoneClaw integrates Apple’s MLX inference framework, loads a 4‑bit quantized Gemma 4 model, and uses native iOS APIs (EventKit, Contacts, HealthKit, UIPasteboard). The model emits `<tool_call>` instructions that invoke real iOS APIs; the framework validates tool registration against the Skill description at startup.
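Concretely, a tool-call emission from the model might look like the fragment below. The tag format follows the article; the function name and argument schema are assumptions for illustration, not PhoneClaw's actual contract:

```
<tool_call>
{"name": "create_calendar_event",
 "arguments": {"title": "Meeting with Zhang", "start": "2025-06-03T14:00"}}
</tool_call>
```

The framework parses the JSON between the tags, checks the name against the tools registered at startup, and dispatches to the matching native API.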

Performance

According to the official changelog (2024‑04‑10), multi‑turn dialogue reuses the KV‑Cache across turns, making the first‑token response roughly 3.5× faster. The E2B model occupies ~3.58 GB, the E4B model ~5.22 GB, and both run most smoothly on iPhone 15 Pro or newer.

Future Direction – "iOS Lobster"

Planned enhancements include expanding high‑frequency APIs (files, photos, notes), integrating Shortcuts/App Intents, adding OCR and speech‑recognition models, and building a local knowledge‑base retrieval system, aiming to make the iPhone a truly autonomous AI‑driven device.

PhoneClaw Banner

PokeClaw – Natural‑Language Control for Android Phones

GitHub: https://github.com/agents-io/PokeClaw (398★, Kotlin)

PokeClaw (also called PocketClaw) is the first on‑device AI agent for Android, also driven by Gemma 4. It operates through Android’s AccessibilityService, reading the UI tree and issuing native tool calls such as `tap`, `input_text`, `open_app`, and `send_message`.

Typical Use Cases

Automatic message monitoring and reply: Continuously watch a WhatsApp conversation and generate context‑aware replies locally.

Voice control while driving: "Tell my wife I'm stuck in traffic, will be half an hour late" – the agent opens contacts, composes, and sends the message without looking at the screen.

Cross‑app automation: "Check tomorrow's weather and add a reminder to the calendar" – the agent reads weather data, switches apps, and writes the reminder using a single natural‑language command.

Core Value

Zero cost, zero subscription: Unlike cloud‑based assistants that require API keys and monthly fees, PokeClaw runs entirely on‑device, incurring no network usage or billing.

Native tool calling: Gemma 4 on LiteRT‑LM outputs structured tool calls, reducing latency and increasing success rates compared with plain text commands.

Skills workflow: Because a 2‑3 B on‑device model cannot reliably plan complex actions, PokeClaw introduces a Skills system that composes primitive actions (tap, swipe, type, open_app, send_message) into reusable scripts. This design is inspired by Claude Code’s Skill architecture and allows users to define new Skills via simple text files.
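The Skills idea, composing a handful of primitives into a reusable, parameterized script, can be sketched in a few lines. Everything below (the function names, the log-based stubs, the skill registry) is illustrative under that assumption, not PokeClaw's actual API:

```python
# Hypothetical sketch of a Skills-style workflow: a named skill composes
# primitive actions (open_app, tap, type) into a reusable script, so the
# small on-device model only has to pick a skill and fill in parameters
# rather than plan every UI step itself.
from typing import Callable

# Primitive actions the agent can execute, stubbed here as log entries.
log: list[str] = []

def open_app(name: str) -> None:
    log.append(f"open_app:{name}")

def tap(label: str) -> None:
    log.append(f"tap:{label}")

def type_text(text: str) -> None:
    log.append(f"type:{text}")

def send_message(contact: str, text: str) -> None:
    # A "skill" is just a fixed composition of primitives.
    open_app("Messages")
    tap(contact)
    type_text(text)
    tap("Send")

SKILLS: dict[str, Callable[..., None]] = {"send_message": send_message}

def run_skill(name: str, **params) -> None:
    SKILLS[name](**params)

run_skill("send_message", contact="Wife", text="Stuck in traffic, 30 min late")
print(log)
```

Because each skill is a plain script, users can add new ones as text files without touching the app, which matches the Claude Code-inspired design the article describes.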

Technical Principle

PokeClaw captures the UI hierarchy via AccessibilityService, converts it to a textual description, feeds it to Gemma 4 E2B for inference, and executes the model‑generated tool calls back through the AccessibilityService, forming a perception‑decision‑action loop that never leaves the device.
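The perception-decision-action loop described above can be sketched as follows. Here `capture_ui`, `decide`, and `execute` are stand-ins for the real components (the AccessibilityService snapshot, Gemma inference, and action dispatch); nothing in this sketch is PokeClaw's actual code:

```python
# Minimal simulation of a perception-decision-action loop that never
# leaves the device: observe the UI as text, let the model pick a tool
# call, apply it, and repeat until the model says it is done.

def capture_ui(state: dict) -> str:
    # Perception: serialize the UI tree into a textual description.
    return f"screen={state['screen']} goal={state['goal']}"

def decide(observation: str) -> dict:
    # Decision: a real agent runs the on-device LLM here; this stub
    # just opens the target app from the home screen, then stops.
    if "screen=home" in observation:
        return {"tool": "open_app", "arg": "Calendar"}
    return {"tool": "done", "arg": None}

def execute(action: dict, state: dict) -> None:
    # Action: apply the tool call back through the accessibility layer.
    if action["tool"] == "open_app":
        state["screen"] = action["arg"].lower()

def run(state: dict, max_steps: int = 5) -> list[dict]:
    trace = []
    for _ in range(max_steps):
        action = decide(capture_ui(state))
        trace.append(action)
        if action["tool"] == "done":
            break
        execute(action, state)
    return trace

trace = run({"screen": "home", "goal": "add reminder"})
print(trace)
```

The `max_steps` cap is a common safeguard in such loops so that a confused model cannot tap forever.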

Performance

On entry‑level Android devices (CPU‑only, no GPU/NPU) the Gemma 4 E2B model takes about 45 seconds to warm up; on flagship chips (Tensor G3/G4, Snapdragon 8 Gen 2/3, MediaTek 9200+) warm‑up drops to a few seconds. The framework supports Android 9+, with a recommendation of 12 GB+ RAM for smooth operation.

Future Direction – "Android Lobster"

The roadmap includes optional cloud LLM integration (v0.3.0 already supports OpenAI/Anthropic/Google APIs), 8‑10 built‑in Skills (search apps, swipe‑read, WhatsApp messaging, tab navigation, etc.), and a community‑driven Skill sharing ecosystem. As on‑device models grow to 7 B or 13 B parameters, the reliance on Skills will diminish, enabling fully autonomous Android agents.

PokeClaw README Screenshot
PokeClaw Demo Frame

Conclusion – The Trend Toward Device‑Side AI Agents

PhoneClaw and PokeClaw illustrate two parallel implementation paths: PhoneClaw leverages iOS native APIs (EventKit, HealthKit), while PokeClaw uses Android Accessibility services. Both converge on the broader trend of device‑side AI agents that can act autonomously on hardware.

When AI models can directly drive hardware, they evolve from mere chat companions to agents capable of performing real actions. Both projects label themselves as the "iOS/Android Lobster," emphasizing the goal of turning the phone into a self‑thinking, self‑acting smart terminal.

Current limitations stem from the modest 2‑4 B parameter on‑device models, keeping both agents in an "assistive" stage that still relies on Skills for complex planning. Ongoing improvements in Gemma 4, MLX quantization, and larger on‑device models are expected to rapidly expand the capabilities of these agents.

Privacy‑first, zero‑cost, and offline availability make device‑side AI agents a distinct and compelling track, with PhoneClaw and PokeClaw representing the most noteworthy implementations today.

Tags: iOS, Android, mobile AI, on-device AI, Gemma, PhoneClaw, PokeClaw
Written by

Geek Labs

Daily shares of interesting GitHub open-source projects. AI tools, automation gems, technical tutorials, open-source inspiration.
