How AI Can Control Your Desktop: Inside the Open‑Source TuriX‑CUA Agent
TuriX‑CUA is an open‑source AI desktop agent that captures screen content, uses multimodal large models to decide actions, and automatically moves the mouse or types, offering cross‑platform support, multi‑model architecture, and detailed setup instructions for Windows and macOS.
TuriX‑CUA (Computer Use Agent) is an open‑source Python‑based desktop automation agent that can control mouse clicks, keyboard input, and complex cross‑application workflows on both macOS and Windows. It follows a three‑step See‑Think‑Act loop:
See : captures the screen at regular intervals to obtain the current UI state.
Think : sends the screenshot to a multimodal large model, which decides the next action (e.g., click a button, type text).
Act : receives coordinates or keystroke instructions from the model and programmatically moves the mouse, clicks, or types.
The agent uses a “Planner + Executor” architecture. The planner (decision maker) decomposes a high‑level task into concrete steps, while the executor focuses on precise UI interactions, reducing erroneous clicks and improving overall task quality.
Cross‑Platform Support
Originally macOS‑only, Windows support was added in a later release. Users select the appropriate branch for their OS. Example capabilities include:
macOS: automate Safari searches, generate Pages documents, extract data from Discord, create charts, insert them into PowerPoint, and handle travel‑booking workflows.
Windows: automate YouTube searches and likes, and integrate with the MCP protocol to allow voice‑driven tools such as Claude for Desktop or Cursor to trigger full browser, Word, and WeChat automation.
Installation and Setup
1. Environment Preparation
conda create -n turix_env python=3.12 # create isolated environment
conda activate turix_env
git clone https://github.com/TurixAI/TuriX-CUA.git
cd TuriX-CUA
pip install -r requirements.txt2. Model Configuration
Edit examples/config.json to choose a model. The default Turix API provides a free quota. To use a custom endpoint (e.g., a locally‑deployed Qwen3‑VL or an OpenAI‑compatible service), modify the build_llm function in main.py. Qwen3‑VL has been reported to perform well on UI element recognition.
3. System Permissions
Enable accessibility for the terminal and your IDE (e.g., PyCharm, VS Code) via System Settings → Privacy & Security → Accessibility.
For Safari automation, enable “Allow Remote Automation” in Safari’s Develop menu.
When the agent first runs, approve the system prompt that grants control of the computer; otherwise mouse movement will fail.
4. Running the Agent
Define a task in examples/config.json. Example for macOS:
{
"agent": {
"task": "打开Safari,搜索iPhone 17 Pro当前价格,打开备忘录记录结果"
}
}Start the agent: python examples/main.py The agent will automatically open Safari, enter the search query, retrieve the result, and record it in the Notes app without any manual intervention.
Community and Repository
The project is free and open source for personal and research use. Community support is available via Discord and email.
Project repository: https://github.com/TurixAI/TuriX-CUA
Old Meng AI Explorer
Tracking global AI developments 24/7, focusing on large model iterations, commercial applications, and tech ethics. We break down hardcore technology into plain language, providing fresh news, in-depth analysis, and practical insights for professionals and enthusiasts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
