Can AI Really Control Your Computer? Inside TuriX‑CUA Open‑Source Agent
TuriX‑CUA is an open‑source Python‑based AI agent that equips artificial intelligence with visual perception and mouse‑keyboard control, enabling it to see the screen, reason with multimodal models, and act autonomously across macOS and Windows, with a multi‑model architecture, MCP support, and step‑by‑step setup instructions.
Overview
TuriX‑CUA (Computer Use Agent) is an open‑source Python project that enables a multimodal LLM to act as a virtual assistant: it watches the screen, decides what to do, and performs mouse clicks or keyboard input automatically.
See‑Think‑Act Loop
See: The agent captures a screenshot of the desktop at regular intervals (e.g., every few seconds).
Think: The screenshot is sent to a multimodal LLM (Turix API, any OpenAI‑compatible service, or a locally‑run model such as Qwen3‑VL). The model is prompted with a task‑specific question, e.g., “What should I click next to book a flight?”
Act: The model returns screen coordinates, UI element identifiers, or text. TuriX moves the mouse to the coordinates and clicks, or types into the focused input field.
Architecture
The system follows a planner‑executor design:
Planner: Decomposes a high‑level goal into an ordered list of sub‑steps.
Executor: Executes each step by controlling the mouse and keyboard. This separation reduces spurious clicks caused by model hallucinations.
The agent also implements the MCP protocol, allowing it to be mounted as a tool inside Claude for Desktop, Cursor, or other AI assistants.
Cross‑Platform Support
Originally macOS‑only, the project now provides a windows branch. Switching to that branch builds a Windows‑compatible binary, enabling the same agent on both macOS and Windows.
Installation (macOS example)
Step 1 – Prepare the environment
conda create -n turix_env python=3.12 conda activate turix_env git clone https://github.com/TurixAI/TuriX-CUA.git cd TuriX-CUA pip install -r requirements.txtStep 2 – Configure the model
Edit examples/config.json to specify the LLM endpoint. The default uses Turix’s own API (free quota on registration). To use another service or a local model, modify the build_llm function in main.py accordingly.
Step 3 – Grant system permissions (macOS)
Enable Accessibility for the terminal/IDE (System Settings → Privacy & Security → Accessibility). If Safari automation is required, also enable “Remote Automation” in Safari’s Develop menu. The first run will trigger a system dialog; click “Allow”.
Step 4 – Run the agent
Create a task definition in examples/config.json, for example:
{
"agent": {
"task": "打开Safari,搜索一下iPhone 17 Pro现在的价格,然后打开备忘录记下来"
}
}Then start the agent: python examples/main.py The mouse will move autonomously, open Safari, type the query, and record the result.
Additional Features
Supports the MCP protocol for seamless integration with other AI tools.
Can execute complex workflows such as searching YouTube, generating PowerPoint charts from Discord data, or booking flights and hotels.
Repository
Source code and documentation:
https://github.com/TurixAI/TuriX-CUAIllustrations
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
