How TuriX-CUA Lets AI Control Your Windows and macOS Desktop
TuriX-CUA is an open‑source Python framework that lets AI agents understand screen content and perform mouse‑keyboard actions on Windows and macOS, offering high success rates, model hot‑swapping, MCP integration, and step‑by‑step installation for desktop automation tasks.
Overview
TuriX-CUA is an open‑source AI desktop‑control framework written in Python. It uses a vision‑language model to interpret screen contents and then simulates mouse clicks and keyboard input, enabling an AI agent to operate Windows or macOS applications without requiring application‑specific APIs. The default model (Qwen3‑VL) achieves >68% success on the OSWorld benchmark, a ~15% improvement over earlier open‑source agents such as UI‑TARS.
Installation
Create a Python 3.12 Conda environment, clone the repository, and install dependencies:
conda create -n turix_env python=3.12
conda activate turix_env
git clone https://github.com/TurixAI/TuriX-CUA
cd TuriX-CUA
pip install -r requirements.txt

For macOS, grant accessibility and Safari automation permissions (see Platform‑specific setup). Windows users should switch to the windows branch.
Platform‑specific setup
macOS accessibility: System Settings → Privacy & Security → Accessibility → add Terminal and your IDE (e.g., VS Code). Also add /usr/bin/python3 if needed.
Safari automation: Safari → Settings → Advanced → enable “Show Develop menu”. In the Develop menu enable “Allow Remote Automation” and “Allow JavaScript from Apple Events”. Trigger the permission dialog, e.g. with:
osascript -e 'tell application "Safari" to do JavaScript "alert(\"Triggering accessibility request\")" in document 1'

Configuration
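The same AppleScript can be driven from Python. A minimal sketch, assuming macOS with Safari open on at least one page; the helper name is illustrative and not part of TuriX‑CUA:

```python
import subprocess

# The AppleScript shown above, escaped for Python string literals.
SCRIPT = (
    'tell application "Safari" to do JavaScript '
    '"alert(\\"Triggering accessibility request\\")" in document 1'
)

def trigger_safari_permission_prompt(run: bool = False) -> list[str]:
    """Build the osascript command that triggers Safari's
    "Allow JavaScript from Apple Events" permission dialog.
    Pass run=True on macOS to actually execute it."""
    cmd = ["osascript", "-e", SCRIPT]
    if run:
        subprocess.run(cmd, check=True)
    return cmd
```

Keeping execution behind a flag lets you inspect the exact command before granting Safari the automation permission.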
Edit examples/config.json to define the task for the agent. Example (Chinese task to switch macOS to dark mode):
{
"agent": {
"task": "打开系统设置,切换到深色模式"
}
}

Configure the LLM provider. For the Turix API:
{
"llm": {
"provider": "turix",
"api_key": "YOUR_API_KEY",
"base_url": "https://llm.turixapi.io/v1"
}
}

To use a local Ollama model, replace the provider block with:
{
"llm": {
"provider": "ollama",
"model_name": "llama3.2-vision",
"base_url": "http://localhost:11434"
}
}

Running the agent
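The two provider blocks above share the same shape, so a loader can branch on the provider field. A minimal sketch, assuming the config.json layout shown above; the function name and returned dict are illustrative, not TuriX‑CUA's actual API:

```python
import json
from pathlib import Path

def load_llm_config(path: str = "examples/config.json") -> dict:
    """Read the LLM settings from config.json and normalize them
    into one dict, regardless of provider."""
    cfg = json.loads(Path(path).read_text(encoding="utf-8"))
    llm = cfg.get("llm", {})
    provider = llm.get("provider", "turix")
    if provider == "ollama":
        # Local model: no API key, just a model name and base URL.
        return {
            "provider": provider,
            "model": llm.get("model_name", "llama3.2-vision"),
            "base_url": llm.get("base_url", "http://localhost:11434"),
        }
    # Hosted provider (e.g. the Turix API): key plus endpoint.
    return {
        "provider": provider,
        "api_key": llm.get("api_key"),
        "base_url": llm.get("base_url"),
    }
```

Normalizing here means the rest of the agent never needs to know which backend is serving the model.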
After configuration, start the agent with:

python examples/main.py

The AI agent will read the task description, invoke the vision‑language model, and perform the specified desktop actions automatically.
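The read-task, invoke-model, act cycle described above is a perceive-plan-act loop. A minimal sketch of that loop, with hypothetical callback names standing in for TuriX‑CUA's actual screen-capture, model, and input-simulation components:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Action:
    kind: str                       # e.g. "click", "type", "done"
    payload: dict = field(default_factory=dict)

def run_agent(task: str,
              capture_screen: Callable[[], bytes],
              plan_step: Callable[[str, bytes], Action],
              execute: Callable[[Action], None],
              max_steps: int = 50) -> int:
    """Perceive-plan-act loop (illustrative, not TuriX-CUA's real API).
    Each step: screenshot -> vision-language model -> input action.
    Returns the number of steps taken."""
    for step in range(1, max_steps + 1):
        frame = capture_screen()         # perceive: grab the screen
        action = plan_step(task, frame)  # plan: VLM picks the next action
        if action.kind == "done":
            return step                  # model reports the task complete
        execute(action)                  # act: simulate mouse/keyboard
    return max_steps
```

Capping the loop with max_steps is a common safeguard so a confused model cannot click forever.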
Demo scenarios
Book flights, hotels, and an Uber ride.
Query iPhone prices, create a Pages document, and email it.
Generate a bar chart from a Numbers file, insert it into PowerPoint, and reply to a boss.
Search YouTube on Windows and like a video.
Claude‑driven MCP demo: Claude searches AI news, invokes TuriX via MCP, writes results into a Pages document, and sends it.
Project structure
The repository is actively maintained and organized into modules such as agent, ai‑agents, browser‑use, computer‑automation, gui‑agent, mcp, and qwen3‑vl. It is suitable for office automation, software testing, data collection, content browsing, and UI testing.
Repository: https://github.com/TurixAI/TuriX-CUA