Qwen3.5-Omni Introduces Audio‑Visual Vibe Coding: Code by Speaking and Gesturing
Alibaba's newly released Qwen3.5-Omni multimodal model adds an Audio-Visual Vibe Coding feature that lets users describe a website or game with speech and gestures and have the model generate the code. It also offers advanced audio comprehension, long-duration media support, multilingual capabilities, fine-grained voice control, and voice cloning, though its weights remain closed.
Alibaba has officially released Qwen3.5-Omni, the next‑generation multimodal model of the Qwen series. It natively supports text, image, audio, and video understanding rather than being assembled from separate modules.
The most intriguing feature is "Audio-Visual Vibe Coding". The user turns on a camera, then speaks and gestures to describe the desired website or game, and Qwen3.5-Omni-Plus generates usable code in real time, turning natural language and visual cues into a development workflow.
Key performance figures include:
Audio comprehension surpasses Gemini‑3.1 Pro.
Native handling of up to 10 hours of audio or 400 seconds of 720p video.
Trained on more than 100 million hours of multimodal data.
Recognition in 113 languages and output in 36 languages.
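To make the media limits above concrete, here is a minimal sketch of a pre-flight check that a client could run before submitting a file. It is illustrative only: the names MediaInfo and fits_native_limits are hypothetical and not part of any Qwen SDK, and reading "720p" as a maximum frame height is an assumption, since the announcement does not say how the limits are enforced.

```python
from dataclasses import dataclass

# Limits as quoted in the announcement. How the service actually
# enforces them is not documented here, so treat these as assumptions.
MAX_AUDIO_SECONDS = 10 * 60 * 60  # 10 hours = 36,000 seconds
MAX_VIDEO_SECONDS = 400           # 400 seconds
MAX_VIDEO_HEIGHT_PX = 720         # "720p" read as a maximum frame height


@dataclass
class MediaInfo:
    """Hypothetical description of a media file (not a Qwen SDK type)."""
    kind: str            # "audio" or "video"
    duration_s: float    # length in seconds
    height_px: int = 0   # frame height; ignored for audio


def fits_native_limits(media: MediaInfo) -> bool:
    """Return True if the file is within the quoted single-input limits."""
    if media.kind == "audio":
        return media.duration_s <= MAX_AUDIO_SECONDS
    if media.kind == "video":
        return (media.duration_s <= MAX_VIDEO_SECONDS
                and media.height_px <= MAX_VIDEO_HEIGHT_PX)
    raise ValueError(f"unsupported media kind: {media.kind!r}")


if __name__ == "__main__":
    # A 2-hour podcast is within the audio limit.
    print(fits_native_limits(MediaInfo("audio", 2 * 3600)))
    # A 10-minute 720p clip exceeds the 400-second video limit.
    print(fits_native_limits(MediaInfo("video", 600, 720)))
```

A file that fails the check would need to be trimmed, split, or downscaled before being sent; the announcement does not describe how the service itself responds to oversized input.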
In interactive use, the model offers fine-grained voice control (adjusting the AI's emotion, speed, and volume) and a voice-cloning capability that can create a personalized AI voice from a short audio clip, though large-scale deployment is still in progress.
The series comes in three versions (Plus, Flash, and Light) that target different performance and cost trade-offs. The model is currently available for trial through:
Online chat: chat.qwen.ai
Hugging Face offline demo
Alibaba Cloud API
Note that the model is offered only through hosted access such as the API; the model weights have not been released, which suggests a shift in Alibaba's large-model strategy.
AI Engineering
Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
