Can Qianwen’s Desktop Voice Input Finally Make the Keyboard Obsolete?
The article evaluates Qianwen’s new desktop voice‑input system, showing how it filters filler words, understands screen context, executes AI commands, and generates structured text, PPTs, and Excel reports, positioning voice as a viable replacement for traditional keyboard typing.
Artificial intelligence is reshaping how people interact with computers, and the next transformation may target the hundreds of daily "typing" actions that dominate office work.
Voice input is not new—Siri, Google Assistant, Whisper, Otter.ai and similar tools have proved the demand—but users still complain about filler words, disfluencies and the need to edit the transcribed text, especially when the tools run on phones while most heavy work stays on the desktop.
Qianwen’s desktop client recently launched a full‑featured voice‑input method. While the user holds the right Alt key (right Command on macOS), the system automatically detects the active application and on‑screen content, filters out interjections and pauses, and outputs structured, ready‑to‑send text.
The interaction model is deliberately simple: a long press of the right Alt key starts voice‑to‑text conversion; a double‑tap of the same key switches to AI voice commands. No extra plugins or window switching are required.
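The two gestures can be told apart from key timing alone. The following is a minimal sketch of that idea, not Qianwen's actual implementation: the `KeyPress` type, the `classify_gesture` function, and both timing thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed thresholds; the article does not state the real values.
LONG_PRESS_S = 0.4      # a hold at least this long counts as a long press
DOUBLE_TAP_GAP_S = 0.3  # maximum pause between the two taps of a double-tap


@dataclass
class KeyPress:
    down: float  # timestamp in seconds when the key went down
    up: float    # timestamp in seconds when the key was released


def classify_gesture(presses: List[KeyPress]) -> Optional[str]:
    """Classify the most recent presses of the shortcut key.

    Returns "dictation" for a long press, "command" for a double-tap,
    and None when the presses match neither gesture.
    """
    if not presses:
        return None

    last = presses[-1]
    if last.up - last.down >= LONG_PRESS_S:
        return "dictation"

    if len(presses) >= 2:
        prev = presses[-2]
        prev_was_tap = prev.up - prev.down < LONG_PRESS_S
        close_together = last.down - prev.up <= DOUBLE_TAP_GAP_S
        if prev_was_tap and close_together:
            return "command"

    return None
```

A single short tap returns `None` in this sketch, so an accidental key press would trigger neither mode.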
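In a DingTalk chat window, the author spoke a natural, filler‑laden sentence in Chinese, roughly: "So, like, this project, I think, um, the timeline needs to be sorted out, and then that... right, we need to confirm the plan with the client before Thursday, and the content side needs a push too, otherwise we might not make it." Qianwen stripped all filler words and reorganised the three tasks into clear, concise sentences that could be sent directly.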
The system also handles complex prompts. When the author dictated a long marketing brief, the AI parsed the request, automatically generated a structured outline, and produced a ready‑to‑use document.
Recognition accuracy is high. At a normal speaking speed, transcribed Chinese speech contains virtually no errors, and English technical terms mixed into Chinese speech, such as "ConversationBufferMemory" and "Context Window", are transcribed correctly.
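Double‑tapping the right Alt key activates AI voice commands. For example, saying (in Chinese) "Help me write an email telling the client that the proposal is delayed by two days and can be delivered on Friday; keep the tone sincere." makes Qianwen generate a complete, politely‑toned email with a proper greeting and signature.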
The AI adapts its tone to the current context. The same command, "Reply for me and say I can do it", yields a casual reply with an emoji in a chat window, while in an email window it produces a formal business response.
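One plausible way to get this behaviour is to attach a style hint, derived from the active application, to the spoken command before it reaches the language model. The sketch below illustrates only that routing step; `ScreenContext`, `STYLE_BY_APP`, `build_prompt`, and the prompt format are hypothetical and are not taken from Qianwen.

```python
from dataclasses import dataclass

# Hypothetical mapping from application category to reply style.
STYLE_BY_APP = {
    "chat": "casual",
    "email": "formal",
}


@dataclass
class ScreenContext:
    app_category: str  # e.g. "chat" or "email", as detected from the active window


def build_prompt(command: str, ctx: ScreenContext) -> str:
    """Prefix the spoken command with a style hint chosen from the context.

    Unknown application categories fall back to a neutral style.
    """
    style = STYLE_BY_APP.get(ctx.app_category, "neutral")
    return f"[style={style}] {command}"
```

The same command then produces different prompts depending on where it was spoken: `build_prompt("Reply for me and say I can do it", ScreenContext("chat"))` carries a casual hint, while the email context carries a formal one.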
Voice notes are also supported. The author dictated a lengthy idea about AI‑generated academic papers, and Qianwen automatically recorded the note and provided a link for later review.
Integration with Qianwen’s PPT builder demonstrates a workflow boost: after selecting a requirement in DingTalk and double‑tapping the shortcut, the user says (in Chinese) "Turn this requirement into a to‑do list, then make it into a report PPT." The AI extracts key points, creates a to‑do list, and generates a complete slide deck without manual copy‑paste or formatting.
The system can also process dozens of files at once. After the user drags multiple Word or PDF documents into Qianwen and issues a voice command, the AI reads the content, extracts key information, and produces visual charts. A single batch operation supports 39 file formats.
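In an Excel‑centric scenario, the author asked the AI (in Chinese) to "compile the specific VAT preferential policy items contained in Ministry of Finance and State Taxation Administration Announcement No. 10 of 2026 into an Excel list, with the relief method, policy content, and implementation period." Qianwen fetched the relevant policies, compiled them into an Excel sheet, and presented the result through a conversational interface.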
All these capabilities rest on Qianwen’s long‑standing speech foundation: billions of hours of audio‑video data, an end‑to‑end real‑time ASR model, and the latest Qwen LLM for deep understanding and text reconstruction. The AI simultaneously listens to speech, reads the screen, identifies the active software, and decides the appropriate output.
Industry trends echo this direction. Apple is upgrading Siri to an AI assistant, OpenAI continues to enhance ChatGPT’s voice dialogue, and Google’s Gemini is strengthening multimodal interaction. Voice is moving from a keyboard supplement to the primary interaction entry point in the AI era, and Qianwen’s desktop voice input marks one of the first attempts to fuse voice with on‑screen AI capabilities.
Nevertheless, desktop voice input is still in its infancy. Traditional desktop solutions have remained simple transcription tools, whereas Qianwen’s approach combines speech recognition, contextual understanding, and task execution, illustrating both the potential and the current limits of voice‑first productivity.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
