Alibaba's Qwen Launches AI Voice Input on PC, Opening Its Cross-Application Smart Assistant to All Users

2026-05-07 15:05

en.Wedoany.com Reported - Alibaba's large model product "Qwen" officially launched its AI voice input feature on the PC platform on May 7, 2026, and the feature is now available free of charge to all users. It combines natural language processing with large model capabilities and can be invoked with a shortcut key in any desktop application, merging speech recognition and real-time intelligent processing into a system-level entry point for everyday work.

The voice assistant is embedded in the Qwen PC client and keeps the operation path as short as possible: in any desktop application, such as WeChat, DingTalk, Word, or a browser, users press a preset shortcut key to summon a floating window and start speaking. The speech is converted into structured text in real time and entered directly into the page the user is working on. A built-in semantic understanding engine processes the speech as it arrives: filler words such as "um" and "uh" are filtered out, slips of the tongue are corrected, and the resulting text is formatted automatically. When a user dictates the requirements for a meeting notice in a chat box, the assistant outputs a neatly formatted message that is ready to send without further editing.
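The filler-word filtering described above can be illustrated with a minimal sketch. It is not Qwen's implementation, which the article attributes to a semantic understanding engine; it is a simple rule-based stand-in. The function name `clean_transcript` and the `FILLERS` list are hypothetical, and only "um" and "uh" come from the article.

```python
import re

# Hypothetical filler list; the article names only "um" and "uh".
FILLERS = ("um", "uh")

# Match a whole filler word plus any trailing comma/period and whitespace.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b[,.]?\s*",
    flags=re.IGNORECASE,
)


def clean_transcript(raw: str) -> str:
    """Drop filler words, collapse whitespace, and capitalize the first letter."""
    text = _FILLER_RE.sub("", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text[:1].upper() + text[1:] if text else text


if __name__ == "__main__":
    print(clean_transcript("um, the meeting is uh at three pm"))
```

The word-boundary anchors keep words such as "umbrella" intact. Correcting slips of the tongue and formatting a full message would need a language model rather than a regular expression.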

Adapting to different applications is the core feature that distinguishes this voice assistant from traditional speech-to-text tools. In document editing, users can issue voice commands at any time, for example "Help me insert the 2025 national GDP data," and the AI assistant will search for the corresponding content and insert it. When working with lengthy English materials while creating text-and-image content, users only need to select the relevant paragraph and say "Explain this" or "Translate into Chinese," and the assistant will carry out the operation. In messaging, when a user receives an English email or message, they only need to dictate the key points of the reply in Chinese, and the assistant will generate a correctly formatted English reply based on the context and fill it in, so users no longer have to switch between applications and copy and paste.

Large model capabilities form the technical foundation of this voice assistant. Traditional voice input tools perform only a single conversion from acoustic signal to text, whereas Qwen's AI voice input on PC adds three processing layers on top of speech recognition: semantic understanding, logical reasoning, and content generation. When a user gives a vague command, the large model can infer the intended meaning from context; when a user dictates an incomplete paragraph, the large model can complete it while keeping the writing style consistent; when a user needs supporting data, the large model can retrieve authoritative information and embed it in the text. Speech recognition and these three layers run as a closed loop within a single workflow, giving users a what-you-see-is-what-you-get experience.
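The layering of understanding, reasoning, and generation on top of a transcript can be pictured as a chain of stages. The sketch below is a toy illustration under stated assumptions: every name (`Request`, `understand`, `reason`, `generate`, `run`) is hypothetical, and the rules inside each stage are placeholders for what the article describes as large model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    transcript: str       # text produced by speech recognition
    selection: str = ""   # text selected in the active application, if any
    intent: str = ""      # filled in by the understanding stage
    output: str = ""      # filled in by the generation stage


Stage = Callable[[Request], Request]


def understand(req: Request) -> Request:
    # Toy rule standing in for a model call: a known leading verb marks a command.
    words = req.transcript.lower().split()
    first = words[0] if words else ""
    req.intent = first if first in ("translate", "explain") else "dictate"
    return req


def reason(req: Request) -> Request:
    # Commands act on the selection; with nothing selected, treat speech as dictation.
    if req.intent != "dictate" and not req.selection:
        req.intent = "dictate"
    return req


def generate(req: Request) -> Request:
    # Placeholder output; a real system would generate text with a language model.
    if req.intent == "dictate":
        req.output = req.transcript
    else:
        req.output = f"[{req.intent}] {req.selection}"
    return req


PIPELINE: List[Stage] = [understand, reason, generate]


def run(req: Request, stages: List[Stage] = PIPELINE) -> Request:
    # Pass the request through each stage in order.
    for stage in stages:
        req = stage(req)
    return req


if __name__ == "__main__":
    print(run(Request("Translate into Chinese", selection="Hello")).output)
```

Keeping each layer as a separate function over a shared request object makes the hand-off between layers explicit; how the actual product divides this work is not stated in the article.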

At present, PC voice input tools mostly offer speech-to-text at the input method level and lack support for semantic understanding and content generation, while mobile AI voice assistants are constrained by computing power and interaction interfaces and struggle with complex workflow tasks. The greater computing power of the PC makes more complex natural language understanding and real-time task processing possible. Alibaba's Qwen has chosen voice interaction as its core entry point on the PC, embedding the large model's logical reasoning and creative generation capabilities directly into each step of the user's workflow.

Qwen has recently been accelerating its rollout on PC and mobile platforms at the same time. As early as April 2025, the Qwen PC client launched an AI text-selection feature that lets users select text and summon the AI assistant with one click to search, translate, explain, or continue writing. In March of the same year, Alibaba announced a three-year investment of 380 billion yuan to build cloud and AI hardware infrastructure. The launch of the AI voice input feature marks a crucial step for Qwen in extending from text interaction to voice interaction, and from passive response to proactive understanding.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com