Alibaba's Qwen Launches Omni-Modal Large Model Qwen3.5-Omni, Supporting Over 10 Hours of Audio Input and Speech Recognition in 113 Languages
2026-03-31 09:55

en.Wedoany.com Report, March 30th - Alibaba's Qwen announced the official launch of the omni-modal large model Qwen3.5-Omni, marking a significant step forward in the field of multimodal artificial intelligence. The model's core highlight is its powerful cross-modal processing capability, enabling it to simultaneously handle various information forms such as text, audio, and video, providing users with a more intelligent and natural interactive experience.

The Qwen3.5-Omni series includes three Instruct versions of different sizes: Plus, Flash, and Light, catering to performance and efficiency requirements across diverse application scenarios. The model supports a context window of up to 256K tokens, enabling efficient handling of large-scale information input. Qwen3.5-Omni is notably strong in audio and video processing: it accepts audio input exceeding 10 hours in length and combined audio-video input of over 400 seconds at 720p (1 FPS), giving it significant advantages in complex tasks such as speech recognition and video understanding.

In terms of language support, Qwen3.5-Omni offers broad coverage: speech recognition for 113 languages and dialects, and speech generation for 36 languages and dialects. This adaptability makes the model well suited to global application scenarios such as multinational enterprise services, multilingual content creation, and intelligent customer service. Developers can currently experience and integrate Qwen3.5-Omni through two access methods, an Offline API and a Realtime API, covering offline batch processing and real-time interaction respectively.
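As an illustration of what an offline, batch-style call to such a multimodal API might look like, the sketch below assembles a chat-completion request payload mixing text and audio. This is a minimal sketch only: the model name `qwen3.5-omni-flash`, the `input_audio` content type, and the message schema follow the common OpenAI-compatible convention and are assumptions, not details confirmed by this announcement.

```python
import base64
import json


def build_omni_request(text_prompt: str, audio_bytes: bytes,
                       model: str = "qwen3.5-omni-flash") -> dict:
    """Assemble a hypothetical OpenAI-compatible chat payload with text + audio.

    The model name and the `input_audio` content part are assumptions based on
    the OpenAI-compatible multimodal convention, not confirmed API details.
    """
    # Audio is typically sent base64-encoded inside the JSON body.
    audio_b64 = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text_prompt},
                    {
                        "type": "input_audio",
                        "input_audio": {"data": audio_b64, "format": "wav"},
                    },
                ],
            }
        ],
    }


# Example: a transcription-style request over a placeholder audio clip.
payload = build_omni_request("Transcribe this recording.", b"\x00\x01fake-audio")
print(json.dumps(payload)[:80])
```

In a real integration this payload would be POSTed to the provider's chat-completions endpoint with an API key; a Realtime API would instead stream audio chunks over a persistent connection rather than sending one JSON body.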

Industry observers note that the launch of Alibaba Qwen's Qwen3.5-Omni not only delivers technical advances across multiple model sizes but also reflects the broader trend of large models evolving from single-modality text processing toward multimodal fusion. As omni-modal capabilities continue to improve, models like Qwen3.5-Omni are expected to foster new applications across industries such as smart hardware, education, healthcare, and entertainment, further promoting the adoption and practical implementation of artificial intelligence technology.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com
