Google Launches Gemini 3.5 Real-Time Speech Translation Model
2026-06-10 09:07

en.Wedoany.com reported: On June 9, Google announced Gemini 3.5 Live Translate, a real-time speech translation model. Built for real-time speech-to-speech translation, the model can automatically detect more than 70 languages, produce more natural and fluent translated speech, and preserve as much of the speaker's tone, pace, and pitch as possible. It began rolling out on June 9 across products and services including Google Translate, the Gemini Live API, Google AI Studio, and Google Meet.

The core capabilities of Gemini 3.5 Live Translate are continuous audio stream processing and low-latency speech generation. Traditional real-time translation systems often wait for the speaker to pause or finish a sentence before translating, which causes noticeable delays, unnatural sentence breaks, and lost tone. Google's new model instead processes audio continuously while the speaker is talking, dynamically balancing how much context it gathers against how closely it keeps pace with the speaker, so that the translated speech follows the original with minimal delay. In international meetings, online classes, live streams, customer service calls, travel, and multilingual collaboration, the model's value lies in making translation feel closer to simultaneous interpretation, rather than transcribing speech into text and then mechanically reading it aloud. It can also automatically identify the languages in multilingual input, which reduces the need for users to switch settings manually, and it is more usable in noisy environments.
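Why waiting for a pause costs so much latency can be illustrated with a toy calculation. The sketch below is not Google's method, which the announcement does not describe. It compares a "translate after the sentence ends" policy with a "wait-k" policy, a simple streaming strategy from simultaneous-translation research that emits each output word once k input words have been heard. It assumes hypothetical, evenly spaced word timings, a one-to-one word alignment, and zero compute time.

```python
# Toy latency comparison (illustrative only, not Gemini's actual algorithm).
# Assumptions: one output word per input word, evenly spaced hypothetical
# timings, and no time spent on recognition, translation, or synthesis.

def pause_based_delays(word_end_times, sentence_end):
    """Every word is translated only after the whole sentence has ended."""
    return [sentence_end - t for t in word_end_times]


def wait_k_delays(word_end_times, k):
    """Wait-k policy: output word i is emitted once input word i + k - 1
    has been heard (or at the last input word if fewer remain)."""
    n = len(word_end_times)
    return [word_end_times[min(i + k - 1, n - 1)] - word_end_times[i]
            for i in range(n)]


def average(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical 10-word sentence, one word every 0.4 seconds.
    times = [0.4 * (i + 1) for i in range(10)]
    print("pause-based avg delay (s):", average(pause_based_delays(times, times[-1])))
    print("wait-3 avg delay (s):", average(wait_k_delays(times, k=3)))
```

A larger k gives the model more context per word at the cost of more delay; this is the same trade-off between context and synchronization that the paragraph above describes.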

The model supports more than 70 languages and covers more than 2,000 language combinations in Google Meet. Developers can access it through the public beta of the Gemini Live API, enterprise users can try it in a private beta of Google Meet, and consumers will gain access gradually through Google Translate on Android and iOS.

For Google, Gemini 3.5 Live Translate pushes large-model capabilities further into the high-frequency entry points of everyday communication. Translation is an area where Google has long accumulated data and products, previously centered on text translation, photo translation, conversation translation, and offline translation. With the development of native multimodal models, speech translation is shifting from a segmented "recognition, translation, synthesis" pipeline toward a more coherent end-to-end audio experience. If Gemini 3.5 Live Translate performs reliably in real meetings, on mobile devices, in headphones, and in developer applications, it will strengthen Google's position as an AI entry point for real-time communication, office collaboration, language learning, and cross-border services. Developers and enterprise customers can also embed the real-time translation offered through the Gemini Live API into video conferencing, online education, customer support, live interaction, and multilingual content distribution systems, turning speech AI from a standalone feature into a foundational application capability.

Google has also added SynthID watermarks to the audio generated by the model to make AI-generated audio easier to identify. How well the model performs in practice will still depend on its handling of difficult accents, fast multi-speaker conversations, stability over long sessions, background noise, and semantic fidelity across languages. Real-time speech translation is becoming an important direction for productizing large models, and whoever delivers a stable experience that balances low latency, naturalness, accuracy, and product coverage will be better positioned to control the entry point for the next generation of cross-language communication tools.

This article was compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issue, please notify us promptly and we will modify or delete the content accordingly. Email: news@wedoany.com