source: Google DeepMind: Fluid, natural voice translation with Gemini 3.5 Live Translate
level: technical
Google has launched Gemini 3.5 Live Translate, a new audio model that performs speech-to-speech translation in near real time across more than 70 languages. The model automatically detects the language being spoken and generates translated speech that preserves the speaker's intonation, pacing, and pitch. Unlike older systems that wait for a speaker to finish before translating, this model works continuously, staying just a few seconds behind the speaker to deliver fluid audio without awkward pauses.
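To make the difference between turn-based and continuous translation concrete, here is a minimal timing sketch. It does not call the model or any Google API. The chunk size, the three-chunk lookahead, the 0.5-second processing delay, and all function names are hypothetical values chosen for illustration, not figures from the announcement.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: float  # seconds since the speaker began
    end: float    # seconds since the speaker began


def make_chunks(total_seconds: float, chunk_seconds: float) -> list[Chunk]:
    """Split an utterance into fixed-size audio chunks."""
    chunks = []
    t = 0.0
    while t < total_seconds:
        end = min(t + chunk_seconds, total_seconds)
        chunks.append(Chunk(t, end))
        t = end
    return chunks


def turn_based_first_output(chunks: list[Chunk], processing_delay: float) -> float:
    """Time at which translated audio can start if the system waits
    for the speaker to finish before translating."""
    if not chunks:
        raise ValueError("no audio to translate")
    return chunks[-1].end + processing_delay


def streaming_first_output(
    chunks: list[Chunk], lookahead_chunks: int, processing_delay: float
) -> float:
    """Time at which translated audio can start if the system only
    buffers a small number of chunks before it begins translating."""
    if not chunks:
        raise ValueError("no audio to translate")
    # Clamp the buffer size to between 1 chunk and the whole utterance.
    needed = max(1, min(lookahead_chunks, len(chunks)))
    return chunks[needed - 1].end + processing_delay


# Hypothetical scenario: a 30-second utterance in 1-second chunks.
utterance = make_chunks(total_seconds=30.0, chunk_seconds=1.0)
turn_based = turn_based_first_output(utterance, processing_delay=0.5)
streaming = streaming_first_output(utterance, lookahead_chunks=3, processing_delay=0.5)

print(f"turn-based: first translated audio at {turn_based:.1f} s")  # 30.5 s
print(f"streaming:  first translated audio at {streaming:.1f} s")   # 3.5 s
```

Under these assumed numbers the listener hears the first translated audio after 3.5 seconds instead of 30.5, and the streaming delay stays constant however long the speaker talks.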
The model is rolling out across several Google products. Developers can access it in public preview through the Gemini Live API and Google AI Studio. Enterprises will get a private preview in Google Meet starting this month, supporting over 2,000 language combinations in a single meeting. The Google Translate app on Android and iOS also integrates the model, with a new listening mode on Android that lets users hear translations through the phone's earpiece, as they would on a phone call.
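The "over 2,000 language combinations" figure can be sanity-checked against the "more than 70 languages" figure with a short calculation. The source does not say how combinations are counted, so the sketch below shows both plausible readings: unordered pairs and ordered source-to-target pairs. The function names are illustrative only.

```python
from math import comb


def unordered_pairs(n_languages: int) -> int:
    """Language pairs where A<->B counts once."""
    return comb(n_languages, 2)


def ordered_pairs(n_languages: int) -> int:
    """Translation directions where A->B and B->A count separately."""
    return n_languages * (n_languages - 1)


n = 70  # lower bound stated in the summary ("more than 70 languages")
print(unordered_pairs(n))  # 2415
print(ordered_pairs(n))    # 4830
```

With 70 languages, both counts exceed 2,000, so the two figures in the summary are consistent under either reading.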
Early testers such as Grab, which handles over 10 million voice calls per month between drivers and travelers, report positive feedback on translation quality and low latency. Developer platforms such as Agora, LiveKit, and Pipecat are integrating the API to simplify building voice translation apps. All audio generated by the model is watermarked with SynthID to help detect AI-generated content and prevent misuse.
why it matters: This model enables more natural, low-latency voice translation for real-time communication across languages, which is useful for meetings, travel, and live interpretation.