Google has introduced a new free tool named Aloud to quickly dub videos in multiple languages. Developers at Google’s internal incubation hub, Area 120, designed Aloud to produce dubbed versions of videos in new languages in less than an hour, with a chance for the creator to fix any mistakes.
Aloud combines several popular AI tasks into one tool for creators on YouTube and other video platforms. The user can provide captions for a video, or the AI speech-to-text model can produce a transcript for review. The tool then translates the text into an available language, and the user chooses a synthetic voice to read the translated speech, which replaces the original audio in the video for posting. Aloud can translate and dub a five-minute video in 10 minutes, according to its designers. For transparency, creators who use Aloud must mention that the dubbing is synthetic and refer to the original video in the description, the credits or a pinned comment. The current early access version of Aloud only dubs English videos into Spanish and Portuguese, but the Sri Lankan-born developers have Hindi and Bahasa Indonesia in the pipeline along with several other languages.
“Before, dubbing required weeks of effort and a big budget. But with Aloud, you only need a few minutes,” explained Aloud co-founders Buddhika Kottahachchi and Sasakthi Abeysinghe when announcing the product. “We use advances in audio separation, machine translation and text-to-speech to reduce time-consuming and costly steps such as translation, video editing and audio production. You don’t even need to know a language other than the ones you already speak, and it’s all available at no cost to the creator.”
Google has been studying AI translation and transcription for a long time. For example, Google researchers last year produced an AI translation model called Translatotron 2 capable of translating and synthesizing human speech. Translatotron 2 is an unrelated project, however: the creators of Aloud are not authors on the paper, and the model was specifically designed to produce translated audio only in the original speaker’s voice to avoid deepfakes. There is also the real-time transcription option for Google Translate and the instant translation feature for Google Assistant on Android. Aloud focuses on video, not just audio. The creators cite informative and educational content as their primary focus.
“With dubbing, you can now reach previously unreachable segments of the world’s population. In our experiments, we saw double-digit view growth just by dubbing in an additional language,” Kottahachchi and Abeysinghe wrote. “Aloud does not create new content – it only takes the original speech and translates it into another language of your choice. We are also working with YouTube to allow creators to add multiple audio tracks to their videos, a new feature that they started testing with a small group of creators late last year.”