Zoom acquires real-time translation startup Kites GmbH

Zoom announced Tuesday that it has signed a deal to acquire Kites, a German startup that has developed a real-time machine translation (MT) platform. Kites’ team of 12 researchers will join Zoom’s engineering team as the company works to improve meeting productivity with multilingual translation capabilities for Zoom users.
“We are constantly looking for new ways to bring happiness to our users and improve meeting productivity, and MT solutions will be essential in improving our platform for Zoom customers around the world,” said Velchamy Sankarlingam, president of product and engineering at Zoom. “With our missions aligned to make collaboration frictionless – regardless of language, geographic location or other barriers – we are confident that the impressive team at Kites will fit seamlessly into Zoom.”
Kites was founded in 2015 by Dr. Alex Waibel and Dr. Sebastian Stüker, faculty members of the Karlsruhe Institute of Technology. Zoom said Stüker and the rest of the team will continue to work in Karlsruhe, Germany, while Waibel will take on a Zoom researcher role to advise on Zoom’s MT research and development.
The field of real-time speech translation has had its share of challenges. In 2017, for example, Google launched a long-awaited pair of wireless earbuds with an exclusive real-time translation feature. The pitch was that Pixel Buds could recognize speech in one language, translate the words into another language on a user’s phone, and then read the translated phrase aloud.
However, early reviews of the product revealed that the technology had trouble recognizing speakers’ words, especially when they used complicated sentences or spoke with an accent. The problem comes down to the fact that recognizing human speech is difficult, no matter how sophisticated the artificial intelligence.
Kites claims its technology can translate spontaneous spoken language with minimum latency and maximum accuracy. According to the company, its system has an error rate of around 5% on conversational speech, while the error rate for human translation is around 5.5%.