Microsoft recently announced that the Microsoft Translator, its AI-powered text translation service, now supports over 100 different languages and dialects. With the addition of 12 new languages, including Georgian, Macedonian, Tibetan and Uyghur, Microsoft says its translator can now make document text and information accessible to 5.66 billion people around the world.
Translator today covers the most spoken languages in the world, including English, Chinese, Hindi, Arabic and Spanish. In recent years, advances in AI technology have allowed the company to expand its language library with low-resource and endangered languages, such as Inuktitut, a dialect of Inuktut that approximately 40,000 Inuit speak in Canada.
“One hundred languages is a good step towards achieving our ambition that everyone can communicate regardless of the language they speak. Not only are we celebrating what we’ve done in translation – reaching 100 languages - but also for speech and OCR, we want to break down language barriers,” said Xuedong HuangMicrosoft Technical Fellow and Azure AI Chief Technology Officer.
The new languages and dialects that push Translator past the 100 language mark are Bashkir, Dhivehi, Georgian, Kyrgyz, Macedonian, Mongolian (Cyrillic), Mongolian (Traditional), Tatar, Tibetan, Turkmen, Uyghur and Uzbek (Latin), which are collectively spoken natively by 84.6 million people.
With this milestone, Microsoft claims to have overcome the language barrier for 72% of the world’s population.
The frontier of machine translation technology at Microsoft is a multilingual AI model called Z-code. The model combines multiple languages from a language family, such as the Indian languages of Hindi, Marathi, and Gujarati. In this way, the individual linguistic models learn from each other, which reduces the data requirements for obtaining high-quality translations.
The reduction in data requirements also allowed the team of translators to create models for languages with limited resources or at risk due to dwindling native speaker populations. Several of the languages carrying Translator above the 100 language milestone are low resource or endangered.
Z-code is part of a larger initiative to combine AI models for text, vision, audio, and language to enable AI systems to speak, see, hear, and to understand and thus increase human capacities more effectively.