Microsoft leverages AI techniques to bring Translator to 100 languages

Today, Microsoft announced that Microsoft Translator, its AI-powered text translation service, now supports more than 100 different languages and dialects. With the addition of 12 new languages, including Georgian, Macedonian, Tibetan, and Uyghur, Microsoft says Translator can now make text and information in documents accessible to 5.66 billion people around the world.

Translator isn’t the first service to support more than 100 languages; Google Translate first hit that milestone in February 2016. (Amazon Translate supports only 71.) But Microsoft says the new languages are backed by unique AI advancements and will be available in the Translator apps, Office, and Translator for Bing, as well as Azure Cognitive Services Translator and Azure Cognitive Services Speech.
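
For developers, the new languages are reached through the same Translator REST API as the existing ones. Below is a minimal sketch of a call to the Azure Cognitive Services Translator v3.0 endpoint requesting two of the newly added languages. YOUR_KEY and YOUR_REGION are placeholders for a real Azure resource, and the language codes shown ("ka" for Georgian, "uz" for Uzbek) are assumptions based on Translator's published code list.

```python
# Minimal sketch: translating English text into two of the newly added
# languages via the Azure Cognitive Services Translator REST API (v3.0).
# YOUR_KEY and YOUR_REGION are placeholders for a real Azure resource.
import requests

endpoint = "https://api.cognitive.microsofttranslator.com/translate"
params = {"api-version": "3.0", "from": "en", "to": ["ka", "uz"]}  # Georgian, Uzbek (Latin)
headers = {
    "Ocp-Apim-Subscription-Key": "YOUR_KEY",
    "Ocp-Apim-Subscription-Region": "YOUR_REGION",
    "Content-Type": "application/json",
}
body = [{"Text": "Machine translation now reaches more than 100 languages."}]

response = requests.post(endpoint, params=params, headers=headers, json=body)
response.raise_for_status()
for item in response.json():
    for translation in item["translations"]:
        print(translation["to"], "->", translation["text"])
```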

“One hundred languages is a good step for us to achieve our ambition that everyone can communicate regardless of the language they speak,” said Xuedong Huang, chief technology officer of Microsoft Azure AI, in a statement. “We can take advantage [of commonalities between languages] and use it… to improve whole language famil[ies].”

Z-Code

As of today, Translator supports the following new languages, which Microsoft says are spoken natively by 84.6 million people collectively:

  • Bashkir
  • Dhivehi
  • Georgian
  • Kyrgyz
  • Macedonian
  • Mongolian (Cyrillic)
  • Mongolian (traditional)
  • Tatar
  • Tibetan
  • Turkmen
  • Uyghur
  • Uzbek (Latin)

Powering the Translator upgrades is Z-code, part of Microsoft’s larger XYZ-code initiative to combine AI models for text, vision, audio, and language into systems that can speak, see, hear, and understand. The team includes scientists and engineers from Azure AI and the Project Turing research group, and focuses on building large-scale multilingual language models that support various production teams.

Z-code provides the framework, architecture, and models for text-based, multilingual AI translation across entire language families. Because similar languages share linguistic elements, and because transfer learning applies knowledge from one task to a related task, Microsoft claims it has been able to significantly improve quality and reduce the cost of its machine translation capabilities.

With Z-code, Microsoft is using transfer learning to move beyond the most common languages and improve translation accuracy for “low-resource” languages, meaning those with fewer than 1 million sentences of training data. (Like all models, Microsoft’s learn from examples in large datasets sourced from a mix of public and private archives.) Roughly 1,500 known languages fit that description, so Microsoft developed a multilingual translation training process that combines language families and language models.
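
Z-code itself isn’t public, so the sketch below only illustrates the general pattern the paragraph describes: pretrain shared weights on plentiful data from a high-resource language, then fine-tune the same weights on a small corpus from a related low-resource language. The tiny model, the random tensors standing in for sentence pairs, and the dataset sizes are all illustrative assumptions, not Microsoft’s architecture.

```python
# Illustrative two-stage transfer learning (toy data; NOT Microsoft's Z-code):
# pretrain shared weights on a high-resource language pair, then fine-tune the
# same weights on a much smaller corpus from a related low-resource language.
import torch
import torch.nn as nn

torch.manual_seed(0)
EMB = 64  # toy "sentence embedding" size standing in for real tokenized text

shared_encoder = nn.Sequential(nn.Linear(EMB, 128), nn.ReLU())
decoder = nn.Linear(128, EMB)  # maps encoder states to target-side embeddings
model = nn.Sequential(shared_encoder, decoder)

def train(src, tgt, steps, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(src), tgt)
        loss.backward()
        opt.step()
    return loss.item()

# Stage 1: plenty of parallel data for a high-resource language (10,000 toy pairs).
hi_src, hi_tgt = torch.randn(10_000, EMB), torch.randn(10_000, EMB)
print("high-resource loss:", train(hi_src, hi_tgt, steps=200, lr=1e-3))

# Stage 2: fine-tune on a small low-resource corpus (500 pairs); because the
# languages are related, the encoder reuses what it learned in stage 1.
lo_src, lo_tgt = torch.randn(500, EMB), torch.randn(500, EMB)
print("low-resource loss:", train(lo_src, lo_tgt, steps=50, lr=1e-4))
```

In practice the shared weights would be a full multilingual encoder-decoder and the fine-tuning set real parallel text, but the two-stage structure is the same.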

Techniques such as neural machine translation, rewrite-based paradigms, and on-device processing have led to quantifiable advances in machine translation accuracy. But until recently, even state-of-the-art algorithms lagged behind human performance. Efforts beyond Microsoft’s illustrate the scale of the problem: the Masakhane project, which aims to make thousands of languages on the African continent automatically translatable, has yet to move beyond the data collection and transcription phase. And Common Voice, Mozilla’s effort to build an open source collection of transcribed voice data, has verified only dozens of languages since its launch in 2017.

Z-code language models are trained multilingually across many languages, and that knowledge transfers between languages. A second training cycle transfers knowledge between tasks: for example, the models’ translation skill (“machine translation”) is used to help improve their ability to understand natural language (“natural language understanding”).
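
The cross-task transfer described above is, in outline, standard multi-task learning: one shared backbone feeds task-specific heads, so gradient updates from the translation objective also shape the representations the understanding task relies on. The toy sketch below, with made-up dimensions and random data rather than anything from Z-code, shows the wiring.

```python
# Toy multi-task sketch: a shared backbone with two task heads, so gradients
# from the translation objective also update the representations the
# understanding task depends on. Dimensions and data are made up.
import torch
import torch.nn as nn

torch.manual_seed(0)
backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU())  # shared across tasks
mt_head = nn.Linear(128, 64)   # "machine translation" head (toy regression)
nlu_head = nn.Linear(128, 3)   # "natural language understanding" head (3 classes)

params = list(backbone.parameters()) + list(mt_head.parameters()) + list(nlu_head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

for step in range(100):
    x = torch.randn(32, 64)  # stand-in for a batch of encoded sentences
    mt_loss = nn.functional.mse_loss(mt_head(backbone(x)), torch.randn(32, 64))
    nlu_loss = nn.functional.cross_entropy(nlu_head(backbone(x)), torch.randint(0, 3, (32,)))
    (mt_loss + nlu_loss).backward()  # both tasks shape the shared backbone
    opt.step()
    opt.zero_grad()
```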

In August, Microsoft said a Z-code model with 10 billion parameters could achieve state-of-the-art results on machine translation and multilingual summarization tasks. In machine learning, parameters are the internal configuration variables a model uses to make predictions, and their values essentially, though not always, define the model’s skill on a problem.
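
To make the number concrete: a parameter count is simply the total of all trainable weights and biases. Counting them in a PyTorch model is a one-liner; the small model below is a hypothetical example, shown only for scale against the 10-billion-parameter figure.

```python
# Counting parameters: the total of every trainable weight and bias.
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # 2,099,712 here, versus 10 billion for Z-code
```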

Microsoft is also working to train a 200-billion-parameter version of the model mentioned above. For comparison, OpenAI’s GPT-3, one of the largest language models in the world, has 175 billion parameters.

Market dynamics

Google, Microsoft’s main rival in translation, is also using emerging AI techniques to improve translation quality across its service. Not to be outdone, Facebook recently detailed a model that uses a combination of word-for-word translations and back-translations to outperform systems for more than 100 language pairs. And in academia, MIT CSAIL researchers have presented an unsupervised model (that is, one that learns from data that has not been explicitly labeled or categorized) that can translate between texts in two languages without direct translation data between them.
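
Back-translation, the technique credited to Facebook’s model above, is easy to sketch: translate monolingual target-language text back into the source language with a reverse model, then treat the resulting (synthetic source, real target) pairs as extra training data for the forward model. The function and stand-in translator below are hypothetical, not any library’s actual API.

```python
# Schematic back-translation: turn monolingual target-language text into
# synthetic (source, target) training pairs. translate_tgt_to_src is a
# hypothetical stand-in for a trained reverse model, not a real library call.

def back_translate(monolingual_target, translate_tgt_to_src):
    """Build synthetic parallel pairs from target-side monolingual text."""
    pairs = []
    for tgt_sentence in monolingual_target:
        synthetic_src = translate_tgt_to_src(tgt_sentence)  # reverse model
        pairs.append((synthetic_src, tgt_sentence))  # noisy source, clean target
    return pairs

# Dummy usage; a real reverse model would be a trained MT system.
corpus = ["Example target sentence one.", "Example target sentence two."]
synthetic = back_translate(corpus, translate_tgt_to_src=lambda s: f"<back-translated: {s}>")
print(synthetic)
# These synthetic pairs are mixed with genuine parallel data to train the
# forward (source -> target) model, which is what lifts low-resource quality.
```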

Of course, no machine translation system is perfect. Some researchers argue that AI-translated text is less “lexically” rich than human translations, and there is ample evidence that language models amplify biases present in the datasets on which they are trained. Artificial intelligence researchers from MIT, Intel, and Canada’s CIFAR initiative have found high levels of bias in language models including BERT, XLNet, OpenAI’s GPT-2, and RoBERTa. Beyond that, Google has identified (and claims to have corrected) gender biases in the translation models that underpin Google Translate, especially for resource-poor languages like Turkish, Finnish, Persian, and Hungarian.

Microsoft, for its part, points to Translator’s traction as proof of the platform’s sophistication. In a blog post, the company notes that thousands of organizations around the world use Translator for their translation needs, including Volkswagen.

“Volkswagen Group uses machine translation technology to serve customers in more than 60 languages, translating more than a billion words each year,” writes Microsoft’s John Roach. “The reduced data requirements… allow the team of translators to create models for languages with limited resources or that are at risk due to dwindling native-speaker populations.”
