Microsoft is replacing at least some of its natural language processing systems with a more efficient class of AI models.
These transformer-based architectures are called "Z-code Mixture of Experts" models. Unlike traditional machine learning systems, which run the entire network for every calculation, only certain parts of these models are activated for a given input, because different parts of the model learn different tasks. As neural networks continue to grow, the Z-code approach should keep them from becoming too power-hungry and expensive to operate.
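To illustrate the sparse-activation idea, here is a minimal sketch of Mixture-of-Experts routing. The expert count, dimensions, and top-k gating shown are assumptions for demonstration, not Microsoft's actual configuration:

```python
import numpy as np

# Illustrative sparse Mixture-of-Experts layer. Sizes and top-k routing are
# assumptions for the sketch, not the Z-code production setup.
rng = np.random.default_rng(0)

NUM_EXPERTS, D_MODEL, TOP_K = 8, 16, 2

# Each "expert" is a small feed-forward weight matrix here.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(NUM_EXPERTS)]
gate_w = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.1

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a token vector to its top-k experts; the rest stay idle."""
    logits = x @ gate_w
    top = np.argsort(logits)[-TOP_K:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only TOP_K of NUM_EXPERTS experts compute anything for this token,
    # which is why inference cost grows slower than total parameter count.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_layer(token)
print(out.shape)
```

The key point is in the routing step: per token, compute scales with the two active experts rather than all eight, even though the full model holds all eight experts' parameters.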
Microsoft said it has deployed these types of models for its text summarization, custom text classification, and key phrase extraction services available from Azure.
Now Microsoft is turning its attention to Translator, its online machine translation service. Translator previously required 20 models to translate between ten human languages; the same work can now be done by a single Z-code system running in Microsoft's cloud, we're told.
In a series of tests commissioned by Microsoft, humans judged the quality of translations produced by the old and new Translator models. On average, the Z-code version was rated four percent better. It improved translations from English to French by 3.2 percent, English to Turkish by 5.8 percent, Japanese to English by 7.6 percent, English to Arabic by 9.3 percent, and English to Slovenian by 15 percent, we are told.
Rather than being trained explicitly on language pairs, the Z-code Translator model learned to translate between multiple languages using transfer learning, explained Xuedong Huang, Microsoft technical fellow and Azure AI CTO.
"With Z-code, we are really making amazing progress because we are leveraging both transfer learning and multitask learning from monolingual and multilingual data to create a state-of-the-art language model that we believe offers the best combination of quality, performance and efficiency that we can provide to our customers."
Microsoft engineers used GPUs to train the Z-code Translator model. "For our production model deployment, we opted to train a set of five-billion-parameter models, which are 80 times larger than our currently deployed models," according to Redmond's Hany Awadalla, Senior Research Manager; Krishna Mohan, Senior Product Manager; and Vishal Chowdhary, Partner Development Manager.
The new model is now in production, powered by Nvidia GPUs and Nvidia's Triton Inference Server software, apparently achieving up to a 27x speedup over unoptimized GPU configurations. It is currently available only to certain customers, who must first be approved by Microsoft. They will be able to use machine translation to translate text in Microsoft Word, PowerPoint, and PDF documents. ®