As the pandemic finally winds down, international travel is resuming, and millions are seeking to make up for lost time. When travelers explore foreign lands, tools like Google’s neural machine translation system can come in handy. Released in 2016, the software uses deep learning to make connections between words: how closely related they are, how likely they are to appear together in a sentence, and in what order.
Google’s tool works well: in comparisons against human translators, it approached human-level fluency for certain languages. But it is limited to the most widely spoken languages in the world.
Meta wants to help and is investing resources in its own translation tool, with the aim (among other things) of making it far more expansive than Google’s. A paper the company published this week indicates that the Meta tool handles more than 40,000 different translation directions between 200 languages. A “translation direction” is a translation between an ordered pair of languages, for example:
Direction 1: English > Spanish
Direction 2: Spanish > English
Direction 3: Spanish > Swahili
Direction 4: Swahili > English
40,000 seems like a lot, but the ordered pairs of 200 languages add up quickly: each of the 200 possible source languages can be translated into the 199 others, for 200 × 199 = 39,800 directions. It is difficult to determine precisely how many languages there are in the world, but one reliable estimate puts the total at over 6,900. So while it would be inaccurate to say that Meta is building a universal translation system, this is one of the most extensive efforts ever undertaken in the field, especially for what the company calls low-resource languages.
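The pair-counting behind that figure is simple ordered-pair arithmetic, sketched here in Python (an illustration of the math, not Meta’s code):

```python
# Count translation directions among n languages: each language can
# serve as the source for the n - 1 other languages, giving
# n * (n - 1) ordered pairs.
def directions(n_languages: int) -> int:
    return n_languages * (n_languages - 1)

print(directions(200))  # 39800
```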
These are defined as languages with fewer than one million publicly available translated sentence pairs. They are largely African and Indian languages that are not spoken by large populations and do not have as much written history as more common languages.
“A really interesting phenomenon is that people who speak low-resource languages often have a lower bar for translation quality because they don’t have another tool,” Meta AI researcher Angela Fan, who worked on the project, told The Verge. “We have this inclusion motivation of ‘what would it take to produce translation technology that works for everyone?’”
Meta began its research by interviewing native speakers of low-resource languages to contextualize their need for translation. The team notes, however, that the majority of those interviewed were “immigrants living in the United States and Europe, and approximately one third of them identify as tech workers,” meaning there may be a built-in bias, and core life experiences that differ from those of the larger population speaking these languages.
The team then created models aimed at bridging the gap between low- and high-resource languages. To assess the model’s performance once it started producing translations, the team gathered a test dataset of 3,001 sentence pairs for each language covered by the model. The sentences were translated from English into the target languages by native speakers of those languages who are also professional translators.
The researchers fed the sentences through their translation tool and compared its results to the human translations using a methodology called Bilingual Evaluation Understudy, or BLEU for short. BLEU is the standard benchmark used to rate machine translations, providing a numerical scoring system that measures how closely a machine translation matches a reference translation. Meta researchers said their model recorded a 44% improvement in BLEU scores compared to existing machine translation tools.
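To make the metric concrete, here is a minimal sentence-level BLEU sketch in Python: the geometric mean of clipped n-gram precisions, multiplied by a brevity penalty. This is a bare-bones illustration of the idea; real evaluation pipelines use tooling such as sacreBLEU, with smoothing and standardized tokenization, rather than anything this simple.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference: str, candidate: str, max_n: int = 4) -> float:
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n), times a brevity penalty that
    punishes candidates shorter than the reference."""
    ref, cand = reference.split(), candidate.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        ref_counts = Counter(ngrams(ref, n))
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            return 0.0
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g])
                      for g, c in Counter(cand_ngrams).items())
        if overlap == 0:
            return 0.0  # real implementations smooth instead of zeroing out
        log_precisions.append(math.log(overlap / len(cand_ngrams)))
    # Brevity penalty: 1 if the candidate is at least reference length.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 2))  # 1.0
```

A perfect match scores 1.0; a candidate sharing no bigrams with the reference scores 0 here, which is exactly why production metrics add smoothing.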
However, this figure should be taken with a grain of salt. Language can be highly subjective: a sentence can take on a completely different meaning from a single changed word, or keep exactly the same meaning despite several changed words. The data a model is trained on makes all the difference, and even that is subject to built-in bias and the intricacies of the language in question.
Another differentiating aspect of Meta’s tool is that the company chose to open-source its work, including the model, the evaluation dataset, and the training code, with the goal of democratizing the project and making it a worldwide community effort.
“We worked with linguists, sociologists, and ethicists,” said Fan. “And I think that kind of interdisciplinary approach focuses on the human problem. For example, who wants this technology to be built? How do they want it built? How are they going to use it?”
While it will benefit the company’s large user base, the translation tool is by no means a charity project. Meta stands to gain from better understanding its users and how they communicate and use language (targeted ads work in every language, after all). Not to mention that making the company’s platforms available in new languages will open up previously untapped user bases.
Like many Big Tech projects, Meta’s translator should neither be scorned as an instrument of corporate power nor praised as a gift to the masses. It will help bring people together and facilitate communication, while giving the social media giant new insights into our lives and minds.