On November 9, 2021, Airbnb announced that it had rolled out Translation Engine, which lets users automatically read translations of reviews and descriptions in more than 60 languages without clicking a translate button. Reversing the usual paradigm, the interface instead gives users a “view original language” button.
Marco Trombetti, CEO of Translated – which has worked with the home rental platform for three years and provided Airbnb with human and machine translation – told Slator: “What’s unique is the fact that for the first time, the two are very symbiotic and integrated. Every correction made by the localization team instantly improves machine translation.”
Airbnb’s Translation Engine runs on ModernMT, the open-source project led by Translated, co-founded by Fondazione Bruno Kessler, the University of Edinburgh and the European Commission. ModernMT is essentially an adaptive neural machine translation system with a range of applications, including IP and life science translations.
“Translated initially provided the basic pre-trained models [for Airbnb’s Translation Engine],” said Trombetti, adding that the engine is continually improving based on corrections from the thousands of linguists who have worked on Airbnb’s content over the past few years. As previously mentioned, Airbnb “human-translated” over 100 million words in 2019, before the pandemic.
According to Airbnb’s press release, “Translation Engine improves the quality of over 99% of Airbnb listings,” based on a study it commissioned from a machine translation evaluation company across the platform’s 10 main languages.
Trombetti said Airbnb commissioned custom evaluations of platform content through “independent, non-Translated parties.” However, he said the over 99% quality improvement is in line with Translated’s internal evaluations. “Translated performs monthly reviews of our ModernMT models using our trained Airbnb linguists,” Trombetti said.
He added that while “many other companies have experimented with pre-translation, with a small subset of their content, usually reviews, to my knowledge this is the first time this has been done for all content, and especially on this scale.”
He pointed out that visitors to the site will not only be able to read content in their own language, but also discover content that was previously inaccessible to them. “It’s not just about removing a button; it’s about allowing everyone to explore in a new way,” Trombetti said.
UGC: Complex for AI
Asked about the challenge of drawing data points from user-generated content (UGC) versus training engines on content created by professional writers or linguists, Trombetti said, “UGC is complex for AI because everyone has a different style.”
“It’s not like training a custom model on very narrow terminology” (Marco Trombetti, CEO, Translated)
He explained that because UGC is often written by non-native speakers and, most likely, by non-professional content writers, “there is a lot of flexibility that the AI needs to learn to translate well. It is not like training a custom model on very narrow terminology.”
Trombetti added: “The indirect challenge with UGC is scale. Often, UGC scale can be a million times larger than content produced by localization teams, and volume spikes are much more unpredictable.”
Additionally, he noted that integrating machine translation into production infrastructure requires 10x lower latency. So while “in human translation, engineering quality is really not an issue,” for machine translation of UGC, “it’s the big deal.”
On top of that, there is the commercial element. The CEO of Translated said, “When you run UGC, you are a horizontal service. You have to interact with many divisions and stakeholders. Thus, the level and complexity of the discussions increase. [Airbnb Head of Localization] Salvatore Giammarresi’s leadership, empathy and ability to interact with senior management made it all possible.”