India is a land of diversity, and the large number of languages spoken in the country testifies to this. Its languages span four language families and include twenty-two scheduled languages, with more than thirty languages spoken by over one million people each.
With this diversity of languages comes a set of challenging tasks. One of them is education, the main concern being to enable learning in Indian languages. Teaching and learning in one’s mother tongue is known to be highly effective. Additionally, higher education is often out of reach for the vast majority of people due to the English language barrier. Recognizing this need and gap, the Government of India, under the aegis of the Prime Minister’s Science, Technology and Innovation Advisory Council (PM-STIAC), has taken up the National Language Translation Mission (NLTM) as one of its main missions.
NLTM aims to make scientific and technological opportunities and developments accessible to all by removing the barrier posed by the requirement of a high-level command of English. Using a combination of machine and human translation, the mission will eventually enable access to educational materials in a bilingual manner – in English and in the learner’s native Indian language. The Ministry of Electronics and Information Technology (MeitY) is the Government’s implementing arm for this mission.
One of the opportunities for machine speech-to-speech translation is to have the more than 40,000 educational videos on NPTEL and SWAYAM, currently in English, translated into many Indian languages. This is also in line with the new National Education Policy (NEP), which emphasizes education in Indian languages. Currently, efforts are underway to manually transcreate these videos into Indian languages, which consumes a huge amount of time and resources.
Responding to this challenge, a consortium of institutes consisting of IITB, IITM and IIITH, led by Professors Pushpak Bhattacharyya of the Indian Institute of Technology Bombay, S Umesh and Hema Murthy of the Indian Institute of Technology Madras, and Dipti Misra Sharma of the International Institute of Information Technology Hyderabad, has come together to create a Speech-to-Speech Machine Translation (SSMT) system from English to many Indian languages.
SSMT consists of a pipeline of steps: (i) first, the spoken utterance is converted into text by automatic speech recognition (ASR); (ii) then, the produced text is translated into the target language by machine translation (MT); and (iii) finally, the translated text is rendered as speech by text-to-speech synthesis (TTS).
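The composition of the three stages can be sketched as follows. This is a minimal illustrative sketch, not the consortium’s actual system: the `asr`, `mt` and `tts` functions here are trivial stand-ins (a toy word-for-word lexicon in place of real models), meant only to show how the stages chain together.

```python
# Minimal sketch of the ASR -> MT -> TTS composition (toy stand-ins, not real models).

def asr(audio: str) -> str:
    """Stand-in ASR: a real system maps audio to text; here the 'audio'
    argument is already a transcript, so we only normalize it."""
    return audio.strip()

def mt(text: str, lexicon: dict) -> str:
    """Stand-in MT: word-for-word lookup in a toy lexicon. Real MT must
    also handle word reordering, agreement, and unseen words."""
    return " ".join(lexicon.get(word.lower(), word) for word in text.split())

def tts(text: str) -> str:
    """Stand-in TTS: tags the text to represent synthesized audio."""
    return f"<audio:{text}>"

def ssmt(audio: str, lexicon: dict) -> str:
    """The full pipeline: speech -> text -> translated text -> speech."""
    return tts(mt(asr(audio), lexicon))

# Toy English-to-Hindi (romanized) lexicon, purely for illustration.
lexicon = {"hello": "namaste", "world": "duniya"}
print(ssmt("Hello World", lexicon))  # <audio:namaste duniya>
```

Because the stages are composed sequentially, an error in any one of them propagates to the final output, which is precisely why the pipeline poses the challenges described next.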
SSMT poses several challenges, however: (a) each of the ASR, MT and TTS stages can introduce errors, albeit small; (b) the text produced by the ASR may be disfluent, i.e., contain non-linguistic elements like “uhh”, “umm”, etc.; (c) the tone and accent of English varies from region to region in India; (d) word order changes from English to Indian languages; (e) speakers mix languages, as in Hinglish (Hindi + English), Banglish (Bengali + English), Tanglish (Tamil + English), etc.; (f) finally, the synthesized speech must be synchronized with the speaker’s lip movements in the video – the so-called lip synchronization problem.
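Of these, disfluency removal (challenge b) is typically handled as a cleanup step between ASR and MT. The sketch below assumes a simple regex filter for a few common English fillers; production systems use learned disfluency detectors rather than a fixed list, so this is only an illustration of the idea.

```python
import re

# Toy disfluency filter: strips a few common fillers ("uhh", "umm", "er")
# from ASR output before it is passed to MT. The filler list and regex are
# illustrative assumptions, not a real system's rules.
FILLERS = re.compile(r"\b(?:uh+|um+|er+|ah+)\b,?\s*", flags=re.IGNORECASE)

def remove_disfluencies(text: str) -> str:
    cleaned = FILLERS.sub("", text)
    # Collapse any whitespace left behind by the removals.
    return re.sub(r"\s+", " ", cleaned).strip()

print(remove_disfluencies("So, umm, the uhh pipeline has, er, three stages."))
# So, the pipeline has, three stages.
```

Feeding MT the cleaned text avoids translating fillers literally, though a regex like this cannot handle repetitions or self-corrections (“the, I mean, a pipeline”), which need context-aware models.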
The good part is that the machine does most of the translation efficiently; only a modest manual effort is required to review and correct the output at various stages of the pipeline. This has been tested through the consortium’s implementation of the SSMT pipeline, and it is envisioned that this hybrid approach can reduce the fully manual translation effort by almost 75%.
SSMT is poised to make a wealth of digital learning content available in many Indian languages, thus enhancing the accessibility of such content. Further, moving forward, if suitable machine learning and AI models are built on this content, such a system could also respond interactively to learners’ queries in their own language. The future certainly looks bright for these applications, which aim to minimize the learning gap, especially for learners in Indian languages.
Editor’s Note: This is part of the special Lab Stories feature we are bringing to you.