In July this year, at its Google for India 2020 event, Google announced a $10 billion digitization fund for India, to be invested over the next five to seven years. One of the fund's main focus areas is the Indian-language internet ecosystem, with the aim of improving the penetration of digital services and products. Following this, Google India has unveiled a new artificial intelligence (AI) model called Multilingual Representations for Indian Languages, or MuRIL for short, to improve the interoperability of web services in 16 Indian languages.
MuRIL supports 16 Indian languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Sindhi, Tamil, Telugu, and Urdu) in addition to English.
The model is based on Google's own language model BERT (Bidirectional Encoder Representations from Transformers), which is currently used to parse almost all English queries on Google Search. Google has made MuRIL free and open source, available for download and use through its machine learning platform TensorFlow.
Besides improving the translation of internet content across many languages, MuRIL would allow search users to easily switch results between English and Tamil, Telugu, Bangla, and Marathi, in addition to Hindi. Google would also show content in five Indian languages (Hindi, Bangla, Marathi, Tamil, and Telugu) even when the query is typed in English. Google Maps, meanwhile, would let users choose from nine Indian languages regardless of the device's system language.