Google DeepMind’s India unit has embarked on a groundbreaking project named Morni (Multimodal Representation for India), which aims to develop an artificial intelligence model that understands and represents 125 Indian languages and dialects. This initiative highlights a significant effort to integrate India’s rich linguistic diversity into the digital realm, ensuring inclusivity and accessibility.
The Linguistic Landscape of India
India is home to a staggering number of languages, with 22 officially recognized as scheduled languages. However, the linguistic diversity extends far beyond this, with over 100 languages actively spoken across the country. Notably, there are around 60 languages spoken by over a billion people collectively, and more than 125 languages each with over 100,000 speakers.
Despite this rich diversity, many Indian languages lack a significant digital footprint. For instance, although Hindi is spoken by nearly 10% of the global population, it constitutes only 0.1% of online text. Moreover, 73 out of the 125 languages targeted by the project have no available digital data, presenting a substantial challenge for digital inclusion.
Project Vaani: Bridging the Digital Divide
To address the data scarcity issue, Google DeepMind has launched Project Vaani, in collaboration with the Indian Institute of Science (IISc) and ARTPARK (Artificial Intelligence & Robotics Technology Park). This project aims to collect and digitize speech data from across India, creating an open-source resource that enhances the digital representation of these languages.
Key Milestones of Project Vaani:
- First Phase: The initial phase of Project Vaani successfully compiled over 14,000 hours of speech data in 58 languages. This data was collected from 80,000 speakers across 80 districts.
- Current Progress: Announced in December 2022, the project aims to gather and transcribe 154,000 hours of speech data from all 773 districts in India. The second phase is currently underway, focusing on 160 districts across all states.
This expansive data collection effort is essential for developing AI models that can accurately understand and process a wide range of Indian languages.
Expansion of Language Coverage in Google Translate
In addition to Project Vaani, Google has made significant strides in language technology through its recent expansion of Google Translate. The company has added 110 new languages, including five Indian languages, using its PaLM 2 large language model. The newly supported languages are spoken by more than 600 million people worldwide.
Broader Technological Initiatives
Google DeepMind’s initiatives extend beyond language representation. The company is also developing a digital agri-stack aimed at enhancing agricultural practices in India. This stack could facilitate access to loans for farmers, provide affordable crop insurance, and improve government subsidy programs through data-driven approaches.
Google DeepMind’s projects, Morni and Vaani, represent a significant step towards digital inclusivity and linguistic preservation. By focusing on a vast array of Indian languages and dialects, these projects aim to ensure that every language has a digital presence, thereby contributing to a more inclusive digital landscape. This work not only preserves India’s linguistic heritage but also makes technology more accessible to millions of people, reflecting a commitment to a diverse and interconnected digital world.
With inputs from agencies
© Copyright 2024. All Rights Reserved Powered by Vygr Media.