Union Science & Technology Minister Jitendra Singh on Tuesday described BharatGen as India’s first sovereign, multilingual, and multimodal AI Large Language Model (LLM), during a visit to IIT Bombay where he reviewed the project’s progress and interacted with the core development team.
What is BharatGen?
BharatGen is India’s first sovereign multilingual and multimodal LLM, an AI system designed to reflect the country’s linguistic and cultural diversity. It supports more than 22 Indian languages and integrates three core modalities: text (understanding and generating language), speech (recognising and producing natural speech), and document vision (interpreting complex Indian document formats).
The “sovereign” design ensures full national ownership of the data, AI models, and the entire AI stack. As a digital public asset, BharatGen is intended to serve 1.4 billion citizens in the languages they use every day.
What are the key components of BharatGen?
At the heart of BharatGen is Bharat Data Sagar, one of India’s most ambitious data initiatives. It aims to build a fully indigenous data repository through large-scale, India-centric data collection and curation involving individuals, institutions, and organisations across sectors.
The objective is to capture India’s lived realities, regional diversity, and cultural nuance, ensuring both accurate AI performance and long-term data sovereignty.
What models has BharatGen developed so far?
Four core models have been released under the BharatGen ecosystem:
Param-1 (Text Model)
A 2.9-billion-parameter foundational language model
Trained on 7.5 trillion tokens, one of the largest India-focused datasets ever used
Over one-third of the data comes from Indian languages and contexts
Shrutam (Automatic Speech Recognition Model)
A 30-million-parameter ASR model
Designed for India’s linguistic variety, speech patterns, and accents
Built to work accurately across states and dialects
Sooktam (Text-to-Speech Model)
A 150-million-parameter TTS system
Generates natural-sounding speech in nine Indian languages
Suitable for accessibility tools, public services, and digital applications
Patram (Document-Vision Model)
India’s first document-vision model
7 billion parameters, trained on 2.5 billion tokens
Tailored for Indian documents: IDs, certificates, government forms, handwritten entries, and local administrative layouts
Together, these models form a complete sovereign multimodal AI stack (Text + Speech + Vision), a capability possessed by only a handful of countries. This enables large-scale applications across governance, education, healthcare, agriculture, startups, and industry.
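To put the reported figures side by side, here is a minimal Python sketch that uses only the numbers quoted above. The dictionary layout and function names are illustrative assumptions, not part of any BharatGen release, and token counts are left as None where the article does not state them.

```python
# Figures as reported in this article; names and layout are illustrative only.
MODELS = {
    "Param-1": {"parameters": 2.9e9, "training_tokens": 7.5e12},
    "Shrutam": {"parameters": 30e6, "training_tokens": None},   # tokens not stated
    "Sooktam": {"parameters": 150e6, "training_tokens": None},  # tokens not stated
    "Patram": {"parameters": 7e9, "training_tokens": 2.5e9},
}


def tokens_per_parameter(spec):
    """Return training tokens per parameter, or None if tokens are not reported."""
    if spec["training_tokens"] is None:
        return None
    return spec["training_tokens"] / spec["parameters"]


def min_indian_tokens(total_tokens, share=1 / 3):
    """Lower bound on Indian-language tokens, given 'over one-third' of the data."""
    return total_tokens * share


total_parameters = sum(spec["parameters"] for spec in MODELS.values())

if __name__ == "__main__":
    for name, spec in MODELS.items():
        ratio = tokens_per_parameter(spec)
        label = "not reported" if ratio is None else f"{ratio:,.2f}"
        print(f"{name}: tokens per parameter = {label}")
    print(f"Total parameters across the four models: {total_parameters / 1e9:.2f} billion")
    floor = min_indian_tokens(MODELS["Param-1"]["training_tokens"])
    print(f"Param-1 Indian-language tokens: more than {floor / 1e12:.1f} trillion")
```

On these numbers, Param-1 saw roughly 2,600 training tokens per parameter, and "over one-third" of 7.5 trillion tokens means more than 2.5 trillion tokens from Indian languages and contexts.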
What proof-of-concept applications has BharatGen built?
During the Minister’s visit, the BharatGen team demonstrated several early applications designed for real-world impact:
Krishi Sathi
A voice-enabled WhatsApp advisory tool that lets farmers ask questions in their local language and receive instant guidance tailored to regional agricultural needs.
e-VikrAI
An AI tool that generates product descriptions from a single image, helping small sellers and micro-entrepreneurs quickly create digital storefronts—particularly valuable in rural and semi-urban areas.
Docbodh
A document Q&A platform built on Patram. It reads, understands, and interprets complex documents, helping citizens understand government files, legal notices, forms, and certificates.
The tool can improve transparency and last-mile accessibility in public services.
These prototypes illustrate how BharatGen can strengthen governance, improve citizen services, and support digital inclusion.
How is BharatGen funded, and who are the key collaborators?
BharatGen is supported under the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS) of the Department of Science and Technology.
Rs 235 crore has been allocated through the Technology Innovation Hub at IIT Bombay.
The consortium includes IIT Bombay (lead), IIT Madras, IIT Kanpur, IIIT Hyderabad, IIT Mandi, IIT Hyderabad, IIM Indore, IIT Kharagpur, and IIIT Delhi.
Recently, Rs 1,058 crore was approved by MeitY under the IndiaAI Mission, expanding BharatGen into a nationwide effort to build a sovereign AI stack.
What other major AI initiatives has India launched?
In March 2024, the Union Cabinet approved the IndiaAI Mission, backed by more than Rs 10,300 crore over five years to strengthen India’s AI capabilities.
The mission has seven components: IndiaAI Compute Capacity, IndiaAI Innovation Centre (IAIC), IndiaAI Datasets Platform, IndiaAI Application Development Initiative, IndiaAI FutureSkills, IndiaAI Startup Financing, and the Safe & Trusted AI Framework.
The IndiaAI Innovation Centre (IAIC) will drive the development of foundational AI models, including indigenous Large Multimodal Models (LMMs), using edge computing and distributed architectures.


