Indic LLM Platform

In plain words

IndicBERT v2 is a free, open language model from AI4Bharat that helps computers understand text written in Indian languages. It is built to power tasks like sorting, tagging, and analysing text, such as detecting sentiment or finding names in a sentence. It is an 'understanding' model, so it does not chat, answer questions conversationally, or write new text on its own.

How to use it


This model is coming soon!

A step-by-step guide for IndicBERT v2 will appear here once it goes live.

Languages & scripts supported

Works in these languages, in both native script and Roman typing.

हिन्दी Hindi · اردو Urdu · ਪੰਜਾਬੀ Punjabi · संस्कृतम् Sanskrit · नेपाली Nepali · मैथिली Maithili · سنڌي Sindhi · کٲشُر Kashmiri · বাংলা Bengali · অসমীয়া Assamese · ଓଡ଼ିଆ Odia · ᱥᱟᱱᱛᱟᱲᱤ Santali · बड़ो Bodo · মৈতৈলোন্ Manipuri (Meitei) · मराठी Marathi · ગુજરાતી Gujarati · தமிழ் Tamil · తెలుగు Telugu · ಕನ್ನಡ Kannada · മലയാളം Malayalam · कोंकणी Konkani, plus Roman-script and code-mixed text

Strengths & limits

An honest look at what it does well and where it struggles.

Good at

✓ Sorting and tagging text
✓ Detecting sentiment and topics
✓ Finding names and entities
✓ Working across many Indian languages

Where it struggles

! Not for chat or writing
! Cannot answer questions conversationally
! Must be fine-tuned on labelled data for each task
! Built for understanding, not generation
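To give a feel for what task-specific training involves, here is a deliberately simplified Python sketch. It assumes each sentence has already been turned into a vector by the model (the three-number vectors below are invented stand-ins, not real IndicBERT v2 outputs) and trains only a small sentiment classifier on top of them, a so-called linear probe. Full fine-tuning usually also adjusts the model's own weights and is typically done with a library such as Hugging Face Transformers; the names `train_head` and `predict` are hypothetical.

```python
# Simplified stand-in for task fine-tuning: the encoder is treated as frozen,
# and only a small logistic-regression "head" is trained on its sentence vectors.
# ASSUMPTION: the vectors below are made-up examples, not real IndicBERT v2 outputs.
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(vectors, labels, lr=0.5, epochs=200):
    """Train a logistic-regression head with per-example gradient descent."""
    dim = len(vectors[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x) -> int:
    """Return 1 (positive) if the head's probability is at least 0.5, else 0."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


if __name__ == "__main__":
    # Hypothetical sentence vectors: label 1 = positive sentiment, 0 = negative.
    train_x = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.1, 0.9, 0.3], [0.2, 0.8, 0.4]]
    train_y = [1, 1, 0, 0]
    w, b = train_head(train_x, train_y)
    print(predict(w, b, [0.85, 0.15, 0.2]))  # a vector close to the positive examples
    print(predict(w, b, [0.15, 0.85, 0.3]))  # a vector close to the negative examples
```

The idea is that one underlying model can be reused across jobs: switching from sentiment to, say, topic tagging means training a new head (and, in full fine-tuning, lightly adjusting the model) on labelled examples for that task.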

Technical details & licence

Training data: Trained on IndicCorp v2, AI4Bharat's large corpus of Indian-language text, which contains about 20.9 billion tokens across 24 languages from four language families.

Size: 278 million parameters

Licence: MIT licence for the models and code; the IndicCorp v2 dataset is released under CC-0.

Commercial cost: Free to self-host

Version history: v2 · 2023, released alongside the IndicXTREME benchmark and presented at ACL 2023
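"Free to self-host" still means finding hardware to run the model. As a rough guide, the 278 million parameters listed above can be turned into a memory estimate for the raw weights. The Python sketch below assumes 4 bytes per parameter in 32-bit floating point and 2 bytes in 16-bit formats; it ignores activations, the tokenizer, and framework overhead, so real memory use will be higher. The function name `weight_memory_gib` is made up for illustration.

```python
# Rough, weights-only memory estimate for a model of a given size.
# ASSUMPTIONS: 4 bytes per parameter in float32, 2 in float16/bfloat16;
# activations, tokenizer, and runtime overhead are NOT included.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: int, dtype: str = "float32") -> float:
    """Return the approximate size of the raw weights in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    params = 278_000_000  # parameter count quoted on this page
    for dtype in ("float32", "float16"):
        print(f"{dtype}: ~{weight_memory_gib(params, dtype):.2f} GiB")
```

Run as a script, it prints about 1.04 GiB for float32 and about 0.52 GiB for float16, which suggests the weights alone fit comfortably on an ordinary laptop or a modest GPU.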

About the maker

Who builds and maintains IndicBERT v2.

AI4Bharat

4 models

AI4Bharat is an open-source research lab at IIT Madras dedicated to advancing AI for Indian languages. Its freely released datasets, benchmarks, and models, such as Airavata and the IndicTrans translation series, are widely used across research and industry.
