
Indian AI startup Sarvam AI has recently grabbed attention after claiming its models performed better than global AI systems like ChatGPT and Google Gemini in certain tests. The claim comes from results shared in Sarvam AI’s blogs and by co-founder Pratyush Kumar on social media.
“Introducing Sarvam Vision: a state-space based 3 billion parameter vision language model that is competitive with the best results in digitisation in English, and defines a significantly higher bar for Indian languages,” Kumar wrote in his post on X (formerly Twitter).
As per the company’s blog, Sarvam Vision, its latest launch, is capable of a range of visual understanding tasks, including image captioning, scene text recognition, chart interpretation, and complex table parsing.
Sarvam Vision scored about 84% accuracy on a benchmark called olmOCR. This means the model is very good at reading and understanding real-world paperwork, which is often much harder than reading plain text. Datalab’s Chandra and Mistral OCR trailed behind, scoring 82% and 81.70% respectively. Google’s Gemini Pro 3 model scored 80.20%, and OpenAI’s ChatGPT lagged furthest behind, at 69.80%.
“For the evaluation, we filtered out 1,258 samples out of 1,403 total samples in order to ensure the benchmarking is performed only on English documents,” the blog noted.
OCR stands for Optical Character Recognition. It is the technology that allows computers to read text from scanned documents, images, or PDFs. For example, if you scan a tax form or a handwritten document, OCR helps convert that into text that a computer can understand.
The olmOCR test goes a step further: it also checks whether the AI can understand complicated documents that contain tables, forms, mixed languages, and messy layouts.
Sarvam also reported strong results in another test called OmniDocBench, scoring a staggering 93%. The only OCR model to beat it is PaddleOCR, which scored 94.37%.
This test checks how well AI understands entire documents instead of just reading text. It looks at whether the AI can correctly interpret tables, structured data, and technical layouts.
This matters because many industries rely on structured documents. Financial reports, legal paperwork, invoices, and government forms all contain tables and organised data. If AI misreads these, it can cause serious errors.
Global benchmarks focus heavily on English document parsing, and the company says that, to the best of its knowledge, there is at present no Indic benchmark of a similar standard. “We bridge this gap by releasing the Sarvam Indic OCR Bench which contains 20,267 samples from various document pages. The sample set is distributed across 22 official Indian languages – ranging from 1800-present and with varying quality of scans and content. Furthermore, they are curated at a semantic block-level to robustly evaluate character and word accuracy,” the blog said.
In this benchmark, Sarvam takes the top spot, and the data also suggests that global AI models treat Indian languages as secondary. ChatGPT again scored the lowest, at 38.60%, while Gemma 3-27B scored 46.91%. Gemini Pro, on the other hand, performed much better, scoring 82.51% and securing the second spot.
Apart from document reading, Sarvam introduced a speech model called Bulbul V3. The company said this model performed well in listening tests that measure how natural and accurate AI-generated voices sound.
Speech technology is particularly difficult in India because of the large number of languages, dialects, and accents. Many global speech systems still struggle with pronunciation and mixed-language conversations. Sarvam claims its model handles these challenges better. It recorded the lowest average error rate (8.60%) and scored the highest on naturalness in both high and narrow bands, at 63% and 78% respectively.
Sarvam’s advantage mainly comes from its focus on India. Global AI systems like ChatGPT and Gemini are designed to do many different things – answering questions, helping write content, assisting with coding, and holding conversations across many topics.
Sarvam, on the other hand, is focusing heavily on problems that are common in India, such as multilingual documents and regional speech patterns. By training its models specifically for these tasks, it can achieve higher accuracy in those areas.
Kumar, in another post on X, said that the company is committed to building India’s full-stack sovereign AI platform, from models grounded in Indian languages and datasets to applications deployed at population scale. “We are grateful for these partnerships to provide us the opportunity to create impact at scale. Together we will build technology that reflects how India thinks, speaks, and solves its hardest problems. The future is intelligent, sovereign, and built locally.”
He added that the advantages of building sovereign AI cannot be overstated, as the country faces a digital asymmetry. “We are exporting data and importing tokens. When our data flows outward, it trains foreign models, and the learning loops compound elsewhere, leaving us with tokens but no leverage. States that move quickly will build enduring advantage. A case in point is the U.S. wherein AI drove nearly 3% GDP growth in 2025. If states in India can generate even 1% annual growth via AI, the compounding effect towards 2047 can be massive,” Kumar noted.