Sarvam AI Beats ChatGPT & Gemini: How India’s Vision OCR Outperformed Global Giants

A Bengaluru startup just did something remarkable: it built AI models that, by its reported results, beat ChatGPT, Google Gemini, and Anthropic Claude on public document-OCR benchmarks.

Sarvam AI, founded less than three years ago, launched three products in February 2026 that are forcing Silicon Valley to pay attention. The company’s Vision OCR system scored 84.3% on olmOCR-Bench, beating Gemini 3 Pro (80.2%) and ChatGPT (69.8%). Its Bulbul V3 text-to-speech model now powers voice applications across 11 Indian languages with 35+ different voices.

The timing isn’t accidental. Sarvam released these products days before India’s AI Impact Summit 2026, signaling that India’s AI ambitions go beyond call centers and IT services. The country is building foundational AI models from scratch, and Sarvam is leading the charge.


What Sarvam Actually Built

Sarvam Vision: OCR That Handles Indian Chaos

Traditional OCR systems break when you throw them an Indian government form. Multiple scripts on the same page. Hand-stamped sections. Low-resolution scans from aging equipment. Code-switching between English and regional languages.

Sarvam Vision was trained specifically for this mess. The 3-billion-parameter vision-language model handles documents across 22 Indian languages, including Hindi, Bengali, Tamil, Telugu, Marathi, Malayalam, Kannada, Gujarati, Punjabi, Urdu, and Assamese.

The benchmark results tell the story:

olmOCR-Bench (English-only subset):

  • Sarvam Vision: 84.3%
  • Gemini 3 Pro: 80.2%
  • DeepSeek OCR v2: Score not disclosed, but reported as lower
  • ChatGPT: 69.8%

OmniDocBench v1.5:

  • Sarvam Vision: 93.28%
  • Particularly strong on complex layouts, nested tables, scientific formulas, and chart interpretation

Indian Language Documents:

  • Average word-level accuracy: 87.36% across 22 languages
  • Significantly outperforms global models that typically struggle below 70% accuracy on Indic scripts
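
The article does not say how the word-level accuracy figure is computed. As a rough illustration only, the sketch below assumes one common definition: 1 minus the word-level edit distance divided by the number of reference words. The function names and the sample strings are hypothetical and are not taken from Sarvam's evaluation.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from OCR output
                            curr[j - 1] + 1,      # extra word in OCR output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference_text, ocr_text):
    """Assumed metric: 1 - (word edit distance / reference word count), floored at 0."""
    ref = reference_text.split()
    hyp = ocr_text.split()
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - word_edit_distance(ref, hyp) / len(ref))


# Hypothetical ground truth and OCR output with one misread word out of six
reference = "नाम राम कुमार पता नई दिल्ली"
ocr_output = "नाम राम कुमर पता नई दिल्ली"
print(round(word_accuracy(reference, ocr_output), 4))  # 0.8333
```

Under this definition a single misread word in a six-word line already drops accuracy to about 83%, which is why averages in the high 80s across 22 scripts are hard to reach.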

The model doesn’t just extract text. It understands document structure, reading order, relationships between elements, and semantic meaning. This matters for applications like KYC automation, historical document digitization, and government records processing where errors compound quickly.

Sarvam is confident enough in Vision that it is offering free API access through February 2026. That’s not typical for a startup trying to monetize quickly. It’s a bet that once developers experience the accuracy difference on real Indian documents, they won’t go back to global alternatives.

Bulbul V3: Voice That Sounds Human

If you’ve used voice AI in Indian languages, you know the problem. Robotic pronunciation. Awkward pauses. Complete failure on code-mixed sentences where people naturally switch between English and their regional language mid-conversation.

Bulbul V3 targets production use cases, not demos. The text-to-speech model currently supports 35+ voices across 11 Indian languages, with expansion planned to cover all 22 scheduled languages.

In independent third-party listening tests, Bulbul V3 achieved the highest listener preference scores and lowest error rates across multiple use cases. The model particularly excels at:

  • Telephony-grade quality: Maintains clarity over noisy phone lines where global models degrade
  • Code-mixed speech: Handles Hinglish and other language combinations naturally
  • Numeric pronunciation: Correctly distinguishes “nau” (nine) from “no” based on context
  • Named entity handling: Pronounces Indian names, places, and terms accurately

The pricing is strategic. Sarvam positions Bulbul V3 as a cost-effective alternative to ElevenLabs for Indian language applications. At ₹1 per minute for enterprise voice interactions, it undercuts global providers while delivering better accuracy for local contexts.

Sarvam Audio: Speech Recognition Across 22 Languages

Completing the speech stack is Sarvam Audio, launched the same week as Vision and Bulbul V3. The automatic speech recognition (ASR) system converts spoken Indian language audio into text, handling accents, background noise, and multi-speaker environments.

The full stack approach matters. Enterprises can now build complete voice workflows without stitching together tools from multiple providers, each with different strengths and weaknesses on Indian languages.

The IndiaAI Mission: Sovereign AI With Government Backing

On April 26, 2025, India’s IT Minister Ashwini Vaishnaw announced that Sarvam AI won the contract to build India’s first sovereign large language model under the IndiaAI Mission.

The government committed:

  • 4,096 NVIDIA H100 SXM GPUs for six months
  • ₹247 crore (~$30 million) worth of GPU compute access
  • 60% of GPU access counted as equity investment (₹148 crore)
  • Infrastructure provided by Yotta Data Services
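
The equity figure follows directly from the two numbers in the list above. A minimal arithmetic check, where the function name is hypothetical and the article does not say how the remaining share is classified:

```python
def split_compute_support(total_crore, equity_fraction):
    """Split a compute package into its equity share and the remainder (₹ crore)."""
    equity = total_crore * equity_fraction
    return equity, total_crore - equity


# Figures from the article: ₹247 crore of GPU access, 60% counted as equity
equity, remainder = split_compute_support(247, 0.60)
print(f"equity: ₹{equity:.1f} crore, remainder: ₹{remainder:.1f} crore")
# equity: ₹148.2 crore, remainder: ₹98.8 crore
```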

Sarvam’s mandate: develop an open-source 120-billion parameter AI model optimized for Indian languages, built entirely on domestic infrastructure by Indian talent. Use cases include government services like “2047: Citizen Connect” and “AI4Pragati.”

This follows Sarvam’s earlier models:

  • Sarvam-1: 2-billion parameter model
  • Sarvam-M: 24-billion parameter model with hybrid reasoning capabilities

The IndiaAI Mission represents a broader push. The government approved ₹10,300 crore (~$1.2 billion) over five years to build computing infrastructure, develop indigenous AI capabilities, and train the workforce. More than 38,000 GPUs are now available to startups at a subsidized rate of ₹65 per GPU hour, compared to ₹330+ per hour on AWS and ₹590+ per hour on Azure.

Sarvam isn’t the only company selected. The government also picked:

  • Soket AI: Building a 120-billion parameter model for defense, healthcare, and education
  • Gnani AI: Developing a 14-billion parameter Voice AI model
  • Gan AI: Creating a 70-billion parameter model for superhuman text-to-speech

But Sarvam was first, and its recent product launches suggest it’s furthest ahead.

The Founders: From Aadhaar to AI Sovereignty

Dr. Pratyush Kumar: CEO and Co-Founder

Pratyush Kumar holds a BTech in Electrical and Electronics Engineering from IIT Bombay and a PhD in Computer Engineering from ETH Zurich. His career spans research roles at Microsoft Research and IBM Research, along with academic positions at IIT Madras where he served as adjunct faculty.

Before Sarvam, Kumar co-founded:

  • AI4Bharat: Open-source initiative at IIT Madras focused on Indian language AI, which received ₹36 crore from the Nilekani family in 2022
  • One Fourth Labs: AI education and consulting venture
  • PadhAI: Affordable online learning platform that trained 50,000 students in deep learning for ₹1,000 per course between 2016 and 2023

At Sarvam, Kumar leads product development, research, and overall strategy, focusing on building multilingual AI systems designed specifically for Indian use cases with emphasis on reliability, governance, and enterprise adoption.

Dr. Vivek Raghavan: Co-Founder

Vivek Raghavan graduated from IIT Delhi and earned his PhD in Electrical and Computer Engineering from Carnegie Mellon University. His early career was in chip-design software, including a VP role at Magma Design Automation and experience at Synopsys and Avant! Corporation.

But Raghavan’s most significant work came in India’s digital public infrastructure. He volunteered with the Unique Identification Authority of India (UIDAI) on Aadhaar’s biometric systems for nearly 12 years, contributing to the world’s largest digital identity program. He later worked with EkStep Foundation and Khosla Labs on large-scale digital platforms.

Raghavan also serves on the AI Committee of the Supreme Court of India, bringing governance and policy expertise that complements Kumar’s technical leadership.

The duo met through AI4Bharat at IIT Madras, where both contributed to open-source language AI research. They founded Sarvam AI in August 2023 with the mission to make generative AI accessible to everyone in India at scale.

Funding: $53.8M With Strategic Investors

Sarvam AI has raised a reported $53.8 million across three funding rounds. Two of them are detailed here:

Series A (December 2023): $41 million

  • Led by: Lightspeed Venture Partners
  • Co-led by: Peak XV Partners (formerly Sequoia Capital India)
  • Participated: Khosla Ventures
  • Notable: One of the largest early-stage funding rounds for an Indian AI startup at the time

Series A+ (August 2025): Amount undisclosed

  • Led by: Avvanti Advisors
  • Valuation: ₹1,720 crore (~$205 million)

Total investors: 10 institutional investors and 4 angel investors.

The company had grown to 114 employees as of August 2025, a 226% year-over-year increase. Annual revenue for the fiscal year ending March 31, 2025 was ₹29.1 crore (~$3.5 million).

The government’s ₹247 crore GPU access grant, with ₹148 crore counting as equity, significantly boosts the company’s effective capitalization without traditional dilution.

Strategic Partnerships: UIDAI, AI Alliance, Enterprise Customers

Aadhaar Integration

The Unique Identification Authority of India (UIDAI) partnered with Sarvam AI to enhance user experience for Aadhaar number holders:

  • AI-powered voice-based interactions in multiple languages
  • Multilingual AI support for wider accessibility
  • Near real-time feedback during enrollment or updates
  • Alerts about possible overcharging
  • Instant fraud notifications for suspicious authentication activity

This partnership puts Sarvam’s technology in front of one of the world’s largest identity systems, with over 1.3 billion enrolled users, creating a testing ground for population-scale AI deployment.

AI Alliance Membership

In 2024, Sarvam joined the AI Alliance, a global consortium co-led by Meta and IBM championing open-source artificial intelligence. Other Indian members include Infosys, AI4Bharat (IIT Madras), IIT Jodhpur, KissanAI, People+AI, and Wadhwani AI.

The alliance focuses on:

  • Open-source AI model development
  • Responsible AI practices
  • Democratizing AI access
  • Fostering AI innovation ecosystems

Enterprise Adoption

While Sarvam doesn’t publicly disclose customer lists, use cases span:

  • Financial services: KYC automation, document processing
  • Government services: Citizen helplines, form processing
  • Agriculture: Voice interfaces for farmers (KissanAI use case)
  • Customer support: Multilingual call center applications
  • Education: Accessible learning tools in regional languages

The company targets ₹1 per minute pricing for enterprise voice interactions, significantly undercutting global alternatives while delivering superior accuracy for Indian contexts.

Why Global Models Fail in India

The performance gap between Sarvam’s models and global alternatives isn’t about raw capability. ChatGPT, Gemini, and Claude are technically sophisticated systems trained on trillions of tokens.

The problem is training data composition. Global models are predominantly trained on English-language internet content, with some multilingual data added later. For Indian languages and contexts, this creates several issues:

Script complexity: Devanagari, Bengali, Tamil, and other scripts have different rendering, ligatures, and character combinations than Latin alphabets. OCR systems trained primarily on English often misread characters or break on complex combinations.

Code-mixing: Indians naturally switch between English and regional languages mid-sentence. Global models treat this as noise. Sarvam’s models are trained to handle it as normal linguistic behavior.

Domain-specific documents: Indian government forms, academic certificates, and legal documents have unique layouts and terminology. Training data from Western contexts doesn’t translate.

Voice characteristics: Accent, intonation, pace, and pronunciation patterns differ significantly. A model trained on American English voice data will struggle with Indian English, let alone Hindi or Tamil.

Cost structure: Running inference on frontier models like GPT-4 or Gemini Ultra is expensive. For high-volume applications like voice assistants in call centers, costs become prohibitive. Smaller, specialized models deliver better unit economics.
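
To make the code-mixing point concrete, here is a small sketch that flags sentences mixing scripts, using only Python's standard `unicodedata` module. It is an illustration, not a description of how Sarvam's models work, and the function names are hypothetical. It also only catches mixed-script text: Hinglish typed entirely in Latin letters would need a language-identification model instead.

```python
import unicodedata


def script_of(word):
    """Guess a word's script from the Unicode name of its first letter."""
    for ch in word:
        if ch.isalpha():
            # Unicode character names begin with the script,
            # e.g. "DEVANAGARI LETTER MA" or "LATIN SMALL LETTER M"
            return unicodedata.name(ch, "UNKNOWN").split()[0]
    return "OTHER"  # digits, punctuation, and other non-letter tokens


def is_code_mixed(sentence):
    """True if the sentence contains words written in more than one script."""
    scripts = {script_of(word) for word in sentence.split()}
    scripts.discard("OTHER")
    return len(scripts) > 1


print(script_of("मुझे"))                                   # DEVANAGARI
print(is_code_mixed("मुझे कल meeting में late हो जाएगा"))  # True
print(is_code_mixed("I will be late tomorrow"))           # False
```

A pipeline that treats the second sentence as noise would drop "meeting" and "late", the two words that carry most of its meaning.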

Sarvam’s thesis is that specialized models trained specifically for Indian languages and use cases will outperform general-purpose models for local applications, while maintaining competitive pricing through focused optimization.

The benchmark results support this thesis. Vision’s 84.3% on olmOCR-Bench versus ChatGPT’s 69.8% is a 14.5-point gap, or roughly a 21% relative improvement. In production environments processing millions of documents, that difference translates into far fewer pages needing manual correction.
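
The gap between two benchmark scores can be stated in percentage points, as a relative gain, or as a reduction in errors. The sketch below works through all three for the scores quoted above. Treating 100 minus the score as an error rate is a simplifying assumption, since the benchmark's scoring method is not described here.

```python
def compare_scores(ours, baseline):
    """Compare two scores given in percent."""
    point_gap = ours - baseline                  # absolute gap, percentage points
    relative_gain = point_gap / baseline * 100   # gain relative to the baseline
    # Assumption: (100 - score) approximates the error rate
    error_reduction = (1 - (100 - ours) / (100 - baseline)) * 100
    return point_gap, relative_gain, error_reduction


gap, rel, err = compare_scores(84.3, 69.8)
print(f"{gap:.1f} points, {rel:.1f}% relative gain, {err:.1f}% fewer errors")
# 14.5 points, 20.8% relative gain, 48.0% fewer errors
```

Seen as errors rather than accuracy, the same gap means roughly half as many failures per batch, which is the figure that matters when every failure needs human review.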

The Broader Context: India’s AI Ecosystem

Sarvam’s success comes amid India’s broader AI infrastructure boom:

Data center investments (2025-2026):

  • Microsoft: $17.5 billion over 4 years (its largest investment in Asia)
  • Google: $15 billion over 5 years
  • Amazon Web Services: $12.7 billion by 2030
  • Reliance Industries: Planning 3-gigawatt Jamnagar data center ($20-30 billion), potentially world’s largest

Compute availability:

  • IndiaAI Mission: 38,000+ GPUs operational (original target was 10,000)
  • Subsidized rate: ₹65 per GPU hour versus ₹330+ on AWS, ₹590+ on Azure
  • 5x-9x cheaper than commercial cloud alternatives
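
The 5x-9x range can be reproduced from the per-hour rates in the list. A quick sketch, where the commercial rates are the lower bounds quoted above and the 8-GPU, 30-day workload is a made-up example:

```python
SUBSIDIZED_RATE = 65                          # ₹ per GPU hour (from the article)
COMMERCIAL_RATES = {"AWS": 330, "Azure": 590}  # ₹ per GPU hour, quoted lower bounds


def cost_multiple(rate, base=SUBSIDIZED_RATE):
    """How many times more expensive `rate` is than the subsidized rate."""
    return rate / base


for provider, rate in COMMERCIAL_RATES.items():
    print(f"{provider}: {cost_multiple(rate):.1f}x the subsidized rate")
# AWS: 5.1x the subsidized rate
# Azure: 9.1x the subsidized rate

# Hypothetical workload: 8 GPUs running continuously for 30 days
gpu_hours = 8 * 24 * 30
print(f"{gpu_hours} GPU hours cost ₹{gpu_hours * SUBSIDIZED_RATE:,} at the subsidized rate")
# 5760 GPU hours cost ₹374,400 at the subsidized rate
```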

Talent development:

  • FutureSkills program: 13,500 scholars receiving AI training, including 500 PhD fellows
  • TCS upskilling 600,000-person workforce through NVIDIA partnership
  • 250,000+ Indian developers certified on CUDA programming

Market size:

  • India ranks 3rd globally in AI competitiveness (Stanford’s 2025 AI Vibrancy Tool), behind only the US and China
  • 89% of new Indian startups integrated AI into products in 2025 (government statistics)
  • Government estimates AI could add $500 billion to Indian economy by 2030

Competitive landscape:

  • 506 proposals received for IndiaAI Mission foundation model development
  • 12 startups selected across two phases
  • Growing ecosystem of AI infrastructure providers (Yotta, E2E Networks, Tata Communications)

Sarvam is positioning itself as the default choice for enterprises building AI applications for Indian markets, similar to how companies like Scale AI became infrastructure providers for the broader AI industry.

The 14-Day Launch Streak

Sarvam’s February 2026 product launches followed a deliberate 14-day strategy:

  • Day 1-3: Sarvam Vision OCR and Document Intelligence APIs (February 5-7)
  • Day 4-7: Bulbul V3 text-to-speech with benchmark results (February 7-10)
  • Day 8-10: Sarvam Audio speech recognition across 22 languages
  • Day 11-14: Arya enterprise agent platform announcement (February 11)

The rapid-fire releases generated compounding media attention. Each announcement built on previous momentum, with tech commentators, developers, and enterprise customers all responding across different channels.

Timing mattered. The India AI Impact Summit 2026 runs February 19-20 at Bharat Mandapam in New Delhi, with the government positioning it as a follow-up to major AI summits in Bletchley Park (2023), Seoul (2024), and Paris (2025). Sarvam’s launches positioned the company as a showcase for India’s AI capabilities right before the global stage.

The Bottom Line

Sarvam AI’s February 2026 product launches mark a legitimacy threshold. The company moved from “interesting regional player” to “competitive AI provider” through measurable benchmark wins against global giants.

Vision’s 84.3% accuracy on olmOCR-Bench versus Gemini’s 80.2% and ChatGPT’s 69.8% isn’t a fluke. It’s the result of focused training on Indian documents, languages, and use cases that global providers underserve. Bulbul V3’s voice quality and Indic language coverage address a real gap where ElevenLabs and similar providers charge premium prices for inferior results.

The government backing through IndiaAI Mission provides credibility and resources that most startups can’t access. With 4,096 H100 GPUs and ₹247 crore in compute support, Sarvam can train models at scale that would otherwise require tens of millions in additional funding.

The founder pedigree matters. Kumar’s research background and Raghavan’s digital public infrastructure experience create a combination of technical depth and execution capability. Both have track records building at scale through AI4Bharat and Aadhaar.

But the real test is commercial validation. Can Sarvam convert benchmark wins and government contracts into sustainable enterprise revenue? Will CIOs at Indian banks, telecom companies, and government agencies pay for specialized models when they can use ChatGPT or Gemini for broader applications?

The next 12-18 months will answer these questions. For now, Sarvam has momentum, credibility, and products that work. In a market where most AI startups struggle to differentiate, that’s a strong position.

Whether India becomes a major AI power or remains dependent on Western technology will partly depend on companies like Sarvam proving that specialized, sovereign AI can compete commercially, not just on benchmarks. The early results suggest it’s possible. The execution challenge is making it profitable.
