India's Open TTS Data Infrastructure — methodology, dataset, codec, sentence bank, benchmark, and sub-1B SLM designed to scale to 100+ Indian language varieties. Released as open infrastructure under a permissive license.
India has open ASR infrastructure for 100+ languages. It has nothing equivalent for TTS. svara builds that — with better data, not more data.
svara is a proof of methodology designed for 100+ Indian language varieties. Model, dataset, codec, sentence bank, eval benchmark, and recording playbook — all released as open infrastructure.
Strong foundations exist for Indic speech, and they enabled svara v1 training. But gaps remain at production scale: limited code-mixing, word skipping in non-Hindi languages, long-form instability, prosody degradation over long contexts, text normalisation gaps, and inconsistency in low-resource languages. No expressivity or voice design.
English-centric or thin per-language. No dialect granularity, no expressivity, no Indic phonetic fidelity in codec, no on-device.
LLM-based discrete audio tokens. Expressivity & emotion control, voice cloning, code-switching, Indic-finetuned codec, robust text normalisation (WFST), 100+ varieties, sub-1B on-device.
A founding team that has shipped AI to millions.
Visionary engineering leader with 20+ years across enterprise and public-sector tech. Previously led engineering at NIIT and Manipal. Drives Kenpath's partnerships with EkStep (Nandan Nilekani), Gates Foundation, Intel, and IIIT Bangalore.
Decades building robust, scalable systems for mission-critical deployments. Architected svara's production inference stack (vLLM, FastAPI, Docker, GGUF, CUDA). Leads platform engineering, on-device deployment via OpenVINO and NPU.
NLP, speech AI, agentic systems. Led svara v1 and Voiceclone-beta. Research collaborator on Urbanization from Within (Oxford University Press, 2026) and Efficient Neural Geo-referencing of Map Collections (2023). Leads svara SLM architecture, training, and evaluation.
Strategy and client engagement leader driving Kenpath's partnerships with governments, enterprises, and mission-driven organisations. Translates complex AI capabilities into measurable outcomes at scale.
Experienced project manager steering Kenpath's engagements from first pilot to production deployment. Brings rigour to planning, delivery, and stakeholder coordination — ensuring complex AI programs ship on schedule.
Kenpath builds and operates AI platforms at population scale across agriculture, governance, dairy, accessibility, and edge retail.
AI for Farmer Producer Orgs — processing, packaging, branding, market access.
Wadhwani Foundation
Multilingual agentic interface for policies, subsidies & approvals.
64 industries · 17+ ministries
Edge checkout with real-time produce recognition. ML-Ops on-device.
<100ms · 65%+ faster
AI suite with voice + WhatsApp. SaaS across India, Brazil, Africa.
50+ orgs · 87%+ accuracy
AI-powered learning & navigation tools for the visually impaired.
Enable India · 100K+ lives
Personalised career journeys using conversational intelligence.
UK · Neuroscience-driven
Bharat-VISTAAR & Amul AI recognised by PM Modi. Launched by Union Agriculture Minister. Kenpath as tech partner.
Panel with C.V. Sridhar (Amaravati Quantum Mission, AP), Anusha Dandapani & Sameer Chauhan (UNICC), Frugal AI Hub.
With EkStep Foundation (Nandan Nilekani) — designing voice AI for India's national citizen helplines at scale.
Aditya Chhabra serves on the Ministry of Education evaluation committee for the TANUH AI CoE for Healthcare at IISc.
Presenting India's sovereign voice AI and population-scale DPI to the international AI community.
Showcasing population-scale AI platforms and open-source svara TTS.
With Chief Secretary Maharashtra Vikas Rastogi. Hosted by Govts of India & Maharashtra and AIAIC.
National Agri DPI launched by Union Agriculture Minister in Jaipur. Kenpath as core technology partner.
Expressive emotion tags bring natural warmth, humour, and feeling to synthesized speech.
kenpath.io/en/open-source/svara
Deterministic WFST-based text normalisation for 19 Indian languages. Critical preprocessing for any TTS pipeline — converts numbers, dates, currency, measurements into natural spoken form at 1–5ms latency. Open-sourced under Apache 2.0.
1–5ms WFST traversal vs 500ms local LLM vs 2000ms API LLM. Same input always produces same output — no hallucinations, no temperature tuning. Runs on CPU anywhere. Born from real-time TTS needs with svara.
Built on NVIDIA NeMo Text Processing framework (Pynini). Extended by Kenpath for all 19 Indic languages with language-specific rules for each semiotic class.
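The determinism argument above can be sketched in plain Python. This is a toy rule cascade, not svara's actual grammar: one rewrite rule per semiotic class, applied in a fixed order, so the same input always yields the same output. The real pipeline compiles such rules into Pynini WFSTs per language; the rupee rule here spells digits instead of expanding place value, purely to keep the sketch small.

```python
import re

# Toy deterministic normaliser illustrating the WFST rule-cascade idea.
# One rewrite per semiotic class, applied in a fixed order -- no model,
# no sampling, so identical inputs always produce identical outputs.
# Illustrative only: svara's real grammars are compiled Pynini WFSTs.

ONES = "zero one two three four five six seven eight nine".split()

def spell_digits(s: str) -> str:
    # Spell each digit; a full grammar would expand place value instead.
    return " ".join(ONES[int(d)] for d in s)

RULES = [
    # currency: "₹45" -> "four five rupees" (digit-by-digit in this sketch)
    (re.compile(r"₹(\d+)"), lambda m: spell_digits(m.group(1)) + " rupees"),
    # plain cardinals: "108" -> "one zero eight"
    (re.compile(r"\d+"), lambda m: spell_digits(m.group())),
]

def normalise(text: str) -> str:
    for pattern, rewrite in RULES:
        text = pattern.sub(rewrite, text)
    return text

print(normalise("Call 108 for help, fee ₹45"))
# -> "Call one zero eight for help, fee four five rupees"
```

Because the cascade is a pure function over rules, it runs in microseconds on any CPU, which is where the 1–5ms figure (for full WFST traversal) comes from.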
Existing Indic TTS datasets cover literary registers of major languages — but miss dialects, prosodic diversity, emotion, and code-switching. TTS doesn't need 100,000 hours. It needs ~2 hours of phonetically designed data per variety — that's what makes 100+ varieties viable.
Trained svara v1 on SYSPIN, IndicTTS, RASA, SPICOR — ~2,000 hours across 22 languages. These are excellent but have a ceiling: standard literary registers, limited dialect granularity, limited prosodic and emotional diversity.
~1000 phonetically designed sentences per variety via set-cover algorithms. ~200 parallel core sentences across all varieties as cross-lingual anchors for voice cloning and dialect transfer. Composed natively, never translated.
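The set-cover selection described above can be sketched with the standard greedy heuristic: repeatedly pick the sentence that covers the most still-uncovered target phones. The candidate sentences and phone sets below are hypothetical; real selection also weights prosodic targets and phone frequency.

```python
# Greedy set-cover sketch for sentence-bank design: choose few
# sentences whose combined phone inventory covers a target phone set.

def select_sentences(candidates, target_phones):
    """candidates: list of (sentence, phone_set). Returns chosen sentences."""
    uncovered = set(target_phones)
    chosen = []
    while uncovered:
        # Pick the sentence covering the most still-uncovered phones.
        best_sentence, best_phones = max(
            candidates, key=lambda c: len(c[1] & uncovered)
        )
        if not (best_phones & uncovered):
            break  # remaining phones are not coverable by any candidate
        chosen.append(best_sentence)
        uncovered -= best_phones
    return chosen

# Hypothetical candidates with IPA phone sets.
candidates = [
    ("sentence A", {"k", "a", "ʈ"}),    # contains a retroflex
    ("sentence B", {"kʰ", "a", "n"}),   # contains an aspirated stop
    ("sentence C", {"a", "n"}),
]
target = {"k", "a", "ʈ", "kʰ", "n"}
print(select_sentences(candidates, target))  # -> ['sentence A', 'sentence B']
```

The greedy heuristic is a natural fit here because it gives a provable logarithmic approximation to the optimal cover, which is why ~500–1000 designed sentences can cover a variety's phonetic and prosodic targets.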
Studio (48kHz/24-bit) trains the TTS model. Mobile app (24kHz, on-device neural denoising) trains the codec. You don't need a studio in every village — mobile tier makes tribal and remote dialects viable.
Base model trained on 22 languages. Each new dialect = ~2 hours of designed data + 1 LoRA adapter. 100 varieties ≈ 200 hours total. Fair-paid native speakers, 2–4 per variety, gender and age balanced.
A sub-1B SLM over discrete audio tokens — small enough for on-device, powerful enough for 100+ varieties. Evaluation goes beyond MOS and WER to measure what actually matters for Indian languages.
Language model generates audio token sequences, decoded to waveform. Single-GPU trainable. Deployable on Android, Intel Core Ultra / Xeon / Panther Lake CPUs, Qualcomm NPUs via GGUF. Near-zero per-call cost for Bhashini population-scale distribution.
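The generate-then-decode flow above can be illustrated with a toy decoder: the language model emits discrete codec token ids, and the codec maps each id to a short waveform frame. The codebook values and frame size here are made up; a real neural codec decodes tokens through a learned decoder network, not a lookup table.

```python
# Toy sketch of discrete-token TTS decoding. The LM's output is a
# sequence of codec token ids; decoding concatenates one waveform
# frame per token. Codebook and FRAME are hypothetical -- real codecs
# use learned decoders and frames of e.g. 320 samples at 16 kHz.

FRAME = 4  # samples per token in this sketch

# Hypothetical codebook: token id -> waveform frame (float samples).
CODEBOOK = {
    0: [0.0, 0.0, 0.0, 0.0],
    1: [0.1, 0.3, 0.1, -0.2],
    2: [-0.3, 0.2, 0.4, 0.0],
}

def decode(token_ids):
    """Concatenate codebook frames into one waveform (list of samples)."""
    wav = []
    for t in token_ids:
        wav.extend(CODEBOOK[t])
    return wav

tokens = [1, 2, 0]            # as if emitted by the language model
wav = decode(tokens)
assert len(wav) == len(tokens) * FRAME
```

Because generation is just next-token prediction over a small vocabulary, the same model runs under any LM runtime, which is what makes GGUF/NPU on-device deployment feasible.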
Finetune audio codec on Indic speech so tokens natively represent retroflex consonants, aspirated stops, nasal vowels, breathy phonation — sounds that English-trained codecs handle poorly. Released as standalone open infrastructure.
Stage 1: Train base model from scratch on 22 scheduled languages. Stage 2: Per-dialect LoRA adaptation from ~2 hours. Cross-lingual transfer via parallel core sentence bank — same speaker, same meaning, different language.
Phonological fidelity (retroflex, aspiration, nasals). Morphological integrity. Semantic adequacy. Sociolinguistic appropriateness — disaggregated by age, register, code-switching. Community-led usability with paid native evaluators.
Seven open-source artefacts, released under IndiaAI-approved licensing terms and uploaded to AIKosh — so any community, in India or globally, can extend coverage to their own language using the same methodology and tools.
Sub-1B multilingual model. ~50 varieties in 12 months, scales to 100+. On-device via NPU/GGUF.
Finetuned on Indic audio — tokens natively represent retroflex, aspirated stops, nasal vowels.
Studio (48 kHz) + mobile audio. ~50 varieties, ~120 fair-paid native speakers, full annotation.
Per-variety sets (~500–1000 sentences) via set-cover algorithms over phonetic & prosodic targets.
Phonological fidelity, morphology, semantics, sociolinguistic appropriateness, community-led usability.
Sentence-design toolkit, mobile recording app, neural-denoising pipeline, eval harness, playbook.
Production-ready: Docker, vLLM, FastAPI, GGUF, CUDA, REST API, streaming & voice cloning.
All artefacts uploaded to AIKosh, HuggingFace & GitHub under IndiaAI-approved licensing.
Built for Bharat. Runs anywhere — on a feature phone, a cloud server, or an edge device.
Voice-first farmer advisory in dialects farmers actually speak — Awadhi for UP, Marwari for Rajasthan, Telangana Telugu for Telangana.
Multilingual IVR and voice bots for central, state, and municipal helplines. Grievance redressal, scheme discovery, tax helplines.
Symptom checkers, ICDS and Poshan advisory, tele-consultation support, vaccination reminders for low-literacy populations.
NCERT and state-board AI tutors that read textbooks, explain concepts, and converse in mother tongue. Spoken English practice for rural students.
Screen readers, navigation assistants, and audio description of visual content for the visually impaired in native Indian languages.
Voice-based banking, insurance, and micro-credit interfaces. Fraud-awareness voice bots for rural customers in their native language.
Contact centres, voice-described product catalogues, content narration for media and e-commerce — real-time voice-to-voice translation via LiveKit.
Runs on Intel Xeon / Panther Lake CPUs, Core Ultra NPUs, Qualcomm NPUs, and commodity Android. Fully offline voice AI for privacy-sensitive workloads.
Each cycle ends with a public milestone release on HuggingFace, AIKosh & GitHub.
Personnel 1.2Cr · Mobile app + tooling 0.4Cr · Ops + legal 0.2Cr · Workshops + papers 0.2Cr
Speaker fees 1.2Cr · Linguistic design 1.0Cr · Studio + mobile rig 0.4Cr · Field travel 0.2Cr · Community eval 0.2Cr
Via empanelled cloud providers. H100 SXM at ~INR 92/GPU/hour subsidised rate.
Managed inference for banks, PSUs, telecom, healthcare. Domain-specific voice agents. Custom voice creation. Bhashini distribution at near-zero per-call cost. 500K+ organic downloads validate enterprise demand.
H100 SXM required for from-scratch training of sub-1B core across 50+ varieties with long audio sequences and large batch sizes. Phased ramp: 8 GPUs while collecting data → 16 for base training → 24 at peak multi-variety intensity.
Ministry of Agriculture (Govt of India), Government of Maharashtra, Government of Uttar Pradesh, Intel, EkStep Foundation (Nandan Nilekani), People+ AI, Wadhwani Foundation, Gates Foundation (via programme sponsors), IIIT Bangalore (CoSS), IIT Jammu, Selco, Enable India, Apurva AI, Open Inclusion, MetLife, and others.
| Name | Email | Phone |
|---|---|---|
| Muneender Jillella, Founder & CEO | muneender@kenpath.io | +91 98867 35532 |
| Aditya Chhabra, Chief Data Scientist | aditya@kenpath.io | +91 81780 52333 |
| Thyagarajan Babu, CTO | rajan@kenpath.io | +91 98454 56279 |
kenpath.io
huggingface.co/kenpath
github.com/Kenpath
Permissive license