Indian AI Startup Unveils OpenHathi-Hi-v0.1: The First Hindi Model in the OpenHathi Series

Great news from Indian startup Sarvam AI! They've just unveiled OpenHathi-Hi-v0.1, the first Hindi large language model (LLM) in the OpenHathi series. Designed on a budget-friendly platform and built on top of Llama2-7B, the model is reported to deliver GPT-3.5-like performance on Indic languages.

Standout Features of OpenHathi-Hi-v0.1

  • Equipped with a tokenizer that extends Llama2-7B’s vocabulary to 48K tokens
  • Two-phase training process: Embedding alignment and bilingual language modeling
  • Robust performance across various Hindi tasks, potentially outshining GPT-3.5
  • Maintains proficiency in English
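
To make the tokenizer extension and the embedding-alignment phase a little more concrete, here is a minimal sketch of the general idea: new Hindi tokens get appended to an existing vocabulary, and the embedding table grows by one row per new token. This is not Sarvam AI's actual code. The function names, the toy vocabulary, and the choice to start new rows at the mean of the existing rows are all assumptions made for illustration.

```python
def extend_vocab(vocab, new_tokens):
    """Return a copy of vocab with unseen tokens appended at the next free ids.

    Assumes the existing ids are contiguous and start at 0.
    """
    extended = dict(vocab)
    for token in new_tokens:
        if token not in extended:
            extended[token] = len(extended)
    return extended


def resize_embeddings(embeddings, new_size):
    """Grow the embedding table to new_size rows.

    New rows start at the mean of the existing rows (an assumed heuristic);
    a later training phase would be responsible for aligning them.
    """
    dim = len(embeddings[0])
    count = len(embeddings)
    mean = [sum(row[i] for row in embeddings) / count for i in range(dim)]
    new_rows = [list(mean) for _ in range(new_size - count)]
    return [list(row) for row in embeddings] + new_rows


# Toy stand-ins for a base model's vocabulary and embedding table (hypothetical values).
base_vocab = {"<unk>": 0, "the": 1, "farm": 2}
base_embeddings = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]

# Candidate Hindi tokens; "the" is already present and is skipped.
hindi_tokens = ["किसान", "खेत", "the"]

vocab = extend_vocab(base_vocab, hindi_tokens)
embeddings = resize_embeddings(base_embeddings, len(vocab))

print(len(vocab), len(embeddings))
```

In a real model the same two steps apply at a much larger scale: the vocabulary grows from the base tokenizer's size to 48K entries, and the new embedding rows are then trained so that Hindi tokens land in a useful region of the existing embedding space.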

Collaborative Development

  • Developed in collaboration with academic partners at AI4Bharat
  • Fine-tuned in partnership with KissanAI, leveraging conversational data from a bot interacting with farmers in multiple languages

KissanAI's Contribution

  • Recently announced Dhenu 1.0, a groundbreaking Agriculture Large Language Model
  • Tailored specifically for Indian agricultural practices
  • Bilingual model that understands English, Hindi, and Hinglish queries, directly addressing the linguistic needs of farmers

Founders and Funding

  • Sarvam AI was co-founded by Pratyush Kumar and Vivek Raghavan in July 2023
  • Secured $41 million in Series A funding led by Lightspeed, with participation from Peak XV Partners and Khosla Ventures

Check it Out!

Excitingly, you can check out the base model for yourself. This release is not just a win for Sarvam AI but also a significant step forward for Hindi language models.

