@Hemanthkumar2112
Last active October 19, 2024 14:07
List of parallel datasets for English to 9 Indian languages

Indian language MT datasets (Hugging Face repo IDs; Hindi is a Kaggle link)

    Hindi: https://www.kaggle.com/datasets/aiswaryaramachandran/hindienglish-corpora
    Tamil: Hemanth-thunder/en_ta
    Malayalam: Hemanth-thunder/english-to-malayalam-mt 
    Kannada: Hemanth-thunder/english-to-kannada-mt
    Telugu: Hemanth-thunder/english-to-telugu-mt-155k
    Bengali: Hemanth-thunder/english-to-bengali-mt
    Marathi: Hemanth-thunder/english-to-marathi-mt
    Gujarati: Hemanth-thunder/english-to-gujarati-mt
    Odia: Hemanth-thunder/english-to-odia-mt
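
The Hugging Face entries above are repo IDs, so they can be passed to the `datasets` library directly. Below is a minimal sketch of that, not something taken from the gist: the names `HF_REPOS`, `hub_url`, and `load_pairs` are hypothetical, and the `"train"` split is an assumption, since the list does not say which splits or columns each dataset has. Hindi is left out because it is hosted on Kaggle rather than the Hub.

```python
# Repo IDs copied from the list above. Hindi is on Kaggle, so it is not included.
HF_REPOS = {
    "Tamil": "Hemanth-thunder/en_ta",
    "Malayalam": "Hemanth-thunder/english-to-malayalam-mt",
    "Kannada": "Hemanth-thunder/english-to-kannada-mt",
    "Telugu": "Hemanth-thunder/english-to-telugu-mt-155k",
    "Bengali": "Hemanth-thunder/english-to-bengali-mt",
    "Marathi": "Hemanth-thunder/english-to-marathi-mt",
    "Gujarati": "Hemanth-thunder/english-to-gujarati-mt",
    "Odia": "Hemanth-thunder/english-to-odia-mt",
}


def hub_url(language: str) -> str:
    """Return the Hugging Face Hub page URL for a language's dataset."""
    return f"https://huggingface.co/datasets/{HF_REPOS[language]}"


def load_pairs(language: str, split: str = "train"):
    """Download one split of a dataset (needs network access).

    The default split name "train" is an assumption; check the dataset
    page if loading fails.
    """
    # Imported lazily so the mapping and hub_url work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(HF_REPOS[language], split=split)
```

For example, `load_pairs("Tamil")` should return a `Dataset` object if the split exists; inspecting its `column_names` attribute is a reasonable first step, because the source and target column names are not given in the list.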
    