# awesome-bangla
A collection of tools, datasets and resources on Bangla computing. This list was compiled to help researchers and hobbyists interested in Natural Language Processing with the Bangla (Bengali) language. Please feel free to contribute.

## Typing Tools and Keyboards

### End-User Products
  - Avro Keyboard ([Windows](https://www.omicronlab.com/avro-keyboard.html), [Mac](https://www.omicronlab.com/iavro.html), [Linux](http://linux.omicronlab.com/), [Ubuntu](https://github.com/maateen/avro), [Online](https://avro.im/))
  - Ridmik Keyboard ([Android](https://play.google.com/store/apps/details?id=ridmik.keyboard))
  - [OpenBangla Keyboard](https://github.com/OpenBangla/OpenBangla-Keyboard)
  - [Online Probhat Keyboard](https://mdminhazulhaque.github.io/probhat.im/)
  - [Rokeya Keyboard Layout](https://github.com/MythicAngel/rokeya-keyboard-layout)
  - Borno Keyboard ([Windows](https://codepotro.com/borno), [Android](https://codepotro.com/borno-android))

### Libraries
  - Avro Phonetic Library ([JavaScript](https://github.com/torifat/jsAvroPhonetic), [Go](https://github.com/sadlil/go-avro-phonetic), [C++](https://github.com/mominul/cppAvroPhonetic))
  - [UBoard (Universal Keyboard Software)](https://bangla.gov.bd/uboard/) (bangla.gov.bd)
  - [jQuery.IME](https://github.com/wikimedia/jquery.ime) - Supports Avro, Probhat, Inscript, National (BD)
  - [BengaliPhoneticParser.swift](https://github.com/OpenBangla/BengaliPhoneticParser.swift) (OpenBangla)
  - [Rupantor](https://github.com/OpenBangla/rupantor-rs) - A very flexible Bengali phonetic parser/converter written in Rust. It also supports Avro Phonetic. (OpenBangla)
  - [bijoy2unicode](https://github.com/Mad-FOX/bijoy2unicode) - A Python package for bidirectional conversion between Bijoy encoding and Unicode Bangla.
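
Phonetic layouts such as Avro and the parsers listed above turn Roman keystrokes into Bangla script. The sketch below shows the general idea: greedy longest-match over a rule table, with vowels written as independent letters at the start of a syllable and as dependent signs (kar) after a consonant. The rule table, the `transliterate` function and all names in it are hypothetical and much smaller than Avro's real rule set; they do not reproduce any of the libraries above.

```python
# Hypothetical, heavily simplified rule table (NOT Avro's actual rules).
CONSONANTS = {
    "kh": "খ", "k": "ক", "g": "গ", "ch": "চ", "j": "জ",
    "t": "ত", "d": "দ", "n": "ন", "p": "প", "b": "ব",
    "m": "ম", "r": "র", "l": "ল", "s": "স", "h": "হ",
}
# Each vowel maps to (independent letter, dependent sign used after a consonant).
VOWELS = {
    "a": ("আ", "া"),
    "i": ("ই", "ি"),
    "u": ("উ", "ু"),
    "e": ("এ", "ে"),
    "o": ("অ", ""),  # after a consonant "o" is the inherent vowel: no sign
}
OTHERS = {"ng": "ং"}  # anusvara

MAX_KEY = max(len(k) for table in (CONSONANTS, VOWELS, OTHERS) for k in table)


def transliterate(roman: str) -> str:
    """Convert lowercase Roman input to Bangla by greedy longest match."""
    out = []
    after_consonant = False
    i = 0
    while i < len(roman):
        # Try the longest possible key first so "kh" wins over "k".
        for size in range(min(MAX_KEY, len(roman) - i), 0, -1):
            chunk = roman[i:i + size]
            if chunk in OTHERS:
                out.append(OTHERS[chunk])
                after_consonant = False
                break
            if chunk in CONSONANTS:
                out.append(CONSONANTS[chunk])
                after_consonant = True
                break
            if chunk in VOWELS:
                independent, sign = VOWELS[chunk]
                out.append(sign if after_consonant else independent)
                after_consonant = False
                break
        else:
            # Unknown character (space, punctuation): copy it through unchanged.
            chunk = roman[i]
            out.append(chunk)
            after_consonant = False
        i += len(chunk)
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("ami bangla"))
```

With this toy table, `"ami bangla"` becomes আমি বাংলা. Greedy longest match is what lets `ng` produce ং instead of ন followed by গ; real phonetic parsers add many more rules (conjuncts, hasanta, context-dependent vowels) on top of the same idea.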

### Fixed and Phonetic Input Specifications
 - [Bengali input methods](https://en.wikipedia.org/wiki/Bengali_input_methods)

## Corpora and Datasets
 - [Corpus Builder](https://github.com/banglakit/corpus-builder) (Aniruddha Adhikary et al., BanglaKit)
 - [A Language Independent Wikipedia Text Corpus Downloader](https://github.com/Rajan-sust/WikiTextCorpusDownloader)
 - [Indian Language Part-of-Speech Tagset: Bengali (LDC2010T16)](https://catalog.ldc.upenn.edu/LDC2010T16)
 - [IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b (LDC2016S08)](https://catalog.ldc.upenn.edu/LDC2016S08)
 - [BanglaLekha Corpus (Handwriting)](https://data.mendeley.com/datasets/hf6sf8zrkc/2) (ULAB, Dhaka)
 - [BanglaWriting: A multi-purpose offline Bangla handwriting dataset](https://data.mendeley.com/datasets/r43wkvdk4w/1) (BUBT, Dhaka)
 - [Bangla word-list (Bangla Academy Banan Abhidhan)](https://nltr.itewb.gov.in/download/Bangla_word-list.doc) (SNLTR)
 - [Bangla Speech Corpus](http://downloads.nltr.org/iitkgp-resources/SHRUTI-Bangla_Speech_Corpus.rar) (IIT, Kharagpur)
 - [Bengali Stopwords List](https://github.com/stopwords-iso/stopwords-bn) (stopwords-iso)
 - [Bangla TTS Speech Corpus](http://www.openslr.org/37) (Google)
 - [Large Bengali ASR Dataset](http://www.openslr.org/53) (Google)
 - [Ekush: Bangla Handwritten Characters](https://shahariarrabby.github.io/ekush) ([DIU](http://nlp.daffodilvarsity.edu.bd), Dhaka)
 - [ISHARA-LIPI: Bangla Sign Language Digits and Characters](https://isharalipi.sanzidscloud.com) ([DIU](http://nlp.daffodilvarsity.edu.bd), Dhaka)
 - [Bengali Large Common Crawl Dataset](https://traces1.inria.fr/oscar/)
 - [Bengali Wikipedia Dump Dataset](https://dumps.wikimedia.org/bnwiki/latest/)
 - [Bengali Open Subtitle Parallel Corpus](http://opus.nlpl.eu/)
 - [Bengali-English Translation Dataset](http://www.manythings.org/anki/)
 - [Bengali Female vs Male Names Dataset for NLP Tasks](https://github.com/faruk-ahmad/bengli-female-vs-male-names)
 - [BanglaEmotion: A Benchmark Dataset for Bangla Textual Emotion Analysis](https://data.mendeley.com/datasets/24xd7w7dhp/1) (CU, Chittagong)
 - [OSCAR: Open Super-large Crawled ALMAnaCH coRpus](https://oscar-corpus.com/)
 - [BN-HTRd: A Benchmark Dataset for Document Level Offline Bangla Handwritten Text Recognition (HTR)](https://data.mendeley.com/datasets/743k6dm543/) (PUC, Chittagong)
 - [Bangla Synthetic License Plates Dataset](https://github.com/zabir-nabil/bangla-synthetic-license-plates) (Zabir Al Nazi)
 - [Bengali Speech Dataset](https://commonvoice.mozilla.org/bn/datasets) (Common Voice, Mozilla)
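
Raw text from corpora such as the Wikipedia dump or OSCAR usually needs normalizing and tokenizing before word counts or stopword filtering make sense. The sketch below is one possible approach, assuming Python 3.9+: it applies Unicode NFC normalization, treats runs of characters from the Bengali Unicode block (U+0980 to U+09FF) as tokens, and drops stopwords. The function names are illustrative, and the one-word stopword set in the example is hypothetical; a real pipeline would load a full list such as the stopwords-iso one above.

```python
import re
import unicodedata
from collections import Counter
from collections.abc import Iterable

# Characters in the Bengali Unicode block (U+0980-U+09FF), plus ZWNJ/ZWJ,
# which some Bangla text uses inside words. Anything else (spaces, the
# danda "।" at U+0964, Latin letters, ASCII punctuation) separates tokens.
BANGLA_TOKEN = re.compile("[\u0980-\u09FF\u200C\u200D]+")


def tokenize(text: str) -> list[str]:
    """Split text into runs of Bangla characters after NFC normalization."""
    # NFC makes canonically equivalent spellings (e.g. a nukta letter stored
    # precomposed vs. decomposed) compare equal.
    text = unicodedata.normalize("NFC", text)
    return BANGLA_TOKEN.findall(text)


def word_frequencies(text: str, stopwords: Iterable[str]) -> Counter:
    """Count Bangla tokens, skipping any token found in `stopwords`."""
    stop = {unicodedata.normalize("NFC", w) for w in stopwords}
    return Counter(tok for tok in tokenize(text) if tok not in stop)


if __name__ == "__main__":
    # Hypothetical stopword set standing in for a real stopword list.
    sample = "আমি বাংলা গান গাই। আমি গান শুনি।"
    print(word_frequencies(sample, stopwords={"আমি"}).most_common())
```

Normalizing both the text and the stopword list the same way matters for Bangla, because visually identical words can be stored with different code point sequences.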

## NLP Tools, Scripts and Utilities (also Projects)
### NLP Tools
 - [Bangla POS Tagger (HMM/CRF/ME Based)](http://nltr.org/download/iitkgp-resources/Bangla_POS_Tagger_Linux/POS_tagger_Bangla.zip) (IIT, Kharagpur)
 - [Bangla POS Tagger](https://github.com/shm0007/bengali-pos-tagger) (shm0007)
 - [Bangla POS Tagger](https://github.com/uzl/pos_tagger_1) (uzl)
 - [Bangla POS Tagger (XML Based)](https://github.com/sunkuet02/BanglaPosTagger) (sunkuet02)
 - [Bangla POS Tagger (Rule Based)](https://github.com/SharifMAbdullah/Bangla-Parts-of-Speech-Tagger) (Sharif Mohammad Abdullah)
 - [Morphological Analyzer](http://nltr.org/download/iitkgp-resources/Bangla_Morphological_Analyzer/Morph_analyzer.tar) (IIT, Kharagpur)
 - [Chunker (Rule Based)](http://nltr.org/download/iitkgp-resources/Rul_Base_Chunker/chunkerBinary.tgz) (IIT, Kharagpur)
 - [Chunker (Statistical)](http://nltr.org/download/iitkgp-resources/Statistical_Chunker/chunker_v1.1.tar) (IIT, Kharagpur)
 - [Bengali Dependency Parser](https://github.com/saviour-falcon/BengaliDependencyParser) (Rajarshi Das et al.)
 - [Bengali Stemmer (Rule Based)](https://github.com/gdebasis/BengaliStemmer) (Debasis Ganguly)
 - [Bengali Stemmer (Rule Based) (.NET)](https://github.com/nayakt/BengaliStemmer_DotNet) (Tapas Nayak)
 - [Bengali Stemmer (Rule Based) (Java)](https://github.com/nayakt/BengaliStemmer_Java) (Tapas Nayak)
 - [Bengali Stemmer (PHP?)](https://github.com/tanveer-preom/BengaliStemmer) (Md. Tanveer Islam, Tanveer Ahmed Nayeem)
 - [Bengali Stemmer (JavaScript)](https://github.com/torifat/bangla-stemmer) (Rifat Nabi)
 - [Bengali Stemmer (Java) (2015)](https://github.com/tazimhoque/Bangla-Stemmer) (Tazim Hoque)
 - [Bengali Stemmer (Java) (2017)](https://github.com/sudiptobd/BanglaDocumentRanking_BanglaStemmer) (Sudipto Roy)
 - [Bengali Word Embedding](https://github.com/smafjal/Bengali-Word-Embedding) (Md. Afjal Hossain)
 - [Bengali Wordnet](https://github.com/soumenganguly/Bengali-Wordnet) (Soumen Ganguly)
 - [Bengali Sentiment Analysis (iPython Notebook)](https://github.com/abhie19/Sentiment-Analysis-Bangla-Language) (Abhishek Singh)
 - [Keyword Extraction](https://github.com/mahirsust/Code300) (Mahir)
 - [Bangla NER](https://github.com/imranulashrafi/banner) (Imranul Ashrafi, Muntasir Mohammad, Arani Shawkat Mauree, Galib Md. Azraf Nijhum, Redwanul Karim, Nabeel Mohammed and Sifat Momen)
 - [Bengali NLP Library (BNLP)](https://github.com/sagorbrur/bnlp) (Sagor)
 - [Emoji to Bengali Text Translation - Python package for NLP](https://github.com/faruk-ahmad/bnemo) (Faruk & Sagor)
 - [Bangla BERT Model](https://huggingface.co/sagorsarker/bangla-bert-base) (Sagor)
 - [Bangla Word2Vec](https://github.com/menon92/Bangla-Word2Vec) (Mehadi Hasan Menon)
 - [Bangla NLP Toolkit](https://github.com/Foysal87/sbnltk) (Foysal)
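
Several of the stemmers above are described as rule-based. A common form of rule-based stemming is longest-suffix-first stripping, sketched below under stated assumptions: the seven-entry suffix list, the minimum stem length and the `stem` function are hypothetical and for illustration only, and do not reflect the rules of any listed project.

```python
# Hypothetical suffix list for illustration only; real Bangla stemmers use
# larger, curated rule sets and exception lists.
SUFFIXES = ["গুলো", "গুলি", "দের", "কে", "রা", "টি", "টা"]
# Refuse to strip a suffix if fewer than this many code points would remain.
MIN_STEM_LEN = 2


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping a minimum stem length."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["ছেলেরা", "ছেলেদের", "বই"]:
        print(w, "->", stem(w))
```

Checking longer suffixes first keeps a short suffix from cutting into a longer one. Pure suffix stripping still over-stems words that merely end in a suffix-like string, which is one reason practical stemmers add exception lists.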


### Dictionary
  - [Bengali Lexical Dictionary (2012)](https://github.com/abhishekgupta92/lexical_db_bangla) (Abhishek Gupta)
  - [Bengali Dictionary](https://github.com/MinhasKamal/BengaliDictionary) (Minhas Kamal)
  - [Cross-platform Bengali Dictionary (Go/QML)](https://github.com/monirz/wordgo) (Monir Zaman)

### Bangla Machine Translation
- [Bangla to English Translator](https://github.com/menon92/BanglaTranslator) (Mehadi Hasan Menon)

### OCR/HTR
 - [Borno Bangla OCR](https://ocr.bangla.gov.bd/) (bangla.gov.bd)
 - [Bangla OCR](https://github.com/kmhasan-class/bangla-ocr) (kmhasan)
 - [Bangla OCR](https://sourceforge.net/projects/blp/files/BanglaOCR/) (CRBLP, BRACU)
 - [Bangla OCR](https://github.com/fnazmul/Bengali_OCR/) (Fariha Nazmul)
 - [Bengali Handwritten OCR with Convolutional NN](https://github.com/dibyatanoy/Bengali-Handwritten-Character-Recognition-Using-Convolutional-Neural-Networks) (Dibyatanoy Bhattacharjee)
 - [Numta Handwritten Bengali Digit Classification using Transfer Learning](https://github.com/hasibzunair/unconventional-wisdom) (Hasib Zunair, Nabeel Mohammed, Sifat Momen)
 - [Bengali Digit Recognition](https://github.com/abhinavagarwalla/BengaliDigitRecognition) (Abhinav Agarwalla)
 - [Bengali Digit Classification](https://github.com/smafjal/CNN-Bengali-Digit-Classification-TF) (Md. Afjal Hossain)
 - [BOCRA](https://github.com/deepayan/bocra) - R package for Bengali OCR (deepayan)
 - [Bengali OCR with CNN](https://github.com/sanjiv0975/Bengali_OCR) (Sanjiv)
 - [Bengali Handwritten OCR with CNN](https://github.com/bmabir17/bangla_inception) (BM Abir)
 - [Synthetic data generation for Bangla OCR](https://github.com/menon92/BanglaText2Image) (Mehadi Hasan Menon)
 - [Line and Word Segmentation for Bangla Handwritten Text Recognition (BN-DRISHTI)](https://github.com/crusnic-corp/BN-DRISHTI) (PUC/CU, Chittagong)

### Speech to Text
 - [voice.bangla.gov.bd](https://voice.bangla.gov.bd/)
 - [Bangla Speech to Text](https://github.com/menon92/BangalASR) (Mehadi Hasan Menon)

### TTS
 - [read.bangla.gov.bd](https://read.bangla.gov.bd/)
 - [Katha - Bangla TTS](https://sourceforge.net/projects/blp/files/Katha_Bangla_TTS/) (CRBLP, BRACU)
 - [Bengali-HTS (HMM-based Bangla TTS)](https://github.com/sankar-mukherjee/Bengali-HTS) (IIT, Kharagpur)
 - Apona Pathok - Bangla TTS (Lost)
 - [bangla-tts (Deep CNN based real-time (GPU) TTS)](https://github.com/zabir-nabil/bangla-tts) (Zabir Al Nazi)

### Multi-modal
 - [CLIP (Contrastive Language–Image Pre-training) implementation for Bangla](https://github.com/zabir-nabil/bangla-CLIP) (Zabir Al Nazi)
 - [Multimodal Hate Speech Detection from Bengali Memes and Texts](https://github.com/rezacsedu/Multimodal-Hate-Bengali) (Rezaul Karim)

### Others
 - [Bengali Spell Checking](https://github.com/AnkurBD/bengali-spellcheck) (Ankur)
 - [Bangla Contextual Spell Checker](https://github.com/MahirMahbub/Contextual-Spell-Checker-For-Bangla) (Mahir Mahbub)
 - [Bagha - Personal Assistant](https://github.com/reyadrahman/Bagha) (Reyad Rahman)
 - [Bangla News Category Classification with Bidirectional LSTM](https://github.com/zabir-nabil/bangla-news-rnn) (Zabir Al Nazi)
 - [Aurthohin - Gibberish Bangla text generator](https://github.com/lifeparticle/Aurthohin)
 - [Bangla Word2Vec Training and Visualization](https://github.com/NuhashHaque/Bangla-Word2Vec-Training-and-Visualization) (Afnan UL Haque Nuhash)
 - [An image search and image-text matching system for Bangla using CLIP](https://github.com/zabir-nabil/bangla-image-search) (Zabir Al Nazi)

## Programming Languages
 - [Koro (Go in Bangla)](https://github.com/ChimeraCoder/koro)
 - [Potaka](http://www.potaka.io/)
 - [ChaScript](https://github.com/sjishan/chascript) (Syed Tanveer Jishan)
 - [Pakhi](https://github.com/Shafin098/pakhi-bhasha) (Shafin Ashraf)
 - [Pankti](https://github.com/bauripalash/pankti) (Palash Bauri)
 - [Bengali-Alphabet](https://github.com/lifeparticle/Bengali-Alphabet)

## Websites
- [Society for Natural Language Technology Research](http://nltr.org/)
- [Center for Research on Bangla Language Processing, BRACU (Backup Mirror)](http://web.archive.org/web/20150621025544/http://crblp.bracu.ac.bd/)

## Fonts
- [bangla.gov.bd](https://bangla.gov.bd/fonts/)
- [lipighor.com](https://lipighor.com/)