# Library, Framework, Models, Tools, Services

- [BloomsburyAI's Open Source NLP tool: Cape Webservices - backend server](https://github.com/bloomsburyai/cape-webservices) | [Rest of BloomsburyAI's Open Source NLP tool - Cape](https://www.github.com/bloomsburyai) [Bought out by FB around March/April 2019]
- [NLP Java/JVM](../examples/nlp-java-jvm/README.md#nlp-javajvm) - docker container with Java/JVM based NLP libraries/frameworks (inspired by LaMachine, Awesome NLP and others out there)
- [Better NLP library (experimental)](../examples/better-nlp) | Slides: [1](../examples/better-nlp/presentations/09-Mar-2019/Better-NLP-Presentation-Slides.pdf) [2](../examples/better-nlp/presentations/29-Jun-2019/Better-NLP-2.0-one-library-rules-them-all-Presentation-Slides.pdf)
- [NLP Profiler library](https://github.com/neomatrix369/nlp_profiler/) - similar to [Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling) but for text data
- [Facebook's PyText](https://github.com/facebookresearch/PyText)
- [Facebook's FastText](https://github.com/facebookresearch/FastText) | [homepage | docs](https://fasttext.cc/)
- [Facebook's Pythia](https://code.fb.com/ai-research/pythia/) | [github](https://github.com/facebookresearch/pythia) | [Medium](https://medium.com/syncedreview/facebook-open-sources-pythia-for-vision-and-language-multimodal-ai-models-be480644b538)
- [Flair by Zalando Research](https://www.analyticsvidhya.com/blog/2019/02/flair-nlp-library-python/) | [github](https://github.com/zalandoresearch/flair) | [Research paper](https://drive.google.com/file/d/17yVpFA7MmXaQFTe-HDpZuqw9fJlmzg56/view)
- [Microsoft NLP](https://github.com/microsoft/nlp)
- [Smile - Statistical Machine Intelligence and Learning Engine](https://haifengl.github.io/)
- [Stanford NLP Group](https://nlp.stanford.edu/)
- [Google’s BERT](https://github.com/google-research/bert) | [TensorFlow code and pre-trained models for BERT](https://github.com/google-research/bert)
- [Pretraining #BERT with Layer-wise #Adaptive #LearningRates](https://www.linkedin.com/posts/ashishpatel2604_bert-adaptive-learningrates-activity-6610011565313880064-doG7)
- [A very approachable introduction to BERT](https://lnkd.in/dKhNQzJ)
- [H2O Driverless AI](http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/nlp.html) 
   - [Sudalai Rajkumar (SRK), H2O.ai - NLP with H2O Driverless AI - H2O World SF](https://www.youtube.com/watch?v=PJs_2Kyw_RQ&amp;feature=youtu.be)
   - [Carmelo Iaria, AI Academy - How The AI Academy is accelerating NLP projects with Driverless AI](https://www.youtube.com/watch?v=aXPE6IiKRmI&amp;feature=youtu.be)
- [The Illustrated Word2vec](https://jalammar.github.io/illustrated-word2vec/)
- [LDA (topic modelling)](https://github.com/bmabey/pyLDAvis)
- Flashtext: [github](https://github.com/vi3k6i5/flashtext) | [docs](https://buildmedia.readthedocs.org/media/pdf/flashtext/latest/flashtext.pdf) | blogs: [1](https://www.analyticsvidhya.com/blog/2017/11/flashtext-a-library-faster-than-regular-expressions/) | [2](https://medium.freecodecamp.org/regex-was-taking-5-days-flashtext-does-it-in-15-minutes-55f04411025f) | [3](https://medium.com/@Alibaba_Cloud/why-you-should-use-flashtext-instead-of-regex-for-data-analysis-960a0dc96c6a)
- [How to Prepare Text Data for Machine Learning with scikit-learn](https://machinelearningmastery.com/prepare-text-data-machine-learning-scikit-learn/)
- Top libraries
  - [5 Heroic Tools for Natural Language Processing](https://towardsdatascience.com/5-heroic-tools-for-natural-language-processing-7f3c1f8fc9f0) | [Top 5 Python NLP Libraries to Build a Human like Applications](https://www.datasciencelearner.com/top-5-python-nlp-libraries-to-build-a-human-like-application/) | [5 open source tools for taming text](https://opensource.com/business/15/7/five-open-source-nlp-tools) | [Comparing the Functionality of Open Source Natural Language Processing Libraries](https://blog.dominodatalab.com/comparing-the-functionality-of-open-source-natural-language-processing-libraries/)
  - [Comparison of Top 6 Python NLP Libraries](https://www.kdnuggets.com/2018/07/comparison-top-6-python-nlp-libraries.html)
  - [Top 10 Python Libraries for Natural Language Processing (2018)](https://kleiber.me/blog/2018/02/25/top-10-python-nlp-libraries-2018/)
  - Quora: best NLP library question: [1](https://www.quora.com/Which-library-is-best-for-NLP) | [2](https://www.quora.com/What-is-the-best-natural-language-processing-API-library-service-today) | [3](https://www.quora.com/Natural-Language-Processing-What-are-the-best-libraries-for-extracting-data-from-PDF-files)
  - [Python NLP for Hackers](https://nlpforhackers.io/libraries/)
  - [NLP vs NLU vs NLG (Know what you are trying to achieve) NLP engine (Part-1)](https://towardsdatascience.com/nlp-vs-nlu-vs-nlg-know-what-you-are-trying-to-achieve-nlp-engine-part-1-1487a2c8b696)
  - [NLP engine(Part-2) -> Best Text Processing tools or libraries for Natural Language Processing](https://towardsdatascience.com/nlp-engine-part-2-best-text-processing-tools-or-libraries-for-natural-language-processing-c7fd80f456e3)
  - [Natural Language Extraction - Using spaCy on a set of novels](https://medium.com/@rajat.jain1/natural-language-extraction-using-spacy-on-a-set-of-novels-88b159d68686)
  - [A short gist for extracting NER + grouping using emma](https://gist.github.com/svenski/a433a823511a0f9a0941deba93fa0d2f)
  - [spaCy Cheatsheet](https://www.datacamp.com/community/blog/spacy-cheatsheet)
  - [NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks](https://github.com/huggingface/neuralcoref)
  - [StarSpace: Learning embeddings for classification, retrieval and ranking](https://github.com/facebookresearch/StarSpace)
  - [Entity Embeddings for Categorical Variables](https://www.linkedin.com/posts/abhi1thakur_machinelearning-deeplearning-python-activity-6645663647869779968-Hj92)
  - [Blogpost: Acoustic Word Embeddings - Introduction to the field of acoustic word embeddings (AWEs) for those with a background in speech processing, NLP, or DL/ML](https://www.linkedin.com/posts/philipvollet_nlp-deeplearning-naturallanguageprocessing-activity-6642321964998381568-EsmD)
  - [Free NLP service - Natural Language for Developers](https://wit.ai)
  - [Do you want to convert English to Integers/Floats?](https://www.linkedin.com/posts/amrrs_python-nlg-nlp-activity-6620696848455831552-qTvG)
  - [How do we implement large-scale #NLP models on IPU](https://www.linkedin.com/posts/graphcore_arianna-saracino-product-support-engineer-activity-6615949485463920640-7Pwa)
  - [NLP in fraud detection: Case study](https://www.linkedin.com/posts/data-science-central_natural-language-understanding-nlu-in-fraud-activity-6623003404279005184-Fg_L)
  - TextAttack – a really cool Python framework for attacking NLP models and augmenting text datasets: [GitHub](https://github.com/QData/TextAttack/) | [Tweet](https://twitter.com/lavanyaai/status/1260384065481392129)
  - Albumentations package for NLP data augmentation: [Kaggle Kernel 1](https://www.kaggle.com/shonenkov/tpu-training-super-fast-xlmroberta) | [Kaggle Kernel 2](https://www.kaggle.com/shonenkov/nlp-albumentations)
  - [#BLINK - an Entity Linking python library](https://www.linkedin.com/posts/inna-vogel-nlp_blink-neuralnetworks-deeplearning-activity-6671381112528404480-YD8r)
- [#WhatLies is a python library visualizing word embeddings as well as operations on them](https://www.linkedin.com/posts/inna-vogel-nlp_whatlies-activity-6661164569232187392-bjZ9) | [Whatlies - a toolkit to help visualise what lies in word embeddings. Make visualisation easier of both word embeddings as well as operations on them.](https://www.linkedin.com/posts/philipvollet_nlp-machinelearning-datascience-activity-6704264100802822144-4TV5)
- [Monet is an open-source Python package for analyzing and integrating scRNA-Seq data using PCA-based latent spaces.](https://www.linkedin.com/posts/philipvollet_pca-machineleraning-datascience-activity-6680309874238267392-0sIS)
- [ORES library](https://ores.wikimedia.org/) | [Wikimedia Statistics](https://stats.wikimedia.org/)
- [PictureText: Interactive visuals of text using SBERT, Hierarchical Clustering and tree maps for corpus understanding! ](https://www.linkedin.com/posts/philipvollet_nlp-machinelearning-datascience-activity-6721388188067356673-1ExN)
- [NLP libraries: DeText and HayStack](https://www.linkedin.com/posts/madewithml_machinelearning-artificialintelligence-madewithml-activity-6695695292286803968-NSzs)
- [DeText: NLP framework from LinkedIn](https://github.com/linkedin/detext)
- [DeText is a Deep Text understanding framework for NLP-related ranking, classification, and language generation tasks.](https://www.linkedin.com/posts/philipvollet_nlp-datascience-machinelearning-activity-6701715000941268992-m54k)
- [Text datasets can be preprocessed into a format that DeText accepts.](https://github.com/linkedin/detext/blob/master/TRAINING.md)
- [Hugging Face based libraries: FARM, etc...](https://www.linkedin.com/posts/madewithml_farm-framework-for-adapting-representation-activity-6696240595029114880-xxJg)
- [TextAttack is a library for running adversarial attacks against natural language processing (NLP) models. TextAttack builds attacks from four components: a search method, goal function, transformation, and a set of constraints.](https://www.linkedin.com/posts/philipvollet_nlp-machinelearning-reverseengineering-activity-6682883469128863744-OaX-)
- Wild NLP: text data augmentation: [pypi](https://pypi.org/project/wild-nlp/)
- [token2index: A lightweight but powerful library for token indexing • For NLP tasks, compatible with major Deep Learning frameworks like PyTorch and Tensorflow](https://www.linkedin.com/posts/philipvollet_nlp-datascience-machinlearning-activity-6687634609661988864-Bdjv)
- [spellCheck • Contextual word checker for better suggestions](https://www.linkedin.com/posts/philipvollet_nlp-machinelearning-science-activity-6692895449881419776-rqV-)
- [PBoS: Probabilistic Bag-of-Subwords for Generalizing Word Embedding ](https://www.linkedin.com/posts/philipvollet_nlp-datascience-machinelearning-activity-6725293360724230144-MhQ4)
- [iNLTK - Natural Language Toolkit for Indic Languages. Features: Data Augmentation, Sentence Similarity, Sentence Encoding, Word Embedding, Tokenization and Text Generation utilities for low resource 12 Indic](https://www.linkedin.com/posts/philipvollet_machinelearning-datascience-nlp-activity-6698220942910468096-phA-)
- [Camelot: PDF Table Extraction for Humans — A Python library that makes it easy for anyone to extract tables from PDF files!](https://www.linkedin.com/posts/philipvollet_python-pdf-datascience-activity-6699016329502056448-uYqP)
- [T5: Text-To-Text Transfer Transformer: Understanding Transformer-Based Self-Supervised Architectures](https://www.linkedin.com/posts/philipvollet_machinelearning-nlp-artificialintelligence-activity-6695535478218784768-Z5KJ)
- [spaCy Streamlit App • Text analysis in your browser with features like: Named Entity Recognition, Text Classification, Similarity Tokens, etc.](https://www.linkedin.com/posts/philipvollet_nlp-machinelearning-python-activity-6681170955236126720-FMDK)
- [Haystack - Scalable Neural Search & Question Answering for text documents. It's highly modular and offers several options for fitting into your tech stack and use cases](https://www.linkedin.com/posts/philipvollet_artificialintelligence-datascience-machinelearning-activity-6694820649032142848-wUtC)
- [Stanza extended with first domain-specific #NLP models for biomedical and clinical medical English.](https://www.linkedin.com/posts/philipvollet_nlp-machinelearning-innovation-activity-6694680265547239425-yd5F)
- [NeMo is a toolkit for creating Conversational AI applications. ](https://www.linkedin.com/posts/philipvollet_nlp-deeplearning-datascience-activity-6706074461495529472-wgt1)
- [LIT: Language Interpretability Tool](https://www.linkedin.com/posts/inna-vogel-nlp_google-open-sources-lit-a-toolset-for-evaluating-activity-6703590476622123008-AhK-) | [#Google open-sourced the toolset LIT for evaluating, visualizing, understanding, and auditing #NLP models.](https://www.linkedin.com/posts/inna-vogel-nlp_google-open-sources-lit-a-toolset-for-evaluating-activity-6703590476622123008-AhK-)
- [HMNI • Fuzzy Name Matching with Machine Learning. Perform common fuzzy name matching tasks including similarity scoring, record linkage, deduplication and normalization](https://www.linkedin.com/posts/philipvollet_nlp-machinelearning-datascience-activity-6694666551989297152-EGtw)
- [Jina is an AI-powered search framework, empowering developers to create cross-/multi-modal search systems (e.g. text, images, video, audio) on the cloud](https://github.com/jina-ai/jina)
- [Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API](https://www.linkedin.com/posts/philipvollet_python-data-twitter-activity-6713352923373551616-GNlu)
- [ProphetNet: MS's transformer variant](https://github.com/microsoft/ProphetNet)
- [Rouge](https://pypi.org/project/rouge/)
- [Pointer Generator](https://github.com/AIKevin/Pointer_Generator_Summarizer)
- [Textract](https://textract.readthedocs.io/en/stable/python_package.html)
- [Text Rank](https://www.analyticsvidhya.com/blog/2018/11/introduction-text-summarization-textrank-python/)
- [TextHero](https://texthero.org/)
- [Facebook Research • TaBERT a pre-trained language model for learning joint representations of natural language utterances and structured tables for semantic parsing](https://www.linkedin.com/posts/philipvollet_nlp-machinelearning-technology-activity-66861)
- [SpikeX - SpaCy Pipes for Knowledge Extraction](https://www.linkedin.com/posts/philipvollet_nlp-machinelearning-datascience-activity-6790492432502026240-wVxj)
- [Obsei: Observe SEgment and Inform - A workflow automation tool for text segmentation](https://www.linkedin.com/posts/philipvollet_datascience-nlp-machinelearning-activity-6748875008870875136-hQoh)
- (X) Cross-Lingual Transfer Evaluation of Multilingual Encoders (XTREME): [site](https://sites.research.google/xtreme) | [github](https://github.com/google-research/xtreme)
- [VecMap (cross-lingual word embedding mappings)](https://github.com/artetxem/vecmap)
- [PyCaret: Natural Language Processing Module](https://pycaret.org/nlp/)
- [NLPretext - a Python library with all the text preprocessing functions you need to ease your NLP project](https://www.linkedin.com/posts/kalyankatikapallisubramanyam_nlproc-nlp-machinelearning-activity-6821345994182070272-2La4)
- [New release: Gramformer a framework for detecting, highlighting and correcting grammatical errors on natural language text](https://www.linkedin.com/posts/philipvollet_datascience-machinelearning-nlp-activity-6819547377955930112-mTye)
- [Machine Learning and Deep Learning: EN-JP Lexicon](https://github.com/Machine-Learning-Tokyo/EN-JP-ML-Lexicon)
- [MALLET is a Java-based package for statistical natural language processing](https://github.com/mimno/Mallet)
- [Crowlingo Multilingual NLP](https://www.dataiku.com/product/plugins/crowlingo-nlp/)
- [An adaptable platform for text analytics and discovery](http://www.rosette.com/)

# Contributing

Contributions are very welcome; please share back with the wider community (and get credited for it)!

Please have a look at the [CONTRIBUTING](../CONTRIBUTING.md) guidelines, and read about our [licensing](../LICENSE.md) policy.

---

Back to [NLP page (table of contents)](README.md)<br/>
Back to [main page (table of contents)](../README.md)
