Repository showing how NLP can tackle real problems, including source code, datasets, and the state of the art in NLP.

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Tokenization | Word Tokenization | | Medium, Github |
| Tokenization | Sentence Tokenization | | Medium, Github |
| Part of Speech | | | Medium, Github |
| Lemmatization | | | Medium, Github |
| Stemming | | | Medium, Github |
| Stop Words | | | Medium, Github |
| Phrase Word Recognition | | | |
| Spell Checking | Lexicon-based | Peter Norvig algorithm | Medium, Github |
| | Lexicon-based | Symspell | Medium, Github |
| Machine Translation | Statistical Machine Translation | | Medium |
| Machine Translation | Attention | | Medium |
| String Matching | Fuzzywuzzy | | Medium, Github |
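
As a rough illustration of the word and sentence tokenization entries above, here is a minimal sketch using only the Python standard library. The function names `word_tokenize` and `sentence_tokenize` are hypothetical helpers written for this example; they are not taken from the linked stories or notebooks, which may rely on libraries such as NLTK or spaCy instead. The regex rules are deliberately naive and will mis-split abbreviations such as "e.g.".

```python
import re


def word_tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches one character that is neither word nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


def sentence_tokenize(text):
    """Naively split text on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


if __name__ == "__main__":
    sample = "NLP is fun. Tokenization comes first!"
    print(sentence_tokenize(sample))
    # ['NLP is fun.', 'Tokenization comes first!']
    print(word_tokenize("NLP is fun."))
    # ['NLP', 'is', 'fun', '.']
```

A production tokenizer usually needs language-specific rules, so treat this only as a starting point.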

| Section | Sub-Section | Research Lab | Story | Paper & Code |
|---|---|---|---|---|
| Pattern-based Recognition | | | Medium | |
| Lexicon-based Recognition | | | Medium | |
| Pre-trained NER | Spacy | | Medium, Github | |
| Custom NER | | | | |

| Section | Sub-Section | Research Lab | Story | Paper & Code |
|---|---|---|---|---|
| Google Cloud Vision API | | | Medium | Paper |

| Section | Sub-Section | Research Lab | Story | Paper, Year & Code |
|---|---|---|---|---|
| Generative Pre-Training 2 (GPT-2) | | OpenAI | Medium | Paper (2019), Code |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Extractive Approach | | | Medium, Github |
| Abstractive Approach | | | |

| Section | Sub-Section | Description | Link | Paper |
|---|---|---|---|---|
| Euclidean Distance, Cosine Similarity and Jaccard Similarity | | | Medium, Github | |
| Edit Distance | Levenshtein Distance | | Medium, Github | |
| Word Mover's Distance (WMD) | | | Medium, Github | |
| Manhattan LSTM | | | Medium | Paper |
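
To make three of the measures listed above concrete, the following is a minimal sketch in plain Python. It assumes whitespace tokenization and raw term counts for cosine similarity, token sets for Jaccard similarity, and the standard dynamic-programming formulation for Levenshtein distance. The function names are chosen for this example and are not taken from the linked notebooks.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between term-count vectors of two strings."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[term] * vb[term] for term in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a, b):
    """Size of the token intersection divided by size of the token union."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(jaccard_similarity("the cat sat", "the cat ran"))  # 0.5
    print(levenshtein("kitten", "sitting"))                  # 3
    print(cosine_similarity("the cat sat", "the cat ran"))
```

Cosine and Jaccard operate on tokens here, while Levenshtein operates on characters, so the three scores are not directly comparable.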

| Section | Sub-Section | Research Lab | Story | Paper & Code |
|---|---|---|---|---|
| Traditional Method | Bag-of-words (BoW) | | Medium, Github | |
| | Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) | | Medium, Github | |
| Character Level | Character Embedding | New York University | Medium, Github | Paper |
| Word Level | Negative Sampling and Hierarchical Softmax | | | |
| | Word2Vec, GloVe, fastText | | Medium, Github | |
| | Contextualized Word Vectors (CoVe) | Salesforce | Medium, Github | Paper, Code |
| | Embeddings from Language Models (ELMo) | AI2 | Medium, Github | Paper, Code |
| | Bidirectional Encoder Representations from Transformers (BERT) | | Medium | Paper, Code |
| | Generative Pre-Training (GPT) | OpenAI | Medium | Paper, Code |
| | Contextual String Embeddings | Zalando Research | Medium | Paper, Code |
| | Self-Governing Neural Networks (SGNN) | | Medium | Paper |
| | Multi-Task Deep Neural Networks (MT-DNN) | Microsoft | Medium | Paper |
| | Generative Pre-Training-2 (GPT-2) | OpenAI | Medium | Paper (2019), Code |
| Sentence Level | Skip-thoughts | | Medium, Github | Paper, Code |
| | InferSent | | Medium, Github | Paper, Code |
| | Quick-Thoughts | | Medium | Paper, Code |
| | General Purpose Sentence (GenSen) | | Medium | Paper, Code |
| Document Level | lda2vec | | Medium | Paper |
| | doc2vec | | Medium, Github | Paper |
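
The bag-of-words entry above is the simplest representation in this table, so a small sketch may help show what it produces. This version assumes lowercasing and whitespace tokenization, and the helper name `bag_of_words` is made up for this example. The linked notebook may use a library vectorizer such as scikit-learn's `CountVectorizer` instead.

```python
from collections import Counter


def bag_of_words(docs):
    """Return a sorted vocabulary and one term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary is every distinct token, sorted so that column order is stable.
    vocab = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[token] for token in vocab])
    return vocab, vectors


if __name__ == "__main__":
    vocab, vectors = bag_of_words(["the cat sat", "the cat saw the dog"])
    print(vocab)    # ['cat', 'dog', 'sat', 'saw', 'the']
    print(vectors)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Word order is discarded, which is the main limitation the embedding methods in the rest of the table try to address.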

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| ELI5, LIME and Skater | | | Medium, Github |
| SHapley Additive exPlanations (SHAP) | | | Medium, Github |
| Anchors | | | Medium, Github |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Using Deep Learning can resolve all problem? | | | Medium, Kaggle |

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Spellcheck | | | Github |
| InferSent | | | Github |