- Week 1 - Embeddings
- Fun with Word Embeddings
- Tokenization
- Word2Vec embeddings
- GloVe embeddings
- Visualizing word vectors using PCA and t-SNE
- Finding similar questions
- Multilingual Embedding-based Machine Translation
- Word2Vec embeddings
- Embedding space mapping + orthogonal Procrustean problem
- Improving multilingual embedding-based MT with FastText embeddings
- Week 2 - Classification
- Large scale text analysis with deep learning
- Prohibited Comment Classification
Tackle this problem using both classical NLP methods and an embedding-based approach:
- BOW from scratch implementation
- TF-IDF from scratch implementation
- Naive Bayes from scratch
- TF-IDF + Logistic Regression + Hyperparameter Grid Search
- FastText embeddings + Logistic Regression + Hyperparameter Grid Search
- Salary prediction
The task is to predict salary from text and categorical features:
- Exploratory Data Analysis
- Categorical Columns Encoding
- Target transformation
- Modeling:
- Baseline: Custom PyTorch dataset + Custom Transforms + Fusion model (Title Encoder + Description Encoder + Categorical Encoder)
- Improved model: In progress
- Explaining model predictions: In progress
- Week 3 - Language Modeling
- Building n-gram language model using titles and summaries from ArXiv articles
- Sampling with temperature
- Language Model evaluation: Perplexity
- Language Model smoothing: Laplace
- Neural left-to-right LMs:
- Preparing dataset for training (building char-level vocabulary)
- FixedWindowLanguageModel using CNN (training, evaluation, generation)
- Categorical Cross-Entropy Loss implementation
- RNN LanguageModel using LSTM (training, evaluation, generation)
- Nucleus sampling implementation
- Week 4 - Seq2Seq
- week 2 (Text Classification):
- Practice:
- Homework part 2 - in progress
- Theory:
- Analysis and Interpretability
- Research Thinking
- Related Papers
- week 3 (Language Modeling):
- Practice:
- Seminar - Fix Kneser-Ney smoothing
Look here at the bottom of the page for the reference formula
- Homework - Implement Beam Search + Ultimate LM
- week 4 (Seq2Seq)