
Implemented projects ranging from non-contextual word embeddings (Word2Vec) to contextual word embeddings (ELMo, GPT, BERT), and covering NLP tasks such as sentence-level classification (sentiment analysis and toxic comment classification), token-level classification (POS tagging, NER tagging), and machine translation (MT).


khetansarvesh/NLP


Text Preprocessing

Following are some text preprocessing steps you should consider. It is not necessary to apply all of them; which ones you use depends on the NLP task you are about to perform. A small example pipeline is sketched below.
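As an illustration, here is a minimal preprocessing pipeline using NLTK (the choice of steps and of NLTK is an assumption for this sketch, not the repo's exact pipeline):

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# one-time downloads of NLTK data (quiet to avoid log noise)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

def preprocess(text):
    # lowercase and drop everything except letters and whitespace
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = text.split()  # simple whitespace tokenization
    stops = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    # remove stopwords, then lemmatize the rest
    return [lemmatizer.lemmatize(t) for t in tokens if t not in stops]

print(preprocess("The movies were GREAT, 10/10!"))  # -> ['movie', 'great']
```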

Representation Learning (Pretraining)

We need to represent language mathematically, i.e. given a corpus, you need to convert it into numerical form. This mathematical representation is called an embedding, and the process is called representation learning. Why do this? Because computers understand only numbers, not text. We can do this in several ways:
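For example, a non-contextual method like Word2Vec learns one fixed vector per word. A minimal sketch with gensim (the toy corpus and hyperparameters below are placeholders):

```python
from gensim.models import Word2Vec

# toy tokenized corpus (placeholder data)
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "lay", "on", "the", "rug"],
]

# train a small skip-gram model (sg=1); vector_size and window are illustrative
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1)

vec = model.wv["cat"]                        # 50-dim vector for "cat"
print(model.wv.most_similar("cat", topn=2))  # nearest neighbours in embedding space
```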

Downstream NLP (Supervised Fine-Tuning - SFT) (Post-Training)

  • With foundation models that can perform multiple tasks, you only need to prompt the model to solve a given downstream task.
  • But prompting often does not work well: the model sometimes gives wrong answers to prompted questions (in cases where that task was not covered during the training of the multitask foundation model). This is called the HALLUCINATION PROBLEM.
  • To address this hallucination problem you can fine-tune the foundation model for the specific task, as sketched below. More about this here
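A minimal sketch of such task-specific fine-tuning with Hugging Face Transformers (the model name, dataset, and hyperparameters are assumptions for illustration, not this repo's setup):

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# assumption: fine-tune BERT for binary sentiment classification on IMDB
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft_out",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    # small subset for a quick demonstration run
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()
```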

AI / Preference Alignment (Post-Training)

  • Once OpenAI built ChatGPT, they found that if it was asked about harmful activities, e.g. 'tell me techniques to make rat poison at home', it would answer such questions too! If prompted, it would also use curse words, etc. It was lacking HUMAN ETHICS, and in the wrong hands this could lead to bigger concerns. Hence researchers wanted to ALIGN LLM outputs with human preferences.
  • This was called the PREFERENCE PROBLEM
  • Methods that solve the preference problem are called preference alignment. There are two ways to do so (a sketch of the second follows this list):
    • Fine-tuning the LLM on human preferences using Reinforcement Learning – RLHF Algorithm
    • Fine-tuning the LLM on human preferences using Supervised Learning – DPO Algorithm
  • More information available here
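To make DPO concrete, here is a minimal sketch of its loss in PyTorch (variable names are mine; the inputs are per-sequence log-probabilities under the policy being tuned and a frozen reference model):

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # each argument: tensor of shape (batch,) with the total log-prob of the
    # chosen / rejected response under the policy (pi) or the reference (ref)
    chosen_reward = beta * (pi_chosen_logp - ref_chosen_logp)        # implicit reward
    rejected_reward = beta * (pi_rejected_logp - ref_rejected_logp)
    # push the chosen response to out-score the rejected one
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# toy usage with random log-probs (illustrative only)
args = [torch.randn(4) for _ in range(4)]
print(dpo_loss(*args))
```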

LLM Agents

Resources Used to Develop This

  1. Stanford CS224N - 2016
  2. Stanford CS224N - 2021
  3. Stanford CS224N - 2023
  4. Stanford CS224D
  5. Speech and Language Processing book by Jurafsky & Martin
