Run the notebook in Google Colab:
- Theory Reinforcement Learning: overleaf | pdf | book reference
- Exercise Deep Q-Learning: Google Colab
- Reading (Chapter 13 through the end of Section 13.3): RL Intro Policy Optimization
- Exercise Policy Gradient: Google Colab
- Slides: Policy Gradient and RLHF from Page 23
- Reading: Secrets of RLHF in Large Language Models Part I: PPO
- Reading: Learning to summarize from human feedback
- Slides: Adversarial attacks introduction
- Ex 2: Fast Gradient Sign Method notebook
You can run the notebooks in Google Colab or locally.
To run them locally, clone the repository and set up a conda environment:
git clone https://github.com/Swiss-AI-Safety/swiss-summer-camp-23.git
cd swiss-summer-camp-23
conda create --name SAIS python=3.9 -y
conda activate SAIS
conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
pip install -r requirements.txt
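
After the steps above, it can be useful to confirm that the environment actually contains the packages before opening a notebook. The following is a minimal sketch, not part of the repository: the helper name `check_packages` is hypothetical, and the package list is an assumption taken from the `conda install` line above. It only checks that each package can be located, without importing it.

```python
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level package name to True if it can be located.

    Hypothetical helper for illustration; it does not import the packages,
    it only asks the import system whether they can be found.
    """
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed package list, taken from the conda install command above.
    expected = ["torch", "torchvision", "torchaudio"]
    for name, found in check_packages(expected).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the `SAIS` environment activated. Any package reported as `MISSING` suggests that the corresponding install step did not complete in that environment.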