DiverseRL is a repository that aims to implement and benchmark reinforcement learning algorithms.
This repo aims to implement algorithms from various RL sub-topics (e.g. model-based RL, offline RL) in a wide variety of environments.
- Wandb logging
- TensorBoard logging
You can install the requirements by using Poetry.
git clone https://github.com/moripiri/DiverseRL.git
cd DiverseRL
poetry install
Currently, the following algorithms are available.
Model-free deep RL algorithms train agents in environments with state-based observations, without learning a model of the environment.
- DQN (Deep Q-Network)
- DDPG (Deep Deterministic Policy Gradient)
- TD3 (Twin Delayed Deep Deterministic Policy Gradient)
- SAC (Soft Actor-Critic)
- PPO (Proximal Policy Optimization)
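Two building blocks shared by several of the algorithms above are the one-step TD target (DQN) and the Polyak-averaged target network (DDPG, TD3, SAC). The following is a minimal pure-Python sketch of both ideas. The function names `dqn_td_target` and `soft_update` are hypothetical and are not taken from DiverseRL's code.

```python
def dqn_td_target(reward, next_q_values, done, gamma=0.99):
    """One-step TD target: r + gamma * max_a' Q(s', a').

    No bootstrapping is applied when the episode has terminated.
    """
    bootstrap = 0.0 if done else gamma * max(next_q_values)
    return reward + bootstrap


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: move each target parameter a fraction tau toward the online one."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic.
    print(dqn_td_target(reward=1.0, next_q_values=[0.5, 2.0], done=False, gamma=0.9))
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5))
```

In a real implementation the parameters would be network tensors rather than Python lists, but the update rule is the same.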
Pixel RL algorithms train agents in environments with high-dimensional image observations, such as Atari 2600 and dm-control.
- DQN (Deep Q-Network) (for Atari 2600)
- SAC (Soft Actor-Critic) (for dm-control)
- PPO (Proximal Policy Optimization) (for Atari 2600)
- SAC-AE (Soft Actor-Critic with Autoencoders) (for dm-control)
- CURL (Contrastive Unsupervised Representations for Reinforcement Learning) (for dm-control)
- RAD (Reinforcement Learning with Augmented Data) (for dm-control)
- DrQ (Data-Regularized Q) (for dm-control)
- DrQ-v2 (Data-Regularized Q v2) (for dm-control)
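Several of the dm-control algorithms above (RAD, DrQ, DrQ-v2) rely on image augmentation such as random cropping or random shifting. As a rough illustration, here is a pure-Python sketch of a random shift: the image is padded by replicating its edge pixels and then cropped back to its original size at a random offset. The function `random_shift` and its `pad` default are assumptions for illustration, not DiverseRL's implementation, which would operate on batched tensors.

```python
import random


def random_shift(image, pad=4, rng=random):
    """Pad a 2D image (list of rows) by edge replication, then crop at a random offset.

    The output has the same height and width as the input.
    """
    height, width = len(image), len(image[0])
    # Replicate the left and right edge pixels `pad` times on each row.
    padded_rows = [[row[0]] * pad + list(row) + [row[-1]] * pad for row in image]
    # Replicate the top and bottom rows `pad` times.
    padded = (
        [list(padded_rows[0]) for _ in range(pad)]
        + padded_rows
        + [list(padded_rows[-1]) for _ in range(pad)]
    )
    # randint is inclusive on both ends, so offsets range over 0..2*pad.
    top = rng.randint(0, 2 * pad)
    left = rng.randint(0, 2 * pad)
    return [row[left:left + width] for row in padded[top:top + height]]


if __name__ == "__main__":
    frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for row in random_shift(frame, pad=1, rng=random.Random(0)):
        print(row)
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible.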
Classic RL algorithms are mostly known from Sutton and Barto's Reinforcement Learning: An Introduction, and can be trained in Gymnasium's toy text environments.
- SARSA
- Q-learning
- Model-free Monte-Carlo Control
- Dyna-Q
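The difference between SARSA and Q-learning comes down to which next-state value is used in the update. The sketch below shows both tabular updates on a Q-table stored as a list of lists. The function names and the default `alpha` and `gamma` values are hypothetical and do not reflect DiverseRL's API.

```python
def q_learning_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the greedy (max-value) action in next_state."""
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def sarsa_update(q, state, action, reward, next_state, next_action, done, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually taken in next_state."""
    target = reward if done else reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (target - q[state][action])


if __name__ == "__main__":
    # Two states, two actions; the values are made up for illustration.
    q_table = [[0.0, 0.0], [1.0, 2.0]]
    q_learning_update(q_table, state=0, action=0, reward=1.0, next_state=1, done=False)
    print(q_table[0][0])
```

Both functions modify the Q-table in place, so the same table can be passed through an entire training loop.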
Training requires two Gymnasium environments (one for training and one for evaluation), an algorithm, and a trainer.
from diverserl.algos import DQN
from diverserl.trainers import DeepRLTrainer
from diverserl.common.utils import make_envs
env, eval_env = make_envs(env_id='CartPole-v1')
algo = DQN(env=env, eval_env=eval_env, **config)
trainer = DeepRLTrainer(
    algo=algo,
    **config
)
trainer.run()
Alternatively, you can use Hydra by running run.py.
python run.py env=gym_classic_control algo=dqn trainer=deeprl_trainer algo.device=cuda trainer.max_step=10000
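The command above mixes two kinds of Hydra overrides: un-dotted keys such as `algo=dqn` typically select an option from a config group, while dotted keys such as `algo.device=cuda` override a single value inside the composed config. The following pure-Python sketch illustrates that split. It is a simplification for explanation only: `parse_overrides` is a hypothetical helper, not Hydra's parser, and it assumes every un-dotted key is a config group choice.

```python
def parse_overrides(args):
    """Split `key=value` overrides into group choices and nested value overrides."""
    groups, values = {}, {}
    for arg in args:
        key, _, raw = arg.partition("=")
        if "." not in key:
            # Assumption: un-dotted keys pick a config group option (e.g. algo=dqn).
            groups[key] = raw
            continue
        # Dotted keys walk into a nested dict, creating levels as needed.
        *parents, leaf = key.split(".")
        node = values
        for part in parents:
            node = node.setdefault(part, {})
        # Convert plain integers; leave everything else as a string.
        node[leaf] = int(raw) if raw.lstrip("-").isdigit() else raw
    return groups, values


if __name__ == "__main__":
    cli = [
        "env=gym_classic_control",
        "algo=dqn",
        "trainer=deeprl_trainer",
        "algo.device=cuda",
        "trainer.max_step=10000",
    ]
    group_choices, value_overrides = parse_overrides(cli)
    print(group_choices)
    print(value_overrides)
```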