In this tutorial we are interested in reproducible reinforcement learning research. The experiments in this repository aim to reproduce some deep reinforcement learning results from the paper Learning Value Functions in Deep Policy Gradients using Residual Variance. To do so we use a specific experimental protocol and open-source libraries that we introduce next.
Reinforcement learning (RL) suffers from a lack of reproducibility. Hidden implementation details, small numbers of seeds, and differences between machines hinder progress in the field. In this paper, the authors recommend an experimental protocol that ensures a thorough comparison of agents, say PPO vs. SAC. By following such a standardized protocol, RL research can move forward faster and more safely: if you claim your agent is the new state of the art on some benchmark, following such a protocol can serve as a validity stamp in your paper!
We recommend using the well-maintained, well-documented, and stable agent implementations from stable-baselines3. We implement AVEC-PPO from Learning Value Functions in Deep Policy Gradients using Residual Variance by overriding the train() method of the base PPO agent.
One needs to find the best instantiation of each agent to compare; this is done with hyperparameter optimization. For each set of hyperparameters, each agent is trained 3 times, and for each agent the set of hyperparameters giving the best score averaged over runs is kept for the actual comparison. The hyperparameter optimization is done with simple Python nested loops (add link), as sketched below.
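As a toy illustration of such a sweep, the sketch below trains a stable-baselines3 PPO agent on a small grid; the environment, hyperparameter grid, training budget, and the train_and_evaluate helper are hypothetical placeholders, not the repository's actual setup.

```python
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy


def train_and_evaluate(learning_rate, clip_range, seed, budget=10_000):
    # Hypothetical helper: train a PPO agent briefly and return its mean return.
    model = PPO("MlpPolicy", "Pendulum-v1", learning_rate=learning_rate,
                clip_range=clip_range, seed=seed, verbose=0)
    model.learn(total_timesteps=budget)
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    return mean_reward


learning_rates = [1e-4, 3e-4, 1e-3]  # illustrative grid, not the repo's
clip_ranges = [0.1, 0.2]
n_runs = 3                           # each configuration is trained 3 times

best_mean, best_config = -np.inf, None
for lr in learning_rates:
    for clip in clip_ranges:
        scores = [train_and_evaluate(lr, clip, seed=run) for run in range(n_runs)]
        mean_score = float(np.mean(scores))
        if mean_score > best_mean:
            best_mean, best_config = mean_score, {"learning_rate": lr, "clip_range": clip}

print("Best configuration:", best_config, "with mean score", best_mean)
```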
We give an overview of the protocol behind the training code. Briefly, rlberry provides tools to evaluate agents. In particular, it handles running multiple seeds of agent instantiations and saves the training data. One can do so with the ExperimentManager class by feeding it a seed, an agent class, the agent hyperparameters found during the hyperparameter optimization phase, the number of training steps, and the number of runs (this paper recommends 15 runs, but we will see later how to choose the number of runs adaptively to minimize it).
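A minimal usage sketch follows, assuming rlberry's StableBaselinesAgent wrapper around stable-baselines3; the environment, budget, and hyperparameters are illustrative, and import paths may vary between rlberry versions.

```python
# Minimal sketch of setting up a training run with rlberry's ExperimentManager.
from rlberry.envs import gym_make
from rlberry.manager import ExperimentManager
from rlberry.agents.stable_baselines import StableBaselinesAgent
from stable_baselines3 import PPO

xp_default_ppo = ExperimentManager(
    agent_class=StableBaselinesAgent,                # rlberry wrapper around SB3 agents
    train_env=(gym_make, dict(id="Pendulum-v1")),    # environment constructor + kwargs
    fit_budget=100_000,                              # number of training steps
    init_kwargs=dict(algo_cls=PPO, policy="MlpPolicy",
                     learning_rate=3e-4),            # tuned hyperparameters
    n_fit=15,                                        # number of independent runs
    seed=42,
    agent_name="default_ppo",
)
xp_default_ppo.fit()
```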
Seeding is a key component of reproducible research, beyond RL. By fixing seeds, one can ensure that the stochastic processes in the empirical protocol give the same results, e.g. the same neural-network weights, the same environment starting states, and so on. However, each machine learning library has its own way of seeding. For example, RL environments from gymnasium use the seeding from numpy, while the actor and critic neural networks from stable-baselines3 use the seeding from torch. The seeding of both environments and agents is handled automatically by the ExperimentManager from rlberry!
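Without rlberry, one would have to seed each library separately. A rough illustration (the environment name and seed value are just examples):

```python
import gymnasium as gym
import numpy as np
import torch

SEED = 42
np.random.seed(SEED)               # numpy RNG
torch.manual_seed(SEED)            # torch RNG (e.g. actor/critic weight initialization)

env = gym.make("Pendulum-v1")
obs, info = env.reset(seed=SEED)   # seeds the environment's own RNG
env.action_space.seed(SEED)        # seeds action-space sampling
```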
Tested on Python 3.10
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cd empirical_rl
python3 training.py
python3 plotting.py
python3 evaluating.py
python3 statistical_comparing.py
[INFO] 13:10: Test finished
[INFO] 13:10: Results are
| Agent1 vs Agent2 | mean Agent1 | mean Agent2 | mean diff | decisions |
|---|---|---|---|---|
| default_ppo vs avec_ppo | -86.636 | -118.6952 | 32.0592 | equal |
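The table above is the kind of DataFrame produced by rlberry's statistical comparison utilities. A rough sketch of such a call is below; the manager variable names are placeholders, and the exact compare_agents signature may vary between rlberry versions.

```python
from rlberry.manager import compare_agents

# xp_default_ppo and xp_avec_ppo are placeholders for the two trained
# ExperimentManager instances from the training phase.
results = compare_agents([xp_default_ppo, xp_avec_ppo], method="tukey_hsd")
print(results)  # columns include "mean Agent1", "mean Agent2", "mean diff", "decisions"
```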
- Ant-v4
- Loop over hyperparameters and expand boundaries (hyperparameter optimization as per Patterson 2023)
- Docstrings?
- Fix the data-loading bug in the plotting script.