This repository contains the implementation of our methods HiER, E2H-ISE, and HiER+, presented in our article titled: Highlight Experience Replay and Easy2Hard Curriculum Learning for Boosting Off-Policy Reinforcement Learning Agents.
For more information, please visit the project site: http://www.danielhorvath.eu/hier/
The open access article is available here: https://doi.org/10.1109/ACCESS.2024.3427012
The preprint article is available here: https://arxiv.org/abs/2312.09394
The qualitative evaluation is available here: https://youtu.be/si1419HLgwY?si=1idXVp50_QEklwyK
The short video presentation is available here: https://youtu.be/juJOsscVFhs?si=w0u9ui-UH7UV9Rl7
We developed and tested this repository on Ubuntu 20.04 with Python 3.8 and CUDA 11.8. Nevertheless, it should work on other configurations as well. Even though the packages are relatively common (thus their installation should not be a problem), we provide an installation guide for the aforementioned setup.
The installation is as follows:
- It is recommended to create a virtual environment with Python 3.8, for example: `python3.8 -m venv rltrain-env`
- Activate the environment: `source rltrain-env/bin/activate`
- Install poetry with pip: `pip install poetry`
- A) Either install your torch version according to https://pytorch.org/ and, from the root directory of the repository, run: `poetry install --without torchgroup`
- B) Or, if our setup (`torch ^2.1.2+cu118` and `torchvision ^0.16.2+cu118`) is compatible with your system, simply run from the root directory of the repository: `poetry install`
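After installation, a quick sanity check (assuming you installed the CUDA build of torch) is to verify that PyTorch sees your GPU:

```bash
# Activate the environment, then print the installed torch version and GPU visibility.
source rltrain-env/bin/activate
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```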
Our contributions are implemented as add-ons to our package called rltrain (a reinforcement learning training framework), which can be found in the `rltrain` directory.
To run a single training:
- The configuration of the training needs to be set in the config file, which is the input of the `main.py` script. By default, it is the `cfg_exp/single/config.yaml` file.
- Run the `main.py` script (an example command is given after this list). Its inputs are:
  - `--config`: Path of the config file.
  - `--hwid`: The ID of the GPU.
  - `--seednum`: The number of random seeds (they will run one after another).
  - `--exppath`: Leave it at its default value; it is only used for parallel training (see later).
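For example, a single training with the default config file, on GPU 0 and with three random seeds, could be started as follows (the flag values are illustrative):

```bash
# Run the experiment defined in cfg_exp/single/config.yaml on GPU 0 with 3 random seeds.
python main.py --config cfg_exp/single/config.yaml --hwid 0 --seednum 3
```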
In multiple experiment training, there are two config files:
- The base config file, with the parameters that are the same for all experiments. The parameters that differ between experiments are indicated with the `input` tag.
- The experiment list config file, where the differing parameters are set.
There are two types of multi training:
- Serial: The experiments are run one after another. The script file for the serial mode is `main_serial.py`.
- Parallel: The experiments are run in parallel. The script file for the parallel mode is `main_parallel.py`.
To run multiple trainings at once:
- The configuration of the training needs to be set in the config files, which are the inputs of the `main_serial.py` or `main_parallel.py` script. By default, they are the `cfg_exp/multi/config.yaml` and `cfg_exp/multi/exp_list.yaml` files.
- Run the `main_serial.py` or `main_parallel.py` script (example commands are given after this list). Its parameters are:
  - `--config`: Path of the base config file.
  - `--explist`: Path of the experiment list config file.
  - `--processid`: The ID of the process to run from the experiment list config.
  - `--hwid`: The ID of the GPU.
  - `--tempconfig` (only in parallel mode): Path of the temporary folder storing the experiment list for `main.py` (not important, best to leave at its default).
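For example, a serial or parallel multi-experiment run with the default config files could be started as follows (the flag values are illustrative; see the parameter list above for `--processid`):

```bash
# Serial mode: the experiments from the experiment list are run one after another.
python main_serial.py --config cfg_exp/multi/config.yaml --explist cfg_exp/multi/exp_list.yaml --hwid 0

# Parallel mode: the experiments are run in parallel.
python main_parallel.py --config cfg_exp/multi/config.yaml --explist cfg_exp/multi/exp_list.yaml --hwid 0
```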
The framework config files:
- `cfg_framework/config_framework.yaml`: It contains all the available settings and configurations.
- `cfg_framework/task_framework.yaml`: It contains all values that are placed automatically into the config file if indicated by the `auto` tag in the config file.
By default, the results of the experiments are placed in the `logs/` folder, with the name given in the config file. In this folder, there are subfolders for the different seeds (e.g., `logs/Exp_Name/0` or `logs/Exp_Name/1`). In each seed folder, there are the `config` file and the `log` of the experiment, as well as the `runs/` folder with the TensorBoard results (also in CSV format) and the `model_backup/` folder where the model weights are saved.
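Since the `runs/` folder contains the TensorBoard results, a given run can be inspected interactively (assuming TensorBoard is installed in the environment) with, for example:

```bash
# Point TensorBoard at the runs/ folder of a given experiment and seed (Exp_Name and seed 0 are placeholders).
tensorboard --logdir logs/Exp_Name/0/runs
```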
The experiments can be evaluated and plotted with the scripts in the `results/` folder.
The rltrain package has the following structure:
- `agents`: The different RL agents are implemented here (based on the OpenAI Spinning Up implementations).
- `algos`:
  - The HER (hindsight experience replay) implementation can be found in the `rltrain/algos/her` folder.
  - The HiER (highlight experience replay) implementation can be found in the `rltrain/algos/hier` folder.
  - The E2H-ISE (easy2hard initial state entropy) implementation can be found in the `rltrain/algos/initstate/ise` folder.
- `buffers`:
  - The standard experience replay is implemented here.
  - The PER (prioritized experience replay) is implemented here.
- `logger`: Handles all the logging, file I/O, and config files.
- `runners`: The `rltrain/runners/sampler_trainer_tester.py` script, as its name suggests, runs the training and puts together the components from the other folders. It is called by `main.py`, `main_serial.py`, or `main_parallel.py`.
- `taskenv`: The environments are implemented here.
- `utils`: This folder contains additional functions for managing the GPUs, evaluation, and others.
Please cite as:
@article{horvath_hier_2024,
title = {{HiER}: {Highlight} {Experience} {Replay} for {Boosting} {Off}-{Policy} {Reinforcement} {Learning} {Agents}},
volume = {12},
issn = {2169-3536},
shorttitle = {{HiER}},
url = {https://ieeexplore.ieee.org/document/10595054},
doi = {10.1109/ACCESS.2024.3427012},
urldate = {2024-07-26},
journal = {IEEE Access},
author = {Horváth, Dániel and Bujalance Martín, Jesús and Gábor Erdős, Ferenc and Istenes, Zoltán and Moutarde, Fabien},
year = {2024},
note = {Conference Name: IEEE Access},
keywords = {Training, Robots, robotics, Task analysis, Standards, Reinforcement learning, reinforcement learning, Curriculum learning, Process control, Data collection, experience replay, Random variables, Curriculum development},
pages = {100102--100119},
}