Self-Improving Safety Performance of Reinforcement Learning Based Driving with Black-Box Verification Algorithms
@INPROCEEDINGS{dagdanov2023self,
author={Dagdanov, Resul and Durmus, Halil and Ure, Nazim Kemal},
booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
title={Self-Improving Safety Performance of Reinforcement Learning Based Driving with Black-Box Verification Algorithms},
year={2023},
volume={},
number={},
pages={5631-5637},
doi={10.1109/ICRA48891.2023.10160883}
}
- Citation
- Installation
- Tests
- Train Reinforcement Learning Agent
- Tune Reward Function
- Evaluation
- Verification Algorithms
- Self Improvement
- Analyse Results
- Sbatch Slurm
save the path of this repository to .bashrc
gedit ~/.bashrc
paste the following into the .bashrc file and save it
export BLACK_BOX="LocalPathOfThisRepository"
apply the saved changes
source ~/.bashrc
create a conda environment with Python 3.7 and activate it
conda create -n highway python=3.7.13
conda activate highway
install required packages
pip install -r requirements.txt
To use GPUs, please install a CUDA-enabled PyTorch build by following the instructions at https://pytorch.org/get-started/locally/
- Trained and tested the repository with the following versions:
- Python -> 3.7.13
- PyTorch -> 1.11.0
- Ray -> 2.0.0
- Gym -> 0.22.0
prepare Ubuntu
sudo apt-get update -y
sudo apt-get install -y python-dev libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev libsdl1.2-dev libsmpeg-dev \
    python-numpy subversion libportmidi-dev ffmpeg libswscale-dev libavformat-dev libavcodec-dev libfreetype6-dev gcc
accept the ROM license for additional Gym (Atari) games; otherwise, the highway-environment installation can fail
pip install autorom-accept-rom-license==0.4.2
install highway-environment
pip install highway-env==1.5
install custom highway environment globally
cd highway_environment
python setup.py install
NOTE: after each update of the Environment class, the environment installation has to be performed again
register custom environment wrapper class
cd highway_environment/highway_environment
python create_env.py
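For reference, a minimal sketch of what such a registration script typically contains (the environment id and entry point below are placeholders, not the repository's actual names):

# minimal registration sketch for Gym 0.22; the id and entry_point are placeholders
from gym.envs.registration import register

register(
    id="Environment-v0",                                 # placeholder environment id
    entry_point="highway_environment.envs:Environment",  # placeholder module:class path
)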
install Ray + dependencies for Ray Tune
pip install -U "ray[tune]"==2.0.0
install Ray + dependencies for Ray RLlib
pip install -U "ray[rllib]"==2.0.0
run default PPO training example with ray.tune
cd highway_environment/highway_environment
python test_train.py
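For orientation, a hedged sketch of a comparable ray.tune PPO run with the Ray 2.0.0 API (the environment id, worker count, and iteration count are placeholders):

# hedged sketch of a ray.tune PPO run (Ray 2.0.0 API); values below are placeholders
import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    config={
        "env": "Environment-v0",   # placeholder: the registered custom environment id
        "framework": "torch",
        "num_workers": 2,
    },
    stop={"training_iteration": 10},  # plays the role of training-iteration in train_config.yaml
)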
NOTE:
- parameters of a trained model will be saved in the /experiments/results/trained_models folder
- please specify the training-iteration parameter inside the /experiments/configs/train_config.yaml config to set how many iterations the model is trained for
- training parameters can be changed in /experiments/configs/ppo_train.yaml for the PPO algorithm or /experiments/configs/sac_train.yaml for the SAC algorithm
cd experiments/training
python ppo_train.py
cd experiments/training
python sac_train.py
NOTE:
- the custom reward function for RL agent training is computed in /highway_environment/highway_environment/envs/environment.py as compute_reward()
- the energy weights of the function are computed by analysing real driving scenarios
- a grid-search algorithm is applied to find the weight multipliers of the function that maximize the reward obtained in real driving scenarios (a hedged sketch of this follows the commands below)
- tuning logs and results are saved in the /experiments/results/tuning_reward_function/ folder
- currently, the Eatron driving dataset is used for tuning
- before tuning, please take a look at the /experiments/configs/reward_tuning.yaml configuration file
cd experiments/utils
python reward_tuning.py
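As an illustration only, a hedged sketch of a weighted reward and a grid search over its weight multipliers (the reward terms, weight ranges, and dataset format are assumptions, not the repository's compute_reward()):

# illustrative sketch: weighted reward + grid search over weight multipliers
# (terms, ranges, and dataset format are assumptions, not the repo's compute_reward())
import itertools
import numpy as np

def compute_reward(speed_term, comfort_term, distance_term, weights):
    # weighted sum of normalized reward terms
    w_speed, w_comfort, w_distance = weights
    return w_speed * speed_term + w_comfort * comfort_term + w_distance * distance_term

def total_reward_on_dataset(weights, scenarios):
    # scenarios: list of (speed_term, comfort_term, distance_term) tuples from real driving logs
    return sum(compute_reward(s, c, d, weights) for s, c, d in scenarios)

def grid_search_weights(scenarios):
    grid = np.linspace(0.0, 1.0, 11)  # assumed search range per weight multiplier
    best_weights, best_reward = None, -np.inf
    for weights in itertools.product(grid, repeat=3):
        reward = total_reward_on_dataset(weights, scenarios)
        if reward > best_reward:
            best_weights, best_reward = weights, reward
    return best_weights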
NOTE:
- parameters of a trained model should be moved from the ~/ray_results/ folder to the /experiments/results/trained_models/ folder
- please make sure the load-agent-name key inside the /experiments/configs/evaluation_config.yaml config points to the model you intend to evaluate
- the initial-space key in the same config represents the initial conditions of the front vehicle during evaluation
cd experiments/evaluation
python evaluate_model.py
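For intuition, a hedged sketch of loading a checkpoint and rolling out one evaluation episode with the Ray 2.0.0 and Gym 0.22 APIs (the checkpoint path, environment id, and config are placeholders):

# hedged rollout sketch (Ray 2.0.0 / Gym 0.22 APIs); path and env id are placeholders
import gym
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment(env="Environment-v0").framework("torch")
algo = config.build()
algo.restore("experiments/results/trained_models/<load-agent-name>/checkpoint_000100")  # placeholder

env = gym.make("Environment-v0")  # placeholder environment id
obs, done, episode_reward = env.reset(), False, 0.0
while not done:
    action = algo.compute_single_action(obs, explore=False)
    obs, reward, done, info = env.step(action)
    episode_reward += reward
print("episode reward:", episode_reward)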
NOTE:
- the EGO vehicle can be set as an IDM vehicle
- in that case, the control actions of the EGO vehicle are taken by the IDM model
- to do so, set the controlled-vehicles key inside /experiments/configs/env_config.yaml to 0 (zero)
apply the grid-search verification algorithm to a trained RL model
NOTE:
- check load-agent-name key inside /experiments/configs/grid_search.yaml config and make sure that the model is located in /experiments/results/trained_models/ folder
cd experiments/algorithms
python grid_search.py
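Conceptually, grid-search verification sweeps the initial conditions of the front vehicle over a fixed grid and records the cases where the trained policy fails; a hedged sketch (the parameter ranges and the run_episode helper are assumptions):

# conceptual grid-search verification sketch; ranges and run_episode() are assumptions
import itertools
import numpy as np

def grid_search_verification(run_episode, n_points=10):
    # run_episode(front_distance, front_speed) -> True if the EGO vehicle crashed
    distances = np.linspace(10.0, 100.0, n_points)  # assumed initial front-vehicle distances [m]
    speeds = np.linspace(10.0, 30.0, n_points)      # assumed initial front-vehicle speeds [m/s]
    failures = []
    for d, v in itertools.product(distances, speeds):
        if run_episode(d, v):
            failures.append((d, v))
    return failures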
apply the monte-carlo-search verification algorithm to a trained RL model
NOTE:
- check load-agent-name key inside /experiments/configs/monte_carlo_search.yaml config and make sure that the model is located in /experiments/results/trained_models/ folder
cd experiments/algorithms
python monte_carlo_search.py
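Conceptually, monte-carlo-search verification samples the initial conditions of the front vehicle uniformly at random; a hedged sketch (the bounds and the run_episode helper are assumptions):

# conceptual monte-carlo verification sketch; bounds and run_episode() are assumptions
import numpy as np

def monte_carlo_verification(run_episode, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    failures = []
    for _ in range(n_samples):
        d = rng.uniform(10.0, 100.0)  # assumed initial front-vehicle distance [m]
        v = rng.uniform(10.0, 30.0)   # assumed initial front-vehicle speed [m/s]
        if run_episode(d, v):         # True if the EGO vehicle crashed
            failures.append((d, v))
    return failures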
apply the cross-entropy-search verification algorithm to a trained RL model
NOTE:
- check load-agent-name key inside /experiments/configs/ce_search.yaml config and make sure that the model is located in /experiments/results/trained_models/ folder
- the number-of-samples key inside the /experiments/configs/ce_search.yaml config should be defined as the product of the number of iterations and the sample size per iteration; at each iteration, the best 10 percent of the sampled batch is selected to determine the next iteration's minimum and maximum limits (see the sketch after the commands below)
cd experiments/algorithms
python ce_search.py
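A hedged sketch of the iterative scheme described in the note above (the bounds, risk metric, and run_episode helper are assumptions, not the repository's ce_search.py):

# conceptual cross-entropy search sketch; bounds, risk metric, and run_episode() are assumptions
import numpy as np

def cross_entropy_search(run_episode, n_iterations=10, samples_per_iter=100, elite_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    low, high = np.array([10.0, 10.0]), np.array([100.0, 30.0])  # assumed [distance, speed] limits
    failures = []
    for _ in range(n_iterations):
        batch = rng.uniform(low, high, size=(samples_per_iter, 2))
        risks = np.array([run_episode(d, v) for d, v in batch])  # higher risk = closer to failure
        failures.extend(batch[risks >= 1.0].tolist())            # assumed: risk >= 1.0 means a crash
        n_elite = max(1, int(elite_frac * samples_per_iter))     # best 10 percent of the batch
        elite = batch[np.argsort(risks)[-n_elite:]]
        low, high = elite.min(axis=0), elite.max(axis=0)         # next iteration's min/max limits
    return failures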
apply the bayesian-optimization-search verification algorithm to a trained RL model
install the required package
pip install bayesian-optimization==1.4.0
NOTE:
- check load-agent-name key inside /experiments/configs/bayesian_search.yaml config and make sure that the model is located in /experiments/results/trained_models/ folder
cd experiments/algorithms
python bayesian_search.py
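A hedged sketch using the bayesian-optimization package API (the objective and bounds are placeholders, not the repository's bayesian_search.py):

# hedged sketch with the bayesian-optimization (v1.4.0) package; objective and bounds are placeholders
from bayes_opt import BayesianOptimization

def risk_objective(front_distance, front_speed):
    # dummy stand-in for rolling out the trained policy and returning a scalar risk measure
    # (e.g., negative minimum time-to-collision); replace with an actual episode rollout
    return -abs(front_distance - 30.0) - abs(front_speed - 25.0)

optimizer = BayesianOptimization(
    f=risk_objective,
    pbounds={"front_distance": (10.0, 100.0), "front_speed": (10.0, 30.0)},  # assumed bounds
    random_state=0,
)
optimizer.maximize(init_points=10, n_iter=50)
print(optimizer.max)  # most failure-prone initial conditions found so far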
apply the adaptive-multilevel-splitting-search verification algorithm to a trained RL model
NOTE:
- check load-agent-name key inside /experiments/configs/ams_search.yaml config and make sure that the model is located in /experiments/results/trained_models/ folder
cd experiments/algorithms
python ams_search.py
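Conceptually, adaptive multilevel splitting keeps a population of scenarios, repeatedly discards the least failure-prone ones, and resamples them near surviving scenarios; a hedged sketch (the bounds, perturbation scale, and run_episode helper are assumptions, not the repository's ams_search.py):

# conceptual adaptive-multilevel-splitting sketch; bounds, scoring, and run_episode() are assumptions
import numpy as np

def ams_search(run_episode, n_particles=100, n_discard=10, n_levels=20, seed=0):
    rng = np.random.default_rng(seed)
    low, high = np.array([10.0, 10.0]), np.array([100.0, 30.0])   # assumed [distance, speed] bounds
    particles = rng.uniform(low, high, size=(n_particles, 2))
    scores = np.array([run_episode(d, v) for d, v in particles])  # higher score = closer to failure
    for _ in range(n_levels):
        order = np.argsort(scores)
        survivors = order[n_discard:]                             # keep the most failure-prone particles
        for i in order[:n_discard]:                               # replace the least failure-prone ones
            j = rng.choice(survivors)                             # clone a survivor and perturb it
            particles[i] = np.clip(particles[j] + rng.normal(0.0, 1.0, size=2), low, high)
            scores[i] = run_episode(*particles[i])
    return particles[scores >= 1.0]                               # assumed: score >= 1.0 means a crash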
after applying a verification algorithm, the RL agent can be trained again on the verification results
NOTE:
- use the /experiments/configs/self_improvement.yaml config
- train the model with the /experiments/training/self_improvement.py script
- a trained model can be loaded and re-trained from the latest checkpoint with the is-restore key inside the config
- the custom scenario setter class is located at /experiments/utils/scenarios.py
- a new scenario loader can be added and referenced with the validation-type key in the self_improvement.yaml config
cd experiments/training
python self_improvement.py
to include specific verification results in the sampling container, read the following note
NOTE:
- change the validation-type key inside the /experiments/configs/self_improvement.yaml config to "complex"
- take a look at the scenario-mixer key parameters and specify which validation results to include
- each validation folder is assigned a sampling probability, and the probabilities should sum to 1.0
- folder names in the scenario-mixer key should be null if not used, and the sampling probabilities of the remaining folders should sum to 1.0 (100 percent), as sketched below
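A hedged sketch of sampling validation results by probability (the folder names and probabilities are placeholders, not the repository's scenario-mixer defaults):

# hedged sketch of mixing validation results by sampling probability; values are placeholders
import numpy as np

scenario_mixer = {
    "grid_search_results": 0.5,   # placeholder folder name and sampling probability
    "monte_carlo_results": 0.3,
    "ce_search_results": 0.2,
}
assert abs(sum(scenario_mixer.values()) - 1.0) < 1e-9  # probabilities must sum to 1.0

rng = np.random.default_rng(0)
folders, probs = zip(*scenario_mixer.items())
sampled_folder = rng.choice(folders, p=probs)           # pick which validation results to sample from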
after training and running a verification algorithm, visualize validation and failure scenarios
cd experiments/analyses
python3 -m notebook
submit a batch script to Slurm for training an RL model
cd experiments/training
conda activate highway
# check the resource allocations before submitting a Slurm batch
sbatch slurm_train.sh
submit a batch script to Slurm for applying the selected verification algorithm
cd experiments/algorithms
conda activate highway
# check the selected algorithm and resource allocations before submitting a Slurm batch
sbatch slurm_verification.sh
basic slurm commands
# submit a batch script to Slurm for processing
sbatch <batch-script>.sh
# show information about your job(s) in the queue
squeue
# show information about current and previous jobs
sacct
# end or cancel a queued job
scancel <job-id>
# read last lines of terminal logs (.err or .out)
tail -f <job-id>.out