UNREAL paper https://arxiv.org/pdf/1611.05397.pdf
Implementation of the three auxiliary tasks from the paper (Pixel Control, Value Function Replay, and Reward Prediction), built on top of an advantage actor-critic agent.
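As a rough illustration of the Pixel Control task, the sketch below computes the pseudo-reward signal as the mean absolute pixel change in each cell of a non-overlapping grid. It is a minimal pure-Python sketch, not the code in this repository: the function name `pixel_change_rewards`, the grayscale list-of-lists frame format, and the `cell` parameter are assumptions made for the example.

```python
def pixel_change_rewards(prev_frame, frame, cell=4):
    """Mean absolute pixel change per non-overlapping cell x cell block.

    Hypothetical helper for illustration. Frames are 2-D lists (rows of
    grayscale intensities) of equal size, with height and width that are
    multiples of `cell`.
    """
    height, width = len(frame), len(frame[0])
    rewards = []
    for top in range(0, height, cell):
        row = []
        for left in range(0, width, cell):
            total = 0.0
            # Sum the absolute change of every pixel inside this cell.
            for y in range(top, top + cell):
                for x in range(left, left + cell):
                    total += abs(frame[y][x] - prev_frame[y][x])
            row.append(total / (cell * cell))
        rewards.append(row)
    return rewards


# Toy 4x4 frames: only one pixel in the top-left 2x2 cell changes.
prev_frame = [[0.0] * 4 for _ in range(4)]
frame = [[0.0] * 4 for _ in range(4)]
frame[0][0] = 1.0
print(pixel_change_rewards(prev_frame, frame, cell=2))
# prints [[0.25, 0.0], [0.0, 0.0]]
```

Each entry of the returned grid would serve as the reward for one auxiliary pixel-control policy; a real implementation would typically do this with array operations on the cropped observation rather than Python loops.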
The plots show that the auxiliary tasks give some speed-up in learning on the easy DoomBasic-v0 environment.
To view the plots in TensorBoard, run: tensorboard --logdir=logs