Added a few features to make rl_games more comparable to the Brax JAX PPO #314

Merged: 2 commits into master from DM/update_for_brax on Nov 30, 2024

Conversation

@Denys88 (Owner) commented Nov 28, 2024

  • Added a squashed-tanh distribution, which Brax uses in its PPO. It didn't work great for me (a sketch of the idea follows this list).
  • Added the ability to return balanced results during evaluation. Unfortunately, Brax doesn't randomize on reset.
  • By default an env just stores its first observation and returns to it. This means that if one env has a broken start position where it is always done within a few steps, its episodes flood the results. Use 'balance_env_rewards' during evaluation to get the true scores (see the config sketch after this list).
  • Added 'epochs_between_resets', which defaults to zero. It can be used with Brax because Brax doesn't randomize env positions on reset during training.
  • This means that training rewards in rl-games are much lower than the real ones.
  • Added one more example of a simple neural network, called SimpleNet. Using it can give slightly better performance, and overall it is a good example (a sketch follows this list):
  1. Works only for continuous action spaces.
  2. Hardcoded [512, 256, 128] layers: a single network for mu and value, plus an independent layer for std.
  3. Uses torch.compile. It didn't add a lot of perf in my case, but it's nice to have.
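
A minimal sketch of the squashed-tanh distribution, assuming PyTorch: a Normal whose samples are passed through tanh, with the change-of-variables correction in the log-prob. This illustrates the technique Brax's PPO uses; it is not the exact rl_games implementation.

```python
import torch
from torch.distributions import Normal

class SquashedNormal:
    """Tanh-squashed Gaussian: sample u ~ N(mu, sigma), return a = tanh(u)."""

    def __init__(self, mu, sigma):
        self.base = Normal(mu, sigma)

    def sample(self):
        # rsample keeps the sample differentiable w.r.t. mu and sigma.
        return torch.tanh(self.base.rsample())

    def log_prob(self, action, eps=1e-6):
        # Invert the squashing to recover the pre-tanh value.
        action = action.clamp(-1 + eps, 1 - eps)
        pre_tanh = torch.atanh(action)
        log_p = self.base.log_prob(pre_tanh)
        # Change-of-variables correction: subtract log|d tanh(u)/du| per dim.
        log_p = log_p - torch.log(1 - action.pow(2) + eps)
        return log_p.sum(-1)
```

The correction term keeps the PPO ratio valid after squashing, and bounded actions come for free, which is why Brax uses it.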
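To show where the two new knobs might live, here is a hypothetical config fragment written as a Python dict. The key names come from this PR; their exact placement in the rl_games YAML hierarchy is an assumption.

```python
# Hypothetical placement: key names are from this PR; the surrounding
# structure mirrors typical rl_games configs but is not verified.
config = {
    "params": {
        "config": {
            # Reset all envs every N epochs during training; the default 0
            # disables periodic resets.
            "epochs_between_resets": 10,
            "player": {
                # During evaluation, balance per-env returns so one env with
                # a broken start position cannot flood the scores.
                "balance_env_rewards": True,
            },
        },
    },
}
```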
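And a sketch of the SimpleNet idea, assuming PyTorch. The [512, 256, 128] sizes, the single network shared by mu and value, and the independent std layer come from the list above; the ELU activation and head details are assumptions.

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    """One hardcoded [512, 256, 128] trunk shared by the policy mean (mu)
    and the value head, with a state-independent log-std parameter."""

    def __init__(self, obs_dim, action_dim):
        super().__init__()
        sizes = [obs_dim, 512, 256, 128]
        layers = []
        for in_f, out_f in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(in_f, out_f), nn.ELU()]
        self.trunk = nn.Sequential(*layers)
        self.mu = nn.Linear(128, action_dim)   # continuous actions only
        self.value = nn.Linear(128, 1)
        # Independent layer for std: one learnable log-std per action dim.
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, obs):
        h = self.trunk(obs)
        return self.mu(h), self.log_std.exp(), self.value(h)

# torch.compile (PyTorch >= 2.0) fuses the small MLP; as noted above, the
# speedup was modest but nice to have.
net = torch.compile(SimpleNet(obs_dim=48, action_dim=12))
```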

@Denys88 merged commit 6612eaf into master on Nov 30, 2024
@Denys88 deleted the DM/update_for_brax branch on November 30, 2024, 21:12