Added a few features to make rl_games more comparable to the Brax JAX PPO #314

Merged: 2 commits into master from DM/update_for_brax on Nov 30, 2024

Conversation

@Denys88 (Owner) commented Nov 28, 2024

  • Added a squashed-tanh distribution, which Brax uses in its PPO. It didn't work great for me (a sketch of the idea follows this list).
  • Added the ability to return balanced results during evaluation. Unfortunately, Brax doesn't randomize on reset.
  • By default an env just stores its first observation and returns to it. This means that if one env has a broken start position where it is always done within a few steps, its episodes flood the results. Use 'balance_env_rewards' during evaluation to get the true scores (see the config sketch after this list).
  • Added 'epochs_between_resets', which defaults to zero. It can be used with Brax because Brax doesn't randomize env positions on reset during training.
  • This means that training rewards in rl-games are much lower than the real ones.
  • Added one more example of a simple neural network, called SimpleNet. Using it can give slightly better performance, and overall it is a good example (a sketch follows this list):
  1. Works only for continuous action spaces.
  2. Hardcoded [512, 256, 128] layers: a single network for mu and value, plus an independent layer for std.
  3. Uses torch.compile. It didn't add a lot of perf in my case, but it's nice to have.
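
A minimal sketch of the squashed-tanh distribution, assuming PyTorch: a Normal whose samples are passed through tanh, with the change-of-variables correction in the log-prob. This illustrates the technique Brax's PPO uses; it is not the exact rl_games implementation.

```python
import torch
from torch.distributions import Normal

class SquashedNormal:
    """Tanh-squashed Gaussian: sample u ~ N(mu, sigma), return a = tanh(u)."""

    def __init__(self, mu, sigma):
        self.base = Normal(mu, sigma)

    def sample(self):
        # rsample keeps the sample differentiable w.r.t. mu and sigma.
        return torch.tanh(self.base.rsample())

    def log_prob(self, action, eps=1e-6):
        # Invert the squashing to recover the pre-tanh value.
        action = action.clamp(-1 + eps, 1 - eps)
        pre_tanh = torch.atanh(action)
        log_p = self.base.log_prob(pre_tanh)
        # Change-of-variables correction: subtract log|d tanh(u)/du| per dim.
        log_p = log_p - torch.log(1 - action.pow(2) + eps)
        return log_p.sum(-1)
```

The correction term keeps the PPO ratio valid after squashing, and bounded actions come for free, which is why Brax uses it.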
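To show where the two new knobs might live, here is a hypothetical config fragment written as a Python dict. The key names come from this PR; their exact placement in the rl_games YAML hierarchy is an assumption.

```python
# Hypothetical placement: key names are from this PR; the surrounding
# structure mirrors typical rl_games configs but is not verified.
config = {
    "params": {
        "config": {
            # Reset all envs every N epochs during training; the default 0
            # disables periodic resets.
            "epochs_between_resets": 10,
            "player": {
                # During evaluation, balance per-env returns so one env with
                # a broken start position cannot flood the scores.
                "balance_env_rewards": True,
            },
        },
    },
}
```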
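And a sketch of the SimpleNet idea, assuming PyTorch. The [512, 256, 128] sizes, the single network shared by mu and value, and the independent std layer come from the list above; the ELU activation and head details are assumptions.

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    """One hardcoded [512, 256, 128] trunk shared by the policy mean (mu)
    and the value head, with a state-independent log-std parameter."""

    def __init__(self, obs_dim, action_dim):
        super().__init__()
        sizes = [obs_dim, 512, 256, 128]
        layers = []
        for in_f, out_f in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(in_f, out_f), nn.ELU()]
        self.trunk = nn.Sequential(*layers)
        self.mu = nn.Linear(128, action_dim)   # continuous actions only
        self.value = nn.Linear(128, 1)
        # Independent layer for std: one learnable log-std per action dim.
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, obs):
        h = self.trunk(obs)
        return self.mu(h), self.log_std.exp(), self.value(h)

# torch.compile (PyTorch >= 2.0) fuses the small MLP; as noted above, the
# speedup was modest but nice to have.
net = torch.compile(SimpleNet(obs_dim=48, action_dim=12))
```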

@Denys88 merged commit 6612eaf into master on Nov 30, 2024
@Denys88 deleted the DM/update_for_brax branch on November 30, 2024, 21:12