-
Notifications
You must be signed in to change notification settings - Fork 695
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
SAC Documentation - Benchmarks - Minor code tweaks #146
Conversation
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/vwxyzjn/cleanrl/rnPXKTft8t6hLfegqbnNkwBpG7W8 |
… SAC with fixed $\alpha$
Hello there. While I think this branch should be ready for review, it could not pass the pre-commit test due to some problem with the isort....................................................................Passed
autoflake................................................................Passed
black....................................................................Failed
- hook id: black
- exit code: 1
Traceback (most recent call last):
File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/bin/black", line 8, in <module>
sys.exit(patched_main())
File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1423, in patched_main
patch_click()
File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1409, in patch_click
from click import _unicodefun
ImportError: cannot import name '_unicodefun' from 'click' (/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/click/__init__.py)
codespell................................................................Passed |
Fixed with psf/black#2964 |
Thank you @dosssman. This is a really high-quality benchmark. I have asked @ikostrikov, who maintains https://github.com/ikostrikov/jaxrl, to help review this PR. Thanks @ikostrikov! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other thing. Would you mind customizing the chart a bit like the other charts in the benchmark? Use CleanRL's sac_continuous_action.py
instead of exp_name: sac_continuous_action
for the legend. I'd also change the line color to red for consistency.
Everything else looks good :) Feel free to merge once you have a chance to address these .
docs/rl-algorithms/sac.md
Outdated
|
||
## Overview | ||
|
||
The Soft Actor-Critic (SAC) algorithm extends the DDPG algorithms by 1) using a stochastic policy, which in theory can express multi-modal optimal policies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DDPG algorithms > DDPG algorithm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
…lot color changes -- mentions global gradient clipping
|
… added VAE paper citation
Oh man, I finally found the last reference for SAC I was looking for but could not remember: pranz24/pytorch-soft-actor-critic . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks really good now. Thank you @dosssman and feel free to merge.
Thanks for the review. Merging then. |
Description
qf_loss
computation: removed the/2
gradient scaling so that .backward() is more aligned with the theory. Instead, logqf_loss
= (qf1_loss + qf2_loss) / 2.` for meaningful comparison with mono Q-value network algorithms.Same is done in OpenAI SpinningUP for example.
Types of changes
Checklist:
pre-commit run --all-files
passes (required).mkdocs serve
.I have updated the tests accordingly (if applicable).If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
--capture-video
flag toggled on (required).mkdocs serve
.width=500
andheight=300
).I have updated the tests accordingly (if applicable).