SAC Documentation - Benchmarks - Minor code tweaks #146

Merged (15 commits) on Apr 9, 2022

dosssman
Collaborator

@dosssman dosssman commented Mar 23, 2022

Description

  • SAC documentation prototype
  • SAC qf_loss computation: removed the /2 gradient scaling so that .backward() is more closely aligned with the theory. Instead, the logged value is qf_loss = (qf1_loss + qf2_loss) / 2, for meaningful comparison with algorithms that use a single Q-value network.
    The same is done in OpenAI Spinning Up, for example.
  • Added benchmark instructions to run SAC on Mujoco and pybullet environments
  • Added SAC runs for 6 continuous control envs (3 Mujoco, 3 PyBullet) to cleanrl and corresponding openrlbenchmark reports.
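
To illustrate the qf_loss change described above, here is a minimal sketch in plain Python rather than the PyTorch code touched by this PR. The helper names (mse, mse_grad) and the toy values are assumptions made up for this illustration. It shows that backpropagating the plain sum of the two critic losses leaves each critic's gradient unscaled, whereas dividing by 2 before backpropagation would halve it; the halved value is kept only for logging.

```python
def mse(pred, target):
    """Mean squared error between two equally sized lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mse_grad(pred, target):
    """Gradient of mse(pred, target) with respect to each entry of pred."""
    n = len(pred)
    return [2.0 * (p - t) / n for p, t in zip(pred, target)]


# Toy values (hypothetical): a shared TD target and two critics' predictions.
target = [1.0, 0.5, -0.2]
qf1_pred = [0.8, 0.7, 0.0]
qf2_pred = [1.2, 0.1, -0.5]

qf1_loss = mse(qf1_pred, target)
qf2_loss = mse(qf2_pred, target)

# Objective that is backpropagated: the plain sum, with no /2 scaling.
qf_loss = qf1_loss + qf2_loss

# Value that is logged: the average, comparable to a single-critic loss.
logged_qf_loss = qf_loss / 2.0

# Gradient w.r.t. the first critic's predictions under each choice.
# qf2_loss does not depend on qf1_pred, so only qf1_loss contributes.
grad_qf1_sum = mse_grad(qf1_pred, target)           # from qf1_loss + qf2_loss
grad_qf1_halved = [g / 2.0 for g in grad_qf1_sum]   # from (qf1_loss + qf2_loss) / 2

print("logged qf_loss:", logged_qf_loss)
print("gradient (sum):", grad_qf1_sum)
print("gradient (halved):", grad_qf1_halved)
```

Since the two critics do not share parameters in this sketch, the sum gives each critic the same gradient it would receive if trained on its own loss alone.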

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation
  • Benchmarks

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).

vercel bot commented Mar 23, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/vwxyzjn/cleanrl/rnPXKTft8t6hLfegqbnNkwBpG7W8
✅ Preview: https://cleanrl-git-fork-dosssman-sac-docs-vwxyzjn.vercel.app

@dosssman
Collaborator Author

Hello there.

While I think this branch should be ready for review, it could not pass the pre-commit check, apparently because of a problem with the black dependency. @vwxyzjn, have you encountered such a problem recently?

isort....................................................................Passed
autoflake................................................................Passed
black....................................................................Failed
- hook id: black
- exit code: 1

Traceback (most recent call last):
  File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/bin/black", line 8, in <module>
    sys.exit(patched_main())
  File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1423, in patched_main
    patch_click()
  File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1409, in patch_click
    from click import _unicodefun
ImportError: cannot import name '_unicodefun' from 'click' (/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/click/__init__.py)

codespell................................................................Passed

@dosssman dosssman marked this pull request as ready for review March 29, 2022 14:08
@dosssman dosssman requested a review from vwxyzjn March 29, 2022 14:08
@vwxyzjn
Owner

vwxyzjn commented Mar 29, 2022

Looks good on my end:
[image]

@vwxyzjn
Owner

vwxyzjn commented Mar 29, 2022

Fixed with psf/black#2964

@vwxyzjn
Owner

vwxyzjn commented Mar 29, 2022

Thank you @dosssman. This is a really high-quality benchmark. I have asked @ikostrikov, who maintains https://github.com/ikostrikov/jaxrl, to help review this PR. Thanks @ikostrikov!

Owner

@vwxyzjn vwxyzjn left a comment

One other thing. Would you mind customizing the chart a bit like the other charts in the benchmark? Use CleanRL's sac_continuous_action.py instead of exp_name: sac_continuous_action for the legend. I'd also change the line color to red for consistency.

Everything else looks good :) Feel free to merge once you have a chance to address these.


## Overview

The Soft Actor-Critic (SAC) algorithm extends the DDPG algorithms by 1) using a stochastic policy, which in theory can express multi-modal optimal policies.
Owner

DDPG algorithms > DDPG algorithm

Collaborator Author

Thanks. Fixed.

docs/rl-algorithms/sac.md (resolved)
…lot color changes -- mentions global gradient clipping
@dosssman
Collaborator Author

dosssman commented Apr 5, 2022

  • Changed the legend of the SAC experiments
  • Changed the color of the SAC experiments in the standalone Mujoco and PyBullet reports
  • Re-exported the SAC plots in red
  • Fixed the "DDPG algorithms" typo
  • Mentioned global gradient clipping

@dosssman
Collaborator Author

dosssman commented Apr 8, 2022

Oh man, I finally found the last SAC reference I was looking for but could not remember: pranz24/pytorch-soft-actor-critic.

Owner

@vwxyzjn vwxyzjn left a comment

It looks really good now. Thank you @dosssman and feel free to merge.

@dosssman
Collaborator Author

dosssman commented Apr 9, 2022

Thanks for the review. Merging then.

@dosssman dosssman merged commit 9428ce6 into vwxyzjn:master Apr 9, 2022
@dosssman dosssman deleted the sac-docs branch April 9, 2022 01:36