SAC Documentation - Benchmarks - Minor code tweaks #146

dosssman · 2022-03-23T05:12:08Z

Description

SAC documentation prototype
SAC qf_loss computation: removed the /2 gradient scaling so that .backward() is better aligned with the theory. Instead, the logged value is qf_loss = (qf1_loss + qf2_loss) / 2, for meaningful comparison with single Q-value network algorithms.
The same is done in OpenAI Spinning Up, for example.
Added benchmark instructions to run SAC on MuJoCo and PyBullet environments.
Added SAC runs for 6 continuous control environments (3 MuJoCo, 3 PyBullet) to cleanrl and the corresponding openrlbenchmark reports.
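
The qf_loss change described above can be illustrated with a small sketch. This is not the code from sac_continuous_action.py: it is a toy example in plain Python, where the two "critics" are hypothetical one-parameter linear models (q = w * x) and the gradients are computed by hand, so that it runs without PyTorch. It only shows why dividing the summed loss by 2 halves the gradient each critic receives, and why the mean is still a sensible value to log.

```python
# Hypothetical toy setup: two scalar critics q1 = w1 * x and q2 = w2 * x,
# each regressed onto the same target with a squared-error loss.

def mse_and_grad(w, x, target):
    """Return the squared error (w * x - target) ** 2 and its derivative w.r.t. w."""
    err = w * x - target
    return err ** 2, 2.0 * err * x

x, target = 2.0, 1.0
w1, w2 = 0.8, 0.3

qf1_loss, g1 = mse_and_grad(w1, x, target)
qf2_loss, g2 = mse_and_grad(w2, x, target)

# Loss used for the gradient step after the change: a plain sum, so each
# critic receives the full gradient of its own squared-error loss.
qf_loss_backward = qf1_loss + qf2_loss
grads_sum = (g1, g2)

# Variant before the change: differentiating (qf1_loss + qf2_loss) / 2
# scales each critic's gradient by one half.
grads_halved = (g1 / 2.0, g2 / 2.0)

# Value reported to the logger: the mean of the two critic losses, which is
# on the same scale as the loss of a single-critic algorithm.
qf_loss_logged = (qf1_loss + qf2_loss) / 2.0

print(f"backward loss: {qf_loss_backward:.2f}, logged loss: {qf_loss_logged:.2f}")
print(f"gradients (sum): {grads_sum}, gradients (halved): {grads_halved}")
```

With an adaptive optimizer such as Adam, a constant factor on the loss largely cancels out in the update, so the practical effect of the change is probably small; the main benefit stated in the description is consistency with the theory and with other implementations.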

Types of changes

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the documentation and previewed the changes via mkdocs serve.
~~I have updated the tests accordingly (if applicable).~~

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.



dosssman · 2022-03-29T14:08:33Z

Hello there.

While I think this branch should be ready for review, it could not pass the pre-commit checks, apparently because of a problem with the black dependency. @vwxyzjn Have you encountered such a problem recently?

isort....................................................................Passed
autoflake................................................................Passed
black....................................................................Failed
- hook id: black
- exit code: 1

Traceback (most recent call last):
  File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/bin/black", line 8, in <module>
    sys.exit(patched_main())
  File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1423, in patched_main
    patch_click()
  File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1409, in patch_click
    from click import _unicodefun
ImportError: cannot import name '_unicodefun' from 'click' (/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/click/__init__.py)

codespell................................................................Passed

vwxyzjn · 2022-03-29T15:01:48Z

Looks good on my end:

psf/black#2964

vwxyzjn · 2022-03-29T15:14:19Z

Fixed with psf/black#2964
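
For context, the traceback above is the incompatibility tracked in psf/black#2964: older black releases import click's private `_unicodefun` module, which newer click releases no longer ship. The sketch below encodes that rule as a small version check. The thresholds (black before 22.3.0, click 8.1.0 or later) are stated here as an assumption based on the upstream issue, and the helper names `release_tuple` and `hits_unicodefun_import_error` are hypothetical, introduced only for illustration.

```python
import re

def release_tuple(version):
    """Extract up to three numeric components from a version string.

    This is a simplification: '21.12b0' becomes (21, 12, 0), which is good
    enough for the coarse comparison below but is not a full version parser.
    """
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])

def hits_unicodefun_import_error(black_version, click_version):
    """Assumed rule: black < 22.3.0 combined with click >= 8.1.0 fails on import."""
    black_is_old = release_tuple(black_version) < (22, 3, 0)
    click_is_new = release_tuple(click_version) >= (8, 1, 0)
    return black_is_old and click_is_new

# An older pinned black hook resolved against a fresh click install.
print(hits_unicodefun_import_error("22.1.0", "8.1.0"))
# The same click release with an updated black hook.
print(hits_unicodefun_import_error("22.3.0", "8.1.0"))
```

Because pre-commit builds each hook in its own environment, the click version is resolved when that environment is created, which would explain why the failure can appear on one machine and not another until the hook revision is bumped.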

vwxyzjn · 2022-03-29T20:44:13Z

Thank you @dosssman. This is a really high-quality benchmark. I have asked @ikostrikov, who maintains https://github.com/ikostrikov/jaxrl, to help review this PR. Thanks @ikostrikov!

vwxyzjn

One other thing. Would you mind customizing the chart a bit like the other charts in the benchmark? Use CleanRL's sac_continuous_action.py instead of exp_name: sac_continuous_action for the legend. I'd also change the line color to red for consistency.

Everything else looks good :) Feel free to merge once you have a chance to address these.

vwxyzjn · 2022-04-04T20:16:35Z

docs/rl-algorithms/sac.md

+
+## Overview
+
+The Soft Actor-Critic (SAC) algorithm extends the DDPG algorithms by 1) using a stochastic policy, which in theory can express multi-modal optimal policies.


DDPG algorithms > DDPG algorithm

Thanks. Fixed.


dosssman · 2022-04-05T03:46:35Z

Changed the legend of the SAC experiments
Changed the color of the SAC experiments in the standalone MuJoCo and PyBullet reports
Re-exported the SAC plots with the red color
Fixed the "DDPG algorithms" typo
Mentioned global gradient clipping


dosssman · 2022-04-08T06:53:35Z

Oh man, I finally found the last reference for SAC I was looking for but could not remember: pranz24/pytorch-soft-actor-critic.

vwxyzjn

It looks really good now. Thank you @dosssman and feel free to merge.

dosssman · 2022-04-09T01:36:35Z

Thanks for the review. Merging then.

dosssman added 2 commits March 23, 2022 13:25

Preliminary work on the SAC docs and cleanrl openbenchmark

b721759

Updated the instructions for benchmark script runs

8b88bd4


Fixed typos and formatting

533a051


SAC docs added Wandb Iframe for PyBullet and MuJoCo; command line for SAC with fixed $\alpha$

69f75ce

vwxyzjn mentioned this pull request Mar 23, 2022

Refactor documentation #121


dosssman added 2 commits March 29, 2022 22:45

Finalized complete, sac.md doc draft, added images of learning curves

a7a6fc8

Typo and formulation tweaks

94b5301


Fixed the autospell detected typo

c763e18


dosssman marked this pull request as ready for review March 29, 2022 14:08

dosssman requested a review from vwxyzjn March 29, 2022 14:08

Update pre-commit file

d311c1f


Fix weird github action error: psf/black#2964

6a78a34

Typo fix

ab30b20


vwxyzjn requested changes Apr 4, 2022


Follow up on change requests after review: typo fix -- legend and plot color changes -- mentions global gradient clipping

456b0c1

dosssman requested a review from vwxyzjn April 5, 2022 03:47

Fleshed out the reparam. trick for action sampling, removed the TODO, added VAE paper citation

f68e5cf

dosssman added 2 commits April 8, 2022 15:45

Pulled recent changes to master to include sac license

04d2c47

added pranz24 reference to sac docs

5359826


Added pranz24 reference and licenses

9b6b158


vwxyzjn approved these changes Apr 8, 2022


dosssman merged commit 9428ce6 into vwxyzjn:master Apr 9, 2022

dosssman deleted the sac-docs branch April 9, 2022 01:36
