
Version 1.1.0 Classical / Modern RL Models #18

Open
wants to merge 31 commits into master
Conversation

josiahls
Owner

@josiahls josiahls commented Mar 2, 2020

Classical / Modern RL Models

  • Add Cross Entropy Method CEM
  • Add NStep Experience replay
  • Add Gaussian and Factored Gaussian Noise exploration replacement
  • Add Distributional DQN
  • Add RAINBOW DQN
  • Add REINFORCE
  • Add PPO
  • Add TRPO
  • Add D4PG
  • Add A2C
  • Add A3C
  • Add SAC
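For context on the first item, a minimal sketch of the cross-entropy method on a toy 1-D objective (hypothetical standalone code, not this library's API):

```python
import numpy as np

def cem(objective, mean=0.0, std=2.0, pop_size=50, elite_frac=0.2, iters=30, seed=0):
    """Sample candidates from a Gaussian, keep the elite fraction, refit, repeat."""
    rng = np.random.default_rng(seed)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=pop_size)
        scores = objective(samples)
        elite = samples[np.argsort(scores)[-n_elite:]]  # highest-scoring candidates
        mean, std = elite.mean(), elite.std() + 1e-6    # refit; small floor keeps std > 0
    return mean

# Maximize -(x - 3)^2; the optimum is x = 3.
best = cem(lambda x: -(x - 3.0) ** 2)
```

In the RL setting the "objective" is an episode rollout's total reward and the Gaussian is over policy parameters, but the sample/select-elite/refit loop is the same.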

josiah and others added 30 commits February 2, 2020 12:00
- initial updates to notebooks
- fixed target lunar lander
- notebooks
Fixed:
- there seems to be a versioning issue with whether the `axis` keyword needs to be passed to PyTorch argmax functions
- fixed target lunar lander
- there seems to be an overall issue with image generation
- initial gifs, finished notebooks
- gif table generating notebook
- reward graphs
- reward graphs
- reward graphs
- reward graphs
- reward graphs
- reward graphs
- initial TRPO step code. Highly likely this is way off. This is a first attempt at translating the math of the research paper into a code implementation. Excited to see how close I was to the real implementation
- first good start with REINFORCE
- REINFORCE is training now, but doesn't work. What happens when the actions are binary? An action with probability 1 is always going to be sampled!
- cross entropy method. Does not seem to work great right now; pretty sure there is an existing bug in the code.
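The REINFORCE sampling issue noted above (a probability-1 action is always sampled, so exploration stops) can be illustrated with a minimal softmax-bandit sketch, assuming a two-armed bandit rather than the actual Cartpole setup:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.zeros(2)             # policy parameters for a two-armed bandit
rewards = np.array([0.0, 1.0])   # arm 1 is the better action
lr = 0.1

for _ in range(500):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)   # sample, never argmax: sampling keeps exploration alive
    r = rewards[a]
    grad = -probs                # gradient of log pi(a) for a softmax policy
    grad[a] += 1.0               # is one_hot(a) - probs
    logits += lr * r * grad      # REINFORCE update: ascend r * grad log pi(a)

final_probs = softmax(logits)    # should now strongly favor arm 1
```

As long as the policy stays stochastic, the bad arm keeps getting sampled occasionally and the gradient remains informative; once a probability saturates at 1, updates for the other action vanish.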
Removed:
- reinforce and trpo code. Need to start over...
- OK THIS IS BIG: WEIGHT DECAY BREAKS THINGS. This might mean that other RL models might also perform better with weight decay set to 0...
- current cem test is now flagged as a performance test.
- NStep Experience replay. Very very promising
- gaussian noise layers. They do not seem to improve performance on cartpole, but may do better on Atari games
- ROADMAP items
- Greedy epsilon crashing
- old failing reinforcement unit tests.
- resolution wrapper handles other returns from render better
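The n-step experience replay mentioned above is built on n-step returns; a minimal sketch of the return computation, assuming a plain reward list and a discount factor `gamma` (standalone illustration, not this library's replay buffer):

```python
import numpy as np

def n_step_returns(rewards, gamma=0.99, n=3):
    """R_t = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1}, truncated at episode end."""
    T = len(rewards)
    out = np.zeros(T)
    for t in range(T):
        for k in range(min(n, T - t)):
            out[t] += gamma ** k * rewards[t + k]
    return out

# With gamma=0.5 and four unit rewards: [1.75, 1.75, 1.5, 1.0]
rets = n_step_returns([1.0, 1.0, 1.0, 1.0], gamma=0.5, n=3)
```

In an actual n-step DQN target, a bootstrap term `gamma**n * Q(s_{t+n})` is added on top of this truncated sum for non-terminal transitions.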
Added:
- distributional dqn. Does not seem to work well with cartpole, investigating
- alternate dist dqn, which trains quickly now
- RAINBOW dqn. Currently it is one of the worst performing dqns; improvements are planned for the next update (1.2)
- roadmap
- REINFORCE model for Cartpole
- REINFORCE roadmap
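On the distributional dqn items above: the core idea is to predict a categorical distribution over a fixed support of return atoms instead of a scalar Q-value. A minimal sketch with illustrative C51-style constants (51 atoms on [-10, 10]; the constants and the zeroed logits are stand-ins, not values from this repo):

```python
import numpy as np

n_atoms, v_min, v_max = 51, -10.0, 10.0
support = np.linspace(v_min, v_max, n_atoms)    # fixed return atoms z_i

logits = np.zeros(n_atoms)                      # stand-in for one (state, action) head
probs = np.exp(logits) / np.exp(logits).sum()   # softmax over atoms: p_i
q_value = float((probs * support).sum())        # Q(s, a) = sum_i p_i * z_i
```

Training minimizes cross-entropy against the Bellman-updated distribution projected back onto the fixed support; the scalar Q-value used for action selection is just the distribution's mean, as above.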