
Version 1.1.0 Classical / Modern RL Models #18

Open · wants to merge 31 commits into base: master
Conversation

@josiahls josiahls commented Mar 2, 2020

Classical / Modern RL Models

  • Add Cross Entropy Method (CEM)
  • Add N-step experience replay
  • Add Gaussian and factored Gaussian noise as an exploration replacement
  • Add Distributional DQN
  • Add RAINBOW DQN
  • Add REINFORCE
  • Add PPO
  • Add TRPO
  • Add D4PG
  • Add A2C
  • Add A3C
  • Add SAC

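The first checklist item, the Cross Entropy Method, is simple enough to sketch end to end. The following is a minimal, stdlib-only illustration of the core CEM loop (sample from a Gaussian, keep the elites, refit the Gaussian) on a toy 1-D objective; the function name and parameters are illustrative and not this repo's API.

```python
import random
import statistics

def cem(objective, mean=0.0, std=5.0, pop=50, elite_frac=0.2, iters=30):
    """Cross Entropy Method: sample candidates from a Gaussian,
    keep the top elite_frac by objective, refit the Gaussian to them."""
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        samples = [random.gauss(mean, std) for _ in range(pop)]
        elites = sorted(samples, key=objective, reverse=True)[:n_elite]
        mean = statistics.mean(elites)
        std = statistics.stdev(elites) + 1e-3  # noise floor keeps exploration alive
    return mean

random.seed(0)
best = cem(lambda x: -(x - 3.0) ** 2)  # maximize; optimum at x = 3
```

In an RL setting the "candidate" is a policy parameter vector and the objective is episode return, but the update rule is exactly this loop.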
josiah and others added 30 commits February 2, 2020 12:00
- initial updates to notebooks
- fixed target lunar lander
- notebooks
Fixed:
- there seems to be a strange versioning issue with whether the `axis` keyword needs to be passed to PyTorch argmax functions
- fixed target lunar lander
- there seems to be an overall issue with image generation
- initial gifs, finished notebooks
- gif table generating notebook
- reward graphs
- initial TRPO step code. Highly likely this is way off. This is a first attempt at translating the math of the research paper into a code implementation. Excited to see how close I was to the real implementation
- first good start with REINFORCE
- REINFORCE is training now, but doesn't work. What happens when the actions are binary? An action with probability 1 is always going to be sampled!
- cross entropy method. Does not seem to work great right now; pretty sure there is an existing bug in the code.
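The REINFORCE commits above hinge on Monte-Carlo returns. As a point of reference, this is the standard returns-to-go computation REINFORCE weights its log-probabilities by, sketched in plain Python; the function name is illustrative, not this repo's.

```python
def returns_to_go(rewards, gamma=0.99):
    """Monte-Carlo discounted returns G_t = r_t + gamma * G_{t+1},
    computed in a single backwards pass over one episode's rewards."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

returns_to_go([1, 1, 1], gamma=0.5)  # [1.75, 1.5, 1.0]
```

The backwards pass is O(T) rather than the O(T^2) of summing each tail separately, which matters once episodes get long.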
Removed:
- reinforce and trpo code. Need to start over...
- OK THIS IS BIG, WEIGHT DECAY BREAKS TRAINING. This might mean that other RL models might also perform better with weight decay set to 0...
- current cem test is now flagged as a performance test.
- NStep Experience replay. Very very promising
- Gaussian noise layers. They do not seem to improve performance on CartPole, but may do better on Atari games
- ROADMAP items
- Greedy epsilon crashing lol
- old reinforcement failing unit tests.
- resolution wrapper handles other returns from render better
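The "very very promising" N-step experience replay above boils down to collapsing runs of single-step transitions into n-step ones before they hit the buffer. A minimal sketch of that aggregation, stdlib-only and with illustrative names (not this repo's API):

```python
from collections import deque

def nstep_transitions(steps, n=3, gamma=0.99):
    """Collapse per-step (s, a, r, s_next) tuples into n-step transitions
    (s_t, a_t, sum_{k<n} gamma^k * r_{t+k}, s_{t+n}) for replay storage."""
    buf, out = deque(), []
    for s, a, r, s_next in steps:
        buf.append((s, a, r))
        if len(buf) == n:
            s0, a0, _ = buf[0]
            ret = sum(gamma ** k * buf[k][2] for k in range(n))
            out.append((s0, a0, ret, s_next))
            buf.popleft()
    return out
```

A full implementation would also flush the partial window at episode end; this sketch only emits complete n-step windows.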
Added:
- distributional dqn. Does not seem to work well with cartpole, investigating
- alternate dist dqn, which trains quickly now
- RAINBOW dqn. Currently it is one of the worst-performing dqns; to be revisited in the next update (1.2)
- roadmap
- REINFORCE model for Cartpole
- REINFORCE roadmap
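The factored Gaussian noise named in the checklist (the NoisyNet-style exploration layers tried in the commits above) has one key trick: an (n_out x n_in) noise matrix is built from only n_in + n_out Gaussian draws via a rank-1 outer product. A stdlib-only sketch of that trick, with illustrative names:

```python
import math
import random

def f(x):
    """NoisyNet's factored-noise transform: sign(x) * sqrt(|x|)."""
    return math.copysign(math.sqrt(abs(x)), x)

def factored_noise(n_in, n_out):
    """Factored Gaussian noise for a noisy linear layer:
    eps[i][j] = f(eps_out[i]) * f(eps_in[j]), so a full weight-noise
    matrix costs n_in + n_out samples instead of n_in * n_out."""
    eps_in = [f(random.gauss(0.0, 1.0)) for _ in range(n_in)]
    eps_out = [f(random.gauss(0.0, 1.0)) for _ in range(n_out)]
    return [[eo * ei for ei in eps_in] for eo in eps_out]

random.seed(1)
noise = factored_noise(4, 3)  # 3 x 4 rank-1 noise matrix
```

In a real noisy linear layer this matrix perturbs the weights as w + sigma * noise with learned sigma, letting the network anneal its own exploration instead of using epsilon-greedy.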