
Version 1.1.0 Classical / Modern RL Models #18

Open · wants to merge 31 commits into base: master
Conversation

@josiahls josiahls commented Mar 2, 2020

Classical / Modern RL Models

  • Add Cross Entropy Method (CEM)
  • Add N-step experience replay
  • Add Gaussian and factored Gaussian noise as an exploration replacement
  • Add Distributional DQN
  • Add RAINBOW DQN
  • Add REINFORCE
  • Add PPO
  • Add TRPO
  • Add D4PG
  • Add A2C
  • Add A3C
  • Add SAC

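The first checklist item, the Cross Entropy Method, is simple enough to sketch end to end. The following is a minimal, stdlib-only illustration of the core CEM loop (sample from a Gaussian, keep the elites, refit the Gaussian) on a toy 1-D objective; the function name and parameters are illustrative and not this repo's API.

```python
import random
import statistics

def cem(objective, mean=0.0, std=5.0, pop=50, elite_frac=0.2, iters=30):
    """Cross Entropy Method: sample candidates from a Gaussian,
    keep the top elite_frac by objective, refit the Gaussian to them."""
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        samples = [random.gauss(mean, std) for _ in range(pop)]
        elites = sorted(samples, key=objective, reverse=True)[:n_elite]
        mean = statistics.mean(elites)
        std = statistics.stdev(elites) + 1e-3  # noise floor keeps exploration alive
    return mean

random.seed(0)
best = cem(lambda x: -(x - 3.0) ** 2)  # maximize; optimum at x = 3
```

In an RL setting the "candidate" is a policy parameter vector and the objective is episode return, but the update rule is exactly this loop.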
josiah and others added 30 commits February 2, 2020 12:00
- initial updates to notebooks
- fixed target lunar lander
- notebooks
Fixed:
- there seems to be a strange versioning issue with whether the `axis` keyword needs to be passed to PyTorch argmax functions
- fixed target lunar lander
- there seems to be an overall issue with image generation
- initial gifs, finished notebooks
- gif table generating notebook
- reward graphs
- initial TRPO step code. Highly likely this is way off. This is a first attempt at translating the math of the research paper into a code implementation. Excited to see how close I was to the real implementation
- first good start with REINFORCE
- REINFORCE is training now, but doesn't work. What happens when the actions are binary? An action with probability 1 is always going to be sampled!
- cross entropy method. Does not seem to work great right now; pretty sure there is an existing bug in the code.
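The REINFORCE commits above hinge on Monte-Carlo returns. As a point of reference, this is the standard returns-to-go computation REINFORCE weights its log-probabilities by, sketched in plain Python; the function name is illustrative, not this repo's.

```python
def returns_to_go(rewards, gamma=0.99):
    """Monte-Carlo discounted returns G_t = r_t + gamma * G_{t+1},
    computed in a single backwards pass over one episode's rewards."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

returns_to_go([1, 1, 1], gamma=0.5)  # [1.75, 1.5, 1.0]
```

The backwards pass is O(T) rather than the O(T^2) of summing each tail separately, which matters once episodes get long.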
Removed:
- reinforce and trpo code. Need to start over...
- OK THIS IS BIG, WEIGHT DECAY BREAKS TRAINING. This might mean that other RL models might also perform better with weight decay set to 0...
- current cem test is now flagged as a performance test.
- NStep Experience replay. Very very promising
- Gaussian noise layers. They do not seem to improve performance on CartPole, but may do better on Atari games
- ROADMAP items
- Greedy epsilon crashing lol
- old reinforcement failing unit tests.
- resolution wrapper handles other returns from render better
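The "very very promising" N-step experience replay above boils down to collapsing runs of single-step transitions into n-step ones before they hit the buffer. A minimal sketch of that aggregation, stdlib-only and with illustrative names (not this repo's API):

```python
from collections import deque

def nstep_transitions(steps, n=3, gamma=0.99):
    """Collapse per-step (s, a, r, s_next) tuples into n-step transitions
    (s_t, a_t, sum_{k<n} gamma^k * r_{t+k}, s_{t+n}) for replay storage."""
    buf, out = deque(), []
    for s, a, r, s_next in steps:
        buf.append((s, a, r))
        if len(buf) == n:
            s0, a0, _ = buf[0]
            ret = sum(gamma ** k * buf[k][2] for k in range(n))
            out.append((s0, a0, ret, s_next))
            buf.popleft()
    return out
```

A full implementation would also flush the partial window at episode end; this sketch only emits complete n-step windows.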
Added:
- distributional dqn. Does not seem to work well with cartpole, investigating
- alternate dist dqn, which trains quickly now
- RAINBOW dqn. Currently it is one of the worst-performing dqns; to be revisited in the next update (1.2)
- roadmap
- REINFORCE model for Cartpole
- REINFORCE roadmap
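The factored Gaussian noise named in the checklist (the NoisyNet-style exploration layers tried in the commits above) has one key trick: an (n_out x n_in) noise matrix is built from only n_in + n_out Gaussian draws via a rank-1 outer product. A stdlib-only sketch of that trick, with illustrative names:

```python
import math
import random

def f(x):
    """NoisyNet's factored-noise transform: sign(x) * sqrt(|x|)."""
    return math.copysign(math.sqrt(abs(x)), x)

def factored_noise(n_in, n_out):
    """Factored Gaussian noise for a noisy linear layer:
    eps[i][j] = f(eps_out[i]) * f(eps_in[j]), so a full weight-noise
    matrix costs n_in + n_out samples instead of n_in * n_out."""
    eps_in = [f(random.gauss(0.0, 1.0)) for _ in range(n_in)]
    eps_out = [f(random.gauss(0.0, 1.0)) for _ in range(n_out)]
    return [[eo * ei for ei in eps_in] for eo in eps_out]

random.seed(1)
noise = factored_noise(4, 3)  # 3 x 4 rank-1 noise matrix
```

In a real noisy linear layer this matrix perturbs the weights as w + sigma * noise with learned sigma, letting the network anneal its own exploration instead of using epsilon-greedy.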