Continuous Results

  • Roboschool with PPO and adaptive learning rate:

Red - one network with two heads and a constant learning rate. Time to learn: 2 hours.
Blue - one network with two heads and an adaptive learning rate based on KL divergence. Time to learn: 27 minutes.
Grey - two separate networks, lucky seed. Time to learn: 9 minutes!
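
The exact adaptation rule behind the blue run is not given here. As a rough sketch of how a KL-based adaptive learning rate is commonly done in PPO, the snippet below shrinks the learning rate when the measured KL divergence between the old and new policy overshoots a target, and grows it when the KL is well under the target. The function names, `kl_target`, the `1.5` factor, and the clamping bounds are all assumptions for illustration, not values taken from these experiments.

```python
import math


def gaussian_kl(mu_old, sigma_old, mu_new, sigma_new):
    """KL(old || new) between two 1-D Gaussian policies (illustrative helper)."""
    return (
        math.log(sigma_new / sigma_old)
        + (sigma_old ** 2 + (mu_old - mu_new) ** 2) / (2.0 * sigma_new ** 2)
        - 0.5
    )


def adapt_learning_rate(lr, kl, kl_target=0.01, factor=1.5,
                        lr_min=1e-6, lr_max=1e-2):
    """Return an adjusted learning rate given the KL measured after an update.

    All default values are assumed, not taken from the runs above.
    """
    if kl > 2.0 * kl_target:
        # Policy moved too far in one update: take smaller steps.
        lr = lr / factor
    elif kl < 0.5 * kl_target:
        # Policy barely moved: take larger steps.
        lr = lr * factor
    # Keep the learning rate inside a sane range.
    return min(max(lr, lr_min), lr_max)


# Example: measure KL after a policy update, then adjust the learning rate.
kl = gaussian_kl(mu_old=0.0, sigma_old=1.0, mu_new=0.3, sigma_new=0.9)
lr = adapt_learning_rate(3e-4, kl)
```

In a training loop, the returned value would be written back to the optimizer before the next PPO update, so the step size tracks how far each update actually moves the policy.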

[Image: plot of the red, blue, and grey runs]

Watch the video