- Roboschool with PPO and adaptive learning rate:
Red - one network with two heads and a constant LR. Time to learn: 2 hours.
Blue - one network with two heads and an adaptive LR based on KL-divergence. Time to learn: 27 minutes.
Grey - two separate networks, lucky seed. Time to learn: 9 minutes!
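
The exact rule behind the KL-based adaptive LR is not spelled out here, so the following is only a minimal sketch of one common scheme, not necessarily what these runs used: after each PPO update, compare the measured KL-divergence between the old and new policy with a target value, then shrink the learning rate if the policy moved too far and grow it if it barely moved. The function name `adapt_learning_rate` and all default values (`kl_target`, `factor`, `lr_min`, `lr_max`) are assumptions chosen for illustration.

```python
def adapt_learning_rate(lr, kl, kl_target=0.008, factor=1.5,
                        lr_min=1e-6, lr_max=1e-2):
    """Return a new learning rate given the KL measured after a PPO update.

    All defaults are illustrative assumptions, not values from this repo.
    """
    if kl > 2.0 * kl_target:
        # Policy changed too much: take smaller steps, but not below lr_min.
        return max(lr / factor, lr_min)
    if kl < 0.5 * kl_target:
        # Policy barely changed: take larger steps, but not above lr_max.
        return min(lr * factor, lr_max)
    # KL is inside the tolerated band: keep the current learning rate.
    return lr


# Example: feed in a few hypothetical KL measurements, one per update.
lr = 3e-4
for measured_kl in (0.0005, 0.001, 0.03, 0.008):
    lr = adapt_learning_rate(lr, measured_kl)
```

With an optimizer that exposes its learning rate (for example, the `lr` entry of each group in a PyTorch optimizer's `param_groups`), the returned value would be written back before the next update.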