【Existing code:】
The environment is only reset at the beginning of the training loop; that is, env.reset() is only called at the first epoch.
【Possibly correct training paradigm:】
I checked OpenAI Spinning Up's implementation of PPO (https://github.com/openai/spinningup/blob/master/spinup/algos/pytorch/ppo/ppo.py); they do reset the env at the end of each epoch (which is the same as resetting it at the beginning of each epoch).
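For reference, the Spinning Up-style loop looks roughly like this (a simplified sketch using the classic gym API; the constants and the random-action stand-in for the policy are mine, not that repo's code):

```python
import gym

env = gym.make("CartPole-v1")
num_epochs, steps_per_epoch = 3, 200   # illustrative values only

obs = env.reset()
for epoch in range(num_epochs):
    for t in range(steps_per_epoch):
        action = env.action_space.sample()        # stand-in for sampling from the policy
        obs, reward, done, _ = env.step(action)   # classic gym API (4-tuple)

        epoch_ended = (t == steps_per_epoch - 1)
        if done or epoch_ended:
            # Reset whenever an episode finishes (or the epoch is cut off),
            # so the next transition starts from a fresh initial state.
            obs = env.reset()
    # The PPO update on the batch collected this epoch would go here.
```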
Correct me if I'm wrong :)
P.S.: It's still nice code!
In the former (OpenAI's) implementation, that loop runs more than one episode, and it calls reset whenever an episode is done (without jumping out of the loop). In the latter (this repo's) implementation, the loop runs only one episode: when the episode is done, it breaks out of the loop, and the env is reset before the next episode begins.
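To make the contrast concrete, the episode-per-iteration structure looks roughly like this (a minimal sketch with placeholder names and the classic gym API, not this repo's exact code):

```python
import gym

env = gym.make("CartPole-v1")
num_episodes, max_steps = 3, 200       # illustrative values only

for episode in range(num_episodes):
    # Reset at the top of every iteration, because the inner loop runs exactly one episode.
    obs = env.reset()
    for t in range(max_steps):
        action = env.action_space.sample()        # stand-in for the policy
        obs, reward, done, _ = env.step(action)   # classic gym API (4-tuple)
        if done:
            break                                  # leave the loop; the next iteration resets again
    # The PPO update on this episode's data would go here.
```

Either way, every episode starts from a fresh initial state; the two structures just place the reset call in different spots.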