PPO value function clip #1

Asuka20 · 2020-08-30T16:40:41Z

Hi, why do you use the maximum instead of the minimum when clipping the value function loss?
Suppose clipping occurs, i.e. v_pred_old < v_clipped < v_pred < R (or the reverse ordering). Then the loss will be larger than it would be without clipping. So why would this work to reduce variability?
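
To make the case concrete, here is a minimal sketch of the clipped value loss as it is commonly written in PPO implementations. The function name `clipped_value_loss`, the `clip_eps` value, the 0.5 factor and the numbers are my own assumptions for illustration, not taken from this repository's code.

```python
import numpy as np

def clipped_value_loss(v_pred, v_pred_old, returns, clip_eps=0.2):
    """Hypothetical helper: per-sample value loss using the max of both terms."""
    # Keep the new prediction within clip_eps of the old prediction.
    v_clipped = v_pred_old + np.clip(v_pred - v_pred_old, -clip_eps, clip_eps)
    loss_unclipped = (v_pred - returns) ** 2
    loss_clipped = (v_clipped - returns) ** 2
    # The element-wise maximum is the choice being asked about.
    return 0.5 * np.maximum(loss_unclipped, loss_clipped)

# Illustrative numbers satisfying v_pred_old < v_clipped < v_pred < R.
v_pred_old = np.array([0.0])
v_pred = np.array([0.5])
R = np.array([1.0])

loss_max = clipped_value_loss(v_pred, v_pred_old, R)
loss_plain = 0.5 * (v_pred - R) ** 2  # loss without any clipping
```

With these numbers v_clipped is 0.2, which is farther from R than v_pred is, so `loss_max` comes out larger than `loss_plain`.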
