The direct utility estimation method in Section passive-rl-section uses distinguished terminal states to indicate the end of a trial. How could it be modified for environments with discounted rewards and no terminal states?
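One possible line of attack, offered as a sketch rather than the intended solution: without terminal states the reward-to-go is an infinite discounted sum, but rewards more than $H$ steps ahead are weighted by at most $\gamma^H$. Truncating each sample after $H$ steps therefore introduces an error of at most $\gamma^H R_{\max}/(1-\gamma)$, where $R_{\max}$ bounds the absolute reward. The Python below assumes a single continuing trajectory given as parallel lists; the function name `direct_utility_estimates` and the parameters `gamma` and `tol` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def direct_utility_estimates(states, rewards, gamma=0.9, tol=1e-3):
    """Estimate U(s) by averaging truncated discounted rewards-to-go.

    states[t] and rewards[t] are the state visited and the reward received
    at step t of one continuing (non-terminating) trajectory.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")

    # Assumption: pick the smallest horizon H with gamma**H <= tol.
    horizon, weight = 0, 1.0
    while weight > tol:
        weight *= gamma
        horizon += 1

    totals = defaultdict(float)
    counts = defaultdict(int)
    # Only steps with a full window of `horizon` rewards ahead give a sample.
    for t in range(len(states) - horizon + 1):
        g = 0.0
        # Accumulate sum_{k < horizon} gamma**k * rewards[t + k] back to front.
        for k in reversed(range(horizon)):
            g = rewards[t + k] + gamma * g
        totals[states[t]] += g
        counts[states[t]] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Usage: a two-state cycle A -> B -> A ... that pays 1 in A and 0 in B.
states = ["A", "B"] * 50
rewards = [1.0, 0.0] * 50
estimates = direct_utility_estimates(states, rewards, gamma=0.5)
```

In this toy cycle the exact utilities are $U(A) = 1/(1-\gamma^2) = 4/3$ and $U(B) = \gamma\,U(A) = 2/3$ for $\gamma = 0.5$, so the estimates can be checked against them. The last $H-1$ steps of the trajectory contribute no samples, which is the price of replacing the terminal-state cutoff with a fixed window.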