You misunderstand the problem definition.
Even though delta_t is calculated at time step t+1, from R_{t+1} and S_{t+1}, the value array used to calculate it is still called V_t.
I agree with @ehddnr747 that V_t, not V_{t+1}, is used to calculate delta_t. If that is fixed in @RangerChu's answer, we should have a correct solution.
V_t denotes the array of state values used at time t in the TD error (6.5) and in the TD update (6.2).
And delta_t is calculated at time t+1.
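To make the indexing concrete, here is the TD error written with the time index on V. This is a sketch that assumes the usual notation for (6.5), with discount factor \gamma; only the subscript on V is added:

```latex
\delta_t \doteq R_{t+1} + \gamma V_t(S_{t+1}) - V_t(S_t)
```

Both value terms use the same array V_t, even though R_{t+1} and S_{t+1} are only observed at time t+1.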
At time t+1 the agent updates only the value of S_t; the values of all other states remain unchanged.
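A minimal Python sketch of one such step, assuming a tabular value function stored in a dict. The function name `td0_step`, the state labels, and the values of `alpha` and `gamma` are all made up for illustration and are not from the book or from the answers above:

```python
def td0_step(V, s, r, s_next, alpha, gamma):
    """One tabular TD(0) step.

    V plays the role of V_t. Returns (delta_t, V_{t+1}).
    """
    # delta_t uses V_t for both terms, even though r and s_next
    # are only known at time t+1.
    delta = r + gamma * V[s_next] - V[s]
    # Copy so that V_t itself is left untouched.
    V_next = dict(V)
    # Only the entry for S_t changes; every other state keeps its value.
    V_next[s] = V[s] + alpha * delta
    return delta, V_next


# Hypothetical three-state example.
V_t = {"A": 0.0, "B": 1.0, "C": 0.5}
delta_t, V_t1 = td0_step(V_t, s="A", r=0.0, s_next="B", alpha=0.5, gamma=1.0)
```

Here `V_t1` differs from `V_t` only in the entry for "A", which is the point made above: V_{t+1} and V_t agree everywhere except at S_t.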