Exercise 6.1 #79

RangerChu · 2021-03-23T03:25:40Z

Let V_t denote the array of state values used at time t in the TD error (6.5) and in the TD update (6.2).
Note that delta_t is calculated at time t+1.

At time t+1 the agent updates only the V value of S_t; the V values of all other states remain unchanged.

ehddnr747 · 2021-05-14T09:24:07Z

You misunderstand the problem definition.
Even though delta_t is calculated at time step t+1 with R_{t+1} and S_{t+1}, the value array used to calculate it is still called V_t.

zexiangliu · 2021-10-30T13:41:00Z

I agree with @ehddnr747 that V_t, not V_{t+1}, is used to calculate delta_t. If that is fixed in @RangerChu's answer, we should have a correct solution.
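For anyone who wants to check the indexing numerically, here is a minimal Python sketch. It is not taken from anyone's answer above. It assumes a made-up random environment and the hypothetical helper names `run_episode` and `check_identity`. It computes delta_t from V_t only, changes only the entry for S_t to obtain V_{t+1}, and then compares G_t - V_t(S_t) with the discounted sum of TD errors plus a correction term built from V_{k+1}(S_{k+1}) - V_k(S_{k+1}). That correction term is my own reading of what the exercise asks for, so treat it as an assumption.

```python
import random


def run_episode(n_states=5, alpha=0.1, gamma=0.9, seed=0):
    """Run one TD(0) episode on a hypothetical random environment.

    Returns the visited states S_0..S_T, rewards R_1..R_T, TD errors
    delta_0..delta_{T-1}, and snapshots of the value arrays V_0..V_T.
    """
    rng = random.Random(seed)
    terminal = n_states  # assumption: the last index is the terminal state
    V = [rng.random() for _ in range(n_states)] + [0.0]  # V(terminal) = 0
    s = 0
    states, rewards, deltas, snapshots = [s], [], [], []
    while s != terminal:
        s_next = rng.randrange(n_states + 1)  # may stay, move, or terminate
        r = rng.random()
        snapshots.append(list(V))  # V_t, the array used for delta_t
        delta = r + gamma * V[s_next] - V[s]  # delta_t uses V_t only
        V[s] += alpha * delta  # V_{t+1} differs from V_t only at S_t
        states.append(s_next)
        rewards.append(r)
        deltas.append(delta)
        s = s_next
    snapshots.append(list(V))  # V_T
    return states, rewards, deltas, snapshots


def check_identity(states, rewards, deltas, snapshots, gamma, t=0):
    """Return (G_t - V_t(S_t), sum of discounted TD errors, correction)."""
    T = len(rewards)
    # rewards[k] holds R_{k+1}
    G = sum(gamma ** (k - t) * rewards[k] for k in range(t, T))
    lhs = G - snapshots[t][states[t]]
    td_sum = sum(gamma ** (k - t) * deltas[k] for k in range(t, T))
    correction = sum(
        gamma ** (k - t + 1)
        * (snapshots[k + 1][states[k + 1]] - snapshots[k][states[k + 1]])
        for k in range(t, T)
    )
    return lhs, td_sum, correction


states, rewards, deltas, snapshots = run_episode()
lhs, td_sum, correction = check_identity(
    states, rewards, deltas, snapshots, gamma=0.9
)
print(lhs, td_sum + correction)  # the two numbers should agree
```

In this sketch the correction term is nonzero only when S_{k+1} equals S_k, because that is the only entry that changes between V_k and V_{k+1}.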
