
Exercise 6.1 #79

Open
RangerChu opened this issue Mar 23, 2021 · 2 comments

Comments

@RangerChu

RangerChu commented Mar 23, 2021

Let V_t denote the array of state values used at time t in the TD error (6.5) and in the TD update (6.2).
Note that delta_t is calculated at time t+1.
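
Written out in the book's notation (step size alpha, discount gamma; this restatement is ours, not part of the exercise text), both the TD error and the update use V_t, even though delta_t can only be formed once R_{t+1} and S_{t+1} have been observed at time t+1:

```latex
\delta_t = R_{t+1} + \gamma V_t(S_{t+1}) - V_t(S_t), \qquad
V_{t+1}(S_t) = V_t(S_t) + \alpha\,\delta_t
```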

[Image attachment]

The agent updates only the value of S_t, at time t+1; the values of all other states remain unchanged.
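
A minimal Python sketch of one tabular TD(0) step makes this concrete. The function name `td0_update`, the three-state table and the values of alpha and gamma are hypothetical choices for illustration only. The function returns delta_t computed from V_t, plus a copy V_{t+1} in which only the entry for S_t has moved.

```python
def td0_update(V, s, r, s_next, alpha, gamma):
    """Tabular TD(0) as in (6.2): given V_t, return (delta_t, V_{t+1}).

    s_next=None marks a terminal next state, whose value is taken as 0.
    """
    v_next = 0.0 if s_next is None else V[s_next]
    delta = r + gamma * v_next - V[s]   # TD error (6.5), computed with V_t
    V_new = list(V)                     # copy, so V_t itself stays unchanged
    V_new[s] += alpha * delta           # only the entry for S_t moves
    return delta, V_new


# Hypothetical example: three states, transition from state 1 to state 2.
V_t = [0.5, -0.2, 1.0]
delta, V_t1 = td0_update(V_t, s=1, r=1.0, s_next=2, alpha=0.1, gamma=0.9)
changed = [i for i in range(len(V_t)) if V_t1[i] != V_t[i]]
print(delta, changed)
```

Copying the array instead of updating in place keeps V_t and V_{t+1} both available, which is the distinction the exercise asks you to track.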

[Image attachment]


[Image attachment]

@ehddnr747

You have misunderstood the problem definition.
Even though delta_t is calculated at time step t+1, using R_[t+1] and S_[t+1], the value array used to calculate delta_t is the one we call V_t.
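
Under this convention the derivation can be redone roughly as sketched below. The sketch assumes G_T = 0, that every V_k assigns value 0 to the terminal state S_T, and that V_{k+1} differs from V_k only at S_k. The second sum in the fourth line is the additional amount that must be added to the sum of TD errors; the last line rewrites each of its terms using the update rule.

```latex
\begin{aligned}
G_t - V_t(S_t)
  &= R_{t+1} + \gamma G_{t+1} - V_t(S_t) \\
  &= \delta_t + \gamma\bigl(G_{t+1} - V_t(S_{t+1})\bigr) \\
  &= \delta_t + \gamma\bigl(G_{t+1} - V_{t+1}(S_{t+1})\bigr)
     + \gamma\bigl(V_{t+1}(S_{t+1}) - V_t(S_{t+1})\bigr) \\
  &= \sum_{k=t}^{T-1} \gamma^{k-t}\,\delta_k
     + \sum_{k=t}^{T-1} \gamma^{k-t+1}\bigl(V_{k+1}(S_{k+1}) - V_k(S_{k+1})\bigr), \\[4pt]
V_{k+1}(S_{k+1}) - V_k(S_{k+1}) &= \alpha\,\delta_k\,\mathbb{1}\{S_{k+1} = S_k\}.
\end{aligned}
```

If V is held fixed during the episode (alpha = 0), the extra sum vanishes and the Monte Carlo error equals the discounted sum of TD errors exactly.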

@zexiangliu

I agree with @ehddnr747 that V_t, not V_{t+1}, is used to calculate delta_t. If that is fixed in @RangerChu's answer, it should be a correct solution.
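
One way to sanity-check the corrected answer is numerically. The sketch below is a hypothetical test harness (`check_td_mc_identity`, the random episode and all parameter values are assumptions, not from the book). It runs online tabular TD(0) over one episode, stores every V_k, and reports the largest gap between G_t - V_t(S_t) and the sum of TD errors plus the correction sum_{k=t}^{T-1} gamma^{k-t+1} (V_{k+1}(S_{k+1}) - V_k(S_{k+1})). It also reports the gap between that correction and the closed form alpha * delta_k * [S_{k+1} == S_k].

```python
import random


def check_td_mc_identity(n_states=3, T=12, alpha=0.1, gamma=0.9, seed=0):
    """Run online tabular TD(0) on one random episode and return
    (identity_gap, closed_form_gap), the largest absolute errors over all t."""
    rng = random.Random(seed)
    S = [rng.randrange(n_states) for _ in range(T)]  # S_0..S_{T-1}; S_T is terminal
    R = [rng.gauss(0.0, 1.0) for _ in range(T)]      # R[k] stands for R_{k+1}
    V0 = [rng.gauss(0.0, 1.0) for _ in range(n_states)]

    def value(V, k):
        # Value of S_k under array V; the terminal state S_T is worth 0.
        return 0.0 if k == T else V[S[k]]

    Vs, deltas = [V0], []           # Vs[k] is V_k, the array used at time k
    for k in range(T):
        delta = R[k] + gamma * value(Vs[k], k + 1) - value(Vs[k], k)  # TD error with V_k
        deltas.append(delta)
        V_next = list(Vs[k])
        V_next[S[k]] += alpha * delta                # only the entry for S_k changes
        Vs.append(V_next)

    G = [0.0] * (T + 1)             # G[T] = 0
    for k in reversed(range(T)):
        G[k] = R[k] + gamma * G[k + 1]

    identity_gap = closed_form_gap = 0.0
    for t in range(T):
        mc_error = G[t] - value(Vs[t], t)
        td_sum = sum(gamma ** (k - t) * deltas[k] for k in range(t, T))
        extra = sum(gamma ** (k - t + 1) * (value(Vs[k + 1], k + 1) - value(Vs[k], k + 1))
                    for k in range(t, T))
        closed = sum(gamma ** (k - t + 1) * alpha * deltas[k]
                     for k in range(t, T - 1) if S[k + 1] == S[k])
        identity_gap = max(identity_gap, abs(mc_error - (td_sum + extra)))
        closed_form_gap = max(closed_form_gap, abs(extra - closed))
    return identity_gap, closed_form_gap


identity_err, closed_err = check_td_mc_identity()
print(identity_err, closed_err)  # both should be at floating-point round-off level
```

Dropping the correction term and comparing against the sum of TD errors alone will generally leave a visible gap whenever some state is visited twice in a row.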
