Exercise 3.29 might have a mistake #83

rvitorper · 2021-05-02T02:06:25Z

Hello!

I was checking your answer to exercise 3.29, and I think it might contain a mistake. The final equation averages over all actions, whereas I think it should take the maximum over all actions, which would remove the policy function from the equation.

I believe it is a mistake because the backup diagram for q* (page 64) shows a maximum rather than an average.
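
To make the difference concrete, here is a sketch of the two forms as I understand them, written with the standard four-argument dynamics function p(s', r | s, a) and discount γ. The exercise's answer may be expressed with different functions, so the notation here is an assumption; the point is only the inner term. The first line is the Bellman equation for q_π, which averages over next actions under the policy π. The second is the Bellman optimality equation for q*, where that average is replaced by a maximum and π no longer appears.

```latex
\begin{align*}
q_\pi(s, a) &= \sum_{s', r} p(s', r \mid s, a)
  \Big[ r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a') \Big] \\
q_*(s, a) &= \sum_{s', r} p(s', r \mid s, a)
  \Big[ r + \gamma \max_{a'} q_*(s', a') \Big]
\end{align*}
```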

Looking forward to hearing from you!
