Question: Is this some form of reward engineering? #34

WorksWellWithOthers · 2020-12-05T00:33:43Z

This would break in any environment whose state does not unpack into exactly 4 values.

If it is not essential, can we just remove this?
If it is essential, could someone explain why and/or reference the paper for this?
This seems specific to CartPole, and I wasn't sure whether the implementation's goal was only to solve CartPole.

r1 = (env.x_threshold - abs(x)) / env.x_threshold - 0.8  
r2 = (env.theta_threshold_radians - abs(theta)) / env.theta_threshold_radians - 0.5  
reward = r1 + r2

scprotz · 2021-02-01T16:42:05Z

@WorksWellWithOthers This is indeed a form of reward engineering, and it is specific to CartPole: it turns the returned state into a numeric reward. Other environments would not need this, and may already return a distinct reward of their own.
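
For anyone adapting the code to other environments, here is a minimal sketch of how the shaping from the question could be isolated so that it no longer breaks on states that are not 4 values long. The function name `shaped_reward`, the `env_reward` fallback, and the length check are illustrative assumptions, not part of this repository. The default thresholds are assumed to match Gym's classic CartPole values; the offsets 0.8 and 0.5 are taken from the snippet in the question.

```python
import math

# Assumed defaults, matching Gym's classic CartPole environment.
X_THRESHOLD = 2.4
THETA_THRESHOLD_RADIANS = 12 * 2 * math.pi / 360


def shaped_reward(state, env_reward,
                  x_threshold=X_THRESHOLD,
                  theta_threshold=THETA_THRESHOLD_RADIANS):
    """Hypothetical helper: CartPole-style shaped reward with a fallback.

    If the state does not have exactly 4 values, the environment's own
    reward is returned unchanged instead of failing on unpacking.
    """
    if len(state) != 4:
        return env_reward

    # CartPole state layout: position, velocity, pole angle, angular velocity.
    x, _x_dot, theta, _theta_dot = state

    # Higher reward the closer the cart is to the center of the track.
    r1 = (x_threshold - abs(x)) / x_threshold - 0.8
    # Higher reward the closer the pole is to upright.
    r2 = (theta_threshold - abs(theta)) / theta_threshold - 0.5
    return r1 + r2
```

With these assumed thresholds, a centered cart with an upright pole scores about 0.7, and a state sitting exactly on both thresholds scores about -1.3, so the shaped signal is denser than a constant per-step reward.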
