This repository has been archived by the owner on Apr 25, 2023. It is now read-only.
This would break in environments whose state has more or fewer than 4 values, because the unpacking assumes exactly 4.
If it's not essential, can we just remove this?
If it's essential, would someone explain why and/or reference the paper for this?
This seems specific to CartPole. I wasn't sure if the implementation's goal was to only solve CartPole.
@WorksWellWithOthers This is indeed a form of reward engineering, and it is specific to CartPole: it turns the returned state into a numeric reward. Other environments would not need this, and may already return a distinct reward.
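
For readers following along, here is a minimal sketch of what this kind of CartPole reward shaping could look like, with a guard so that states of other sizes fall back to the environment's own reward. It is not the repository's code: the function name `shaped_reward` and the exact shaping formula are assumptions made for illustration. The two thresholds are the standard CartPole termination limits (cart position 2.4, pole angle 12 degrees).

```python
import math
from typing import Sequence

# Standard CartPole termination limits.
X_THRESHOLD = 2.4                          # cart position limit
THETA_THRESHOLD = 12 * 2 * math.pi / 360   # pole angle limit, in radians


def shaped_reward(state: Sequence[float], env_reward: float) -> float:
    """Hypothetical helper: derive a denser reward from a CartPole state.

    If the state does not have exactly 4 values, skip the shaping and
    return the environment's own reward instead of failing on unpacking.
    """
    if len(state) != 4:
        return env_reward

    x, _x_dot, theta, _theta_dot = state
    # Each term is 1.0 when perfectly centred/upright and 0.0 at the limit.
    r_position = (X_THRESHOLD - abs(x)) / X_THRESHOLD
    r_angle = (THETA_THRESHOLD - abs(theta)) / THETA_THRESHOLD
    return r_position + r_angle
```

Note that the length check is only a heuristic: another environment with a 4-value state would still be shaped as if it were CartPole, so in practice the shaping is probably better enabled explicitly per environment.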