BUG: Nondeterminisitc behaviour of MetaDriveEnv #758

Closed
olek-osikowicz opened this issue Aug 30, 2024 · 3 comments · Fixed by #789

Comments

@olek-osikowicz

Hi MetaDrive team,

I believe I have discovered a bug in resetting MetaDriveEnv that results in nondeterminism.

MetaDrive simulation is supposed to be deterministic, but even when I reuse the environment and reset it with the same seed, the resulting traces are not identical. Consider the following code, adapted from the examples:

try:
    env = MetaDriveEnv(config={"map": "C", "num_scenarios": n_scenarios})

    for rep in range(n_scenarios):
        obs, step_info = env.reset(seed)
        while True:
            # get action from expert driving, or a dummy action
            action = expert(env.agent, deterministic=True) if expert_driving else [0, 0.33]
            obs, reward, tm, tr, step_info = env.step(action)
            traces.append(step_info)

            if tm or tr:
                break
finally:
    env.close()

When I analyzed the traces (the step info for each timestep) from different repetitions, I found slight differences, probably coming from floating-point arithmetic. These differences (errors) between traces grow the longer the episode is.
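One way to locate where two traces start to differ is sketched below. This is only an illustration: the helper name `first_divergence`, the `"velocity"` key, and the sample values are hypothetical and are not taken from the notebook or from real MetaDrive output.

```python
def first_divergence(trace_a, trace_b, keys, tol=0.0):
    """Return (step, key, abs_diff) for the first entry that differs by more
    than tol, or None if the traces agree on all compared keys."""
    for step, (info_a, info_b) in enumerate(zip(trace_a, trace_b)):
        for key in keys:
            diff = abs(float(info_a[key]) - float(info_b[key]))
            if diff > tol:
                return step, key, diff
    return None


# Hypothetical traces for illustration only (not real MetaDrive output).
run_1 = [{"velocity": 0.0}, {"velocity": 1.5}, {"velocity": 3.0}]
run_2 = [{"velocity": 0.0}, {"velocity": 1.5}, {"velocity": 3.0 + 1e-9}]

# Exact comparison (tol=0.0) reports the first step where the runs differ.
result = first_divergence(run_1, run_2, keys=["velocity"])

# With a small tolerance the same traces count as matching.
result_with_tol = first_divergence(run_1, run_2, keys=["velocity"], tol=1e-6)
```

Reporting the first diverging step, rather than only whether the traces match, makes it easier to see whether the difference appears right after the reset or only accumulates later in the episode.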

Suspecting that the .reset() function doesn't clear the state properly, I started initializing a new environment for each repetition and closing it at the end.

for rep in range(n_scenarios):
    # create a fresh environment for every repetition
    env = MetaDriveEnv(config={"map": "C", "num_scenarios": n_scenarios})
    try:
        obs, step_info = env.reset(seed)
        while True:
            # get action from expert driving, or a dummy action
            action = expert(env.agent, deterministic=True) if expert_driving else [0, 0.33]
            obs, reward, tm, tr, step_info = env.step(action)
            if tm or tr:
                break
    finally:
        # close the environment even if the rollout raises
        env.close()

This solved the issue: the traces produced are exactly the same (fully deterministic).

Please see my notebook reproducing the bug.

Conda env

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://repo.anaconda.com/pkgs/main/linux-64/_libgcc_mutex-0.1-main.conda
https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2024.7.4-hbcca054_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/ld_impl_linux-64-2.38-h1181459_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/libstdcxx-ng-11.2.0-h1234567_1.conda
https://repo.anaconda.com/pkgs/main/noarch/tzdata-2024a-h04d1e81_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/libgomp-11.2.0-h1234567_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/_openmp_mutex-5.1-1_gnu.conda
https://repo.anaconda.com/pkgs/main/linux-64/libgcc-ng-11.2.0-h1234567_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/bzip2-1.0.8-h5eee18b_6.conda
https://repo.anaconda.com/pkgs/main/linux-64/libffi-3.4.4-h6a678d5_1.conda
https://conda.anaconda.org/conda-forge/linux-64/libsodium-1.0.18-h36c2ea0_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libuuid-1.41.5-h5eee18b_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/ncurses-6.4-h6a678d5_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/openssl-3.0.14-h5eee18b_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/xz-5.4.6-h5eee18b_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/zlib-1.2.13-h5eee18b_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/readline-8.2-h5eee18b_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/tk-8.6.14-h39e8969_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/zeromq-4.3.5-h6a678d5_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/sqlite-3.45.3-h5eee18b_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/python-3.10.14-h955ad1f_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/debugpy-1.6.7-py310h6a678d5_0.conda
https://conda.anaconda.org/conda-forge/noarch/decorator-5.1.1-pyhd8ed1ab_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/entrypoints-0.4-pyhd8ed1ab_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/exceptiongroup-1.2.2-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/noarch/executing-2.0.1-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/noarch/nest-asyncio-1.6.0-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/noarch/packaging-24.1-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/noarch/parso-0.8.4-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/noarch/pickleshare-0.7.5-py_1003.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/platformdirs-4.2.2-pyhd8ed1ab_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/psutil-5.9.0-py310h5eee18b_0.conda
https://conda.anaconda.org/conda-forge/noarch/ptyprocess-0.7.0-pyhd3deb0d_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/pure_eval-0.2.3-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/noarch/pygments-2.18.0-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/linux-64/python_abi-3.10-2_cp310.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pyzmq-25.1.2-py310h6a678d5_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/setuptools-72.1.0-py310h06a4308_0.conda
https://conda.anaconda.org/conda-forge/noarch/six-1.16.0-pyh6c4a22f_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/traitlets-5.14.3-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/noarch/typing_extensions-4.12.2-pyha770c72_0.conda
https://conda.anaconda.org/conda-forge/noarch/wcwidth-0.2.13-pyhd8ed1ab_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/wheel-0.43.0-py310h06a4308_0.conda
https://conda.anaconda.org/conda-forge/noarch/asttokens-2.4.1-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/noarch/comm-0.2.2-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/noarch/jedi-0.19.1-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/linux-64/jupyter_core-5.7.2-py310hff52083_0.conda
https://conda.anaconda.org/conda-forge/noarch/matplotlib-inline-0.1.7-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/noarch/pexpect-4.9.0-pyhd8ed1ab_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/pip-24.2-py310h06a4308_0.conda
https://conda.anaconda.org/conda-forge/noarch/prompt-toolkit-3.0.47-pyha770c72_0.conda
https://conda.anaconda.org/conda-forge/noarch/python-dateutil-2.9.0-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/linux-64/tornado-6.1-py310h5764c6d_3.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/jupyter_client-7.3.4-pyhd8ed1ab_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.2-pyhd8ed1ab_0.conda
https://conda.anaconda.org/conda-forge/noarch/ipython-8.26.0-pyh707e725_0.conda
https://conda.anaconda.org/conda-forge/noarch/ipykernel-6.29.5-pyh3099207_0.conda

@pengzhenghao
Member

Thanks for raising this! May I ask whether this problem is caused by:

  1. a floating-point problem with the expert policy, or
  2. the reset function not clearing state correctly, or
  3. the random state not being reset properly?

@olek-osikowicz
Author

Hi @pengzhenghao,
Since I submitted this issue, I have moved the notebook. I've now pushed it to the new repo.

Answering your questions:

  1. I don't think it's a problem with the expert policy. I ran with both the torch and numpy versions, and the bug appears consistently.

  2. I don't think it's a problem with the random state. I ran the scenarios with the expert agent with deterministic=True. I looked at the code briefly, and the policy doesn't draw any random numbers.

  3. So I presume it's the "reset function doesn't clear state correctly" option. The state is cleared completely only if you reinitialize the environment object MetaDriveEnv(...). If .reset() cleared the state completely, there wouldn't be a difference between traces.

However, these are only my suggestions; feel free to look at the code and share your thoughts.

@pengzhenghao
Member

Thanks for sharing the code! I ran some experiments:

  1. reset + expert rollout: reproduces your result. Yes, the traces are different.
  2. reset only: deterministic. If we don't call env.step at all, the initial state is deterministic. This is expected, and it is a major guarantee that the current MD still upholds.
  3. reset + a single step from the expert: non-deterministic.
  4. reset + a single step with a fixed action: deterministic.
  5. reset + rollout with a fixed action: non-deterministic.
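Experiments like these can be organized around a small rollout helper that resets with a seed, applies a policy for a bounded number of steps, and records the observations. The sketch below assumes a Gymnasium-style `reset`/`step` interface. `rollout` and `ToyEnv` are hypothetical names, and `ToyEnv` is a trivial stand-in used only so the example runs without MetaDrive; it says nothing about MetaDrive's own behaviour.

```python
def rollout(env, seed, policy, max_steps):
    """Reset env with the given seed, apply policy for up to max_steps steps,
    and return the list of observations (including the initial one)."""
    obs, info = env.reset(seed=seed)
    observations = [obs]
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        observations.append(obs)
        if terminated or truncated:
            break
    return observations


class ToyEnv:
    """Hypothetical stand-in with a Gymnasium-style API. NOT MetaDrive."""

    def reset(self, seed=None):
        # The start position depends only on the seed.
        self.position = float(seed or 0)
        return self.position, {}

    def step(self, action):
        self.position += 0.1 * action
        return self.position, 0.0, False, False, {}


def fixed_action(obs):
    # A constant action, analogous to the "fixed action" experiments.
    return 1.0


env = ToyEnv()
# Reuse the same environment object and reset it twice with the same seed.
first = rollout(env, seed=3, policy=fixed_action, max_steps=5)
second = rollout(env, seed=3, policy=fixed_action, max_steps=5)
identical = first == second
```

Varying `max_steps` (0 for reset only, 1 for a single step, larger for a full rollout) and swapping the policy covers the five cases with one helper.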

I notice that you are using exact equality for your assertions. We usually use NumPy's almost-equal checks to avoid floating-point issues. So I've made a new test script for this:

#789

With this new script I can verify that the relative error is <0.1% if we roll out the expert for 50 steps, and <1e-6 if we roll out the expert for 1 step.
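As a rough illustration of the difference between exact equality and a tolerance-based check, the sketch below computes a maximum relative error and then asserts closeness with `np.testing.assert_allclose`. The helper `max_relative_error` and the sample arrays are hypothetical, and `rtol=1e-3` is chosen only to mirror the 0.1% figure; the actual test script in #789 may be written differently.

```python
import numpy as np


def max_relative_error(reference, other, eps=1e-12):
    """Largest element-wise relative error of other with respect to reference.
    eps guards against division by zero for reference values equal to 0."""
    reference = np.asarray(reference, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)
    return float(np.max(np.abs(reference - other) / (np.abs(reference) + eps)))


# Hypothetical values for illustration only (not real MetaDrive output).
run_1 = np.array([10.0, 20.0, 30.0])
run_2 = run_1 * (1.0 + 1e-7)

# Exact comparison fails on a tiny floating-point difference.
exactly_equal = np.array_equal(run_1, run_2)

# The relative error quantifies how small the difference actually is.
error = max_relative_error(run_1, run_2)

# Tolerance-based assertion: raises AssertionError only if the relative
# difference exceeds rtol (0.1% here).
np.testing.assert_allclose(run_2, run_1, rtol=1e-3)
```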

Therefore, I think there is no bug in MetaDrive. Some tiny floating-point error is inevitable.
