[HANDS-ON BUG] mlagents-learn in unit 5 not working #571

Open
benbekir opened this issue Oct 24, 2024 · 4 comments

Comments

@benbekir

benbekir commented Oct 24, 2024

Describe the bug

The command

!mlagents-learn ./config/ppo/SnowballTarget.yaml --env=./training-envs-executables/linux/SnowballTarget/SnowballTarget --run-id="SnowballTarget1" --no-graphics

doesn't work.
It always fails with the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm).

I have tried specifying the device with the --torch-device argument, but that didn't help either.
Maybe this has something to do with the fact that dev versions are used for the ml-agents and ml-agents-envs packages?

  ml-agents: 1.2.0.dev0,
  ml-agents-envs: 1.2.0.dev0,
  Communicator API: 1.5.0,
  PyTorch: 2.5.0+cu121

Material

  • Did you use Google Colab?
    Yes
@TPK-MAKG

Experiencing the same issue. Additionally, the runtime requires a restart after the installation of the numpy packages; otherwise it can't find the hyperparameter file in /config/ppo/SnowballTarget.yaml. Maybe it's a package version conflict?

@staffanrolfsson

staffanrolfsson commented Nov 1, 2024

I get the same errors.
The first problem after the restart (required for numpy) is that the working-directory change "% cd ml-agents" is lost, so the following cells will not work as intended. This is easily fixed by adding a cell that repeats the change; the installations that follow will then be correct. But the main issue is still there when you try to start the learning:
"RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)"
The Pyramids learning further down in the same Notebook works fine...
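
The extra cell mentioned above (restoring the working directory after the runtime restart) could look roughly like the sketch below. It assumes the repository was cloned into a directory named ml-agents directly under the notebook's starting directory; the helper name ensure_workdir is hypothetical and not part of the course notebook.

```python
import os
from pathlib import Path


def ensure_workdir(repo_dir: str = "ml-agents") -> Path:
    """Change into the cloned repo directory unless we are already in it.

    repo_dir is an assumption about where the notebook cloned the repo.
    """
    cwd = Path.cwd()
    if cwd.name == repo_dir:
        # Already inside the repo, e.g. when the cell is run twice.
        return cwd
    target = cwd / repo_dir
    if not target.is_dir():
        raise FileNotFoundError(f"{target} not found; re-run the clone cell first")
    os.chdir(target)
    return Path.cwd()
```

Calling ensure_workdir() in a cell right after the restart, before the install and training cells, should have the same effect as re-running the "% cd ml-agents" cell by hand.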

@staffanrolfsson

I tested changing the config file for SnowballTarget: setting threaded: false made it possible to start the learning...
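
For anyone who prefers to apply this from a notebook cell rather than editing the file by hand, here is a minimal sketch. It assumes the config at ./config/ppo/SnowballTarget.yaml (the path from the original command) contains a line reading threaded: true, which this thread does not confirm; the helper names disable_threading and patch_config are hypothetical.

```python
import re
from pathlib import Path
from typing import Tuple

# Matches a "threaded: true" line while keeping its indentation.
_THREADED_TRUE = re.compile(
    r"^([ \t]*threaded:[ \t]*)true\b", flags=re.MULTILINE | re.IGNORECASE
)


def disable_threading(config_text: str) -> Tuple[str, int]:
    """Return the YAML text with `threaded: true` switched to `threaded: false`.

    Works on plain text, so comments and key order in the file are preserved.
    The second item is the number of lines that were changed.
    """
    return _THREADED_TRUE.subn(r"\1false", config_text)


def patch_config(path: str = "./config/ppo/SnowballTarget.yaml") -> int:
    """Rewrite the config file in place; returns how many lines were changed."""
    config = Path(path)
    new_text, count = disable_threading(config.read_text())
    if count:
        config.write_text(new_text)
    return count
```

If patch_config() returns 0, the file had no threaded: true line, and the setting would have to be added or checked manually.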

@brumocas

brumocas commented Nov 4, 2024

I tested changing the config file for SnowballTarget: setting threaded: false made it possible to start the learning...

This worked for me :)
