Hello,
I am trying to run these models to evaluate the results; however, I am not able to do so due to runtime errors.
The best "result" I could get is with this Dockerfile (placed at the root of the project):
FROM nvidia/cuda:11.4.3-cudnn8-devel-ubuntu18.04

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Etc/UTC
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

# Install system dependencies
RUN apt-get update && \
    apt-get install -y \
        git \
        wget \
        python3-pip \
        python3-dev \
        python3-opencv \
        python3-six

RUN python3 -m pip install --upgrade pip
RUN pip3 install setuptools openmim

# Install PyTorch and torchvision
RUN pip3 install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu111/torch_stable.html
RUN python3 -m pip install h5py albumentations tensorboardX gdown scipy
RUN python3 -m mim install mmcv

WORKDIR /

# Download the NYU Depth v2 labeled data and the GLPDepth tooling used to extract the official splits
RUN wget http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat -O nyu_depth_v2_labeled.mat
RUN git clone https://github.com/vinvino02/GLPDepth.git --depth 1
RUN mv GLPDepth/code/utils/logging.py GLPDepth/code/utils/glp_depth_logging.py

# Set the working directory
WORKDIR /app
RUN python3 ../GLPDepth/code/utils/extract_official_train_test_set_from_mat.py ../nyu_depth_v2_labeled.mat ../GLPDepth/datasets/splits.mat ./data/nyu_depth_v2/official_splits/
# RUN ln -s data ait/data

COPY requirements.txt requirements.txt
RUN python3 -m pip install -r requirements.txt
COPY . .
RUN rm -rf .git
I built the image with:
sudo docker build -t mde . -f Dockerfile
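To see which package versions actually ended up in the image (which is what I would need to compare against the authors' setup), something like this should work; this is just a quick check on my side, not a step from the repo:

sudo docker run --rm mde pip3 list | grep -iE 'torch|mmcv'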
And ran the container with:
sudo docker run --name mde-test --gpus all --ipc=host -it --rm -v $(pwd):/app mde
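As a sanity check (my own addition, not part of the repo's instructions), the GPU and the installed PyTorch build can be verified from inside the container with something like:

nvidia-smi
python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"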
Finally, I run the evaluation command. For example:
cd ait
python3 -m torch.distributed.launch --nproc_per_node=1 code/train.py configs/swinv2b_480reso_parallel_depthonly.py --cfg-options model.task_heads.depth.vae_cfg.pretrained=../models/vqvae_depth_2bp.pt --eval ../models/ait_depth_swinv2b_parallel.pth
This way the inference process is launched, but it eventually fails with an uninformative error:
eval task depth
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 654/654, 2.5 task/s, elapsed: 262s, ETA:     0s
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 34) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
===================================================
code/train.py FAILED
---------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
---------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-26_03:01:18
  host      : f50427e7ad50
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 34)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 34
===================================================
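For context, exitcode -9 means the worker received SIGKILL. On Linux that usually comes from the kernel OOM killer rather than from Python itself, so the process was most likely killed for running out of memory after finishing the 654 evaluation steps. Assuming access to the host, one way to check whether that is what happened is:

sudo dmesg | grep -iE 'killed process|out of memory'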
Are the authors able to provide the versions of all the software they are using? In particular:
Thanks.