This repository has been archived by the owner on Jan 1, 2025. It is now read-only.

reproduction of Panoptic segmentation on COCO #79

Closed
wmkai opened this issue Apr 8, 2022 · 5 comments

Comments

@wmkai

wmkai commented Apr 8, 2022

Hi, thank you for your excellent work. I ran into a problem when re-running your experiments.

I followed the instructions in Getting Started with Mask2Former and ran:

python train_net.py --num-gpus 8 \
  --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml

After training, the log showed "Start inference on 625 batches". After a few days there were still no new log entries, so I killed the process and ran:

python train_net.py \
  --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
  --eval-only MODEL.WEIGHTS ./output/model_0094999.pth

After evaluation, the result was:
[image]
This is lower than the result in Table 1 of the paper:
[image]
Could you help me figure out the reason for this?

@bowenc0221
Contributor

Your training did not finish. Please refer to #74.
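
One quick way to check whether a run reached its end, as a minimal sketch: detectron2-based trainers normally write `model_final.pth` to the output directory once the last iteration completes, so a missing file suggests the run stopped early. The helper name `training_finished` and the `./output` default are illustrative assumptions, not part of Mask2Former.

```python
from pathlib import Path


def training_finished(output_dir: str = "./output") -> bool:
    """Return True if a final checkpoint exists in output_dir.

    Assumes the detectron2 convention of saving `model_final.pth`
    after the last training iteration.
    """
    return (Path(output_dir) / "model_final.pth").is_file()


if __name__ == "__main__":
    if training_finished():
        print("finished")
    else:
        print("not finished (no model_final.pth)")
```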

@wmkai
Author

wmkai commented Apr 9, 2022

Thanks for replying, but my problem does not seem to be the same as #74. In that issue, the PyTorch NCCL and system NCCL version numbers differ, but mine are the same. Every time, my training process stops after iteration 94979. At that point, all 8 of my GPUs show 0% utilization, which also differs from #74.

@bowenc0221
Contributor

The COCO model is trained for 368750 iterations, but you evaluated the model on the 94999-th iteration.
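
To make the gap concrete, here is a rough sketch of the arithmetic. It assumes that checkpoints are named by zero-based iteration index, so that `model_0094999.pth` would correspond to 95000 completed iterations. The function name `training_progress` is hypothetical.

```python
def training_progress(checkpoint_iter: int, max_iter: int) -> float:
    """Fraction of the schedule completed by a checkpoint.

    checkpoint_iter is the zero-based index taken from the file name,
    e.g. 94999 for model_0094999.pth (an assumption about the naming).
    """
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")
    return (checkpoint_iter + 1) / max_iter


if __name__ == "__main__":
    frac = training_progress(94999, 368750)
    print(f"{frac:.1%} of the schedule completed")
```

Under that assumption, the evaluated checkpoint covers roughly a quarter of the full schedule, which would be consistent with a score below the one reported in the paper.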

@wmkai
Author

wmkai commented Apr 10, 2022

Thanks. I tried again and found these logs after the 94999-th iteration:
[image]
[image]
I then checked the CPU memory usage and found that the CPU memory was exhausted.
[image]
May I ask how much CPU memory this training process requires?
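
For anyone trying to confirm the same symptom, below is a minimal sketch that reads the available host memory on Linux by parsing `/proc/meminfo`. The function names are illustrative assumptions, and the script only observes memory; it does not show what is consuming it.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; counters such as HugePages_Total
    carry no unit and are stored as plain integers.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_fraction(info: dict) -> float:
    """Fraction of total RAM that the kernel reports as available."""
    return info["MemAvailable"] / info["MemTotal"]


def read_available_fraction(path: str = "/proc/meminfo") -> float:
    """Read the file at `path` (Linux only) and return the available fraction."""
    with open(path) as f:
        return available_fraction(parse_meminfo(f.read()))
```

Calling `read_available_fraction()` periodically in a separate shell while training and evaluation run would show whether available memory trends toward zero before the process stalls.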

@wmkai
Author

wmkai commented Apr 13, 2022

It seems that I ran into the same problem as in open-mmlab/mmdetection#7538.

@wmkai wmkai closed this as completed Aug 18, 2022