
DRY_RUN mode's performance #14

Open
Lukas-Ma1 opened this issue Nov 11, 2023 · 3 comments

@Lukas-Ma1 commented Nov 11, 2023

I would like to ask about COCO training and test performance in DRY_RUN mode. I used DRY_RUN with each command you mentioned, including extracting the global, object, and block features. When I run the DP training for COCO:
DRY_RUN=True TRAIN_WITH_VAL_DATASET=True torchrun --nproc_per_node=4 -m oadp.dp.train oadp_ov_coco configs/dp/oadp_ov_coco.py --override .validator.dataloader.dataset.ann_file::data/coco/annotations/instances_val2017.48.json

the result looks like this:

2023-11-11 23:43:33,261 - mmdet - INFO - Iter(val) [1] COCO_48_17_bbox_mAP_: 0.8614, COCO_48_17_bbox_mAP_50: 0.8614, COCO_48_17_bbox_mAP_75: 0.8614, COCO_48_17_bbox_mAP_s: 0.7921, COCO_48_17_bbox_mAP_m: 1.0000, COCO_48_17_bbox_mAP_l: 1.0000, COCO_48_17_bbox_mAP_copypaste: 0.8614 0.8614 0.8614 0.7921 1.0000 1.0000, COCO_48_bbox_mAP_: 0.8614, COCO_48_bbox_mAP_50: 0.8614, COCO_48_bbox_mAP_75: 0.8614, COCO_48_bbox_mAP_s: 0.7921, COCO_48_bbox_mAP_m: 1.0000, COCO_48_bbox_mAP_l: 1.0000, COCO_48_bbox_mAP_copypaste: 0.8614 0.8614 0.8614 0.7921 1.0000 1.0000, COCO_17_bbox_mAP_: -1.0000, COCO_17_bbox_mAP_50: -1.0000, COCO_17_bbox_mAP_75: -1.0000, COCO_17_bbox_mAP_s: -1.0000, COCO_17_bbox_mAP_m: -1.0000, COCO_17_bbox_mAP_l: -1.0000, COCO_17_bbox_mAP_copypaste: -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
2023-11-11 23:43:33,606 - mmdet - INFO - Saving checkpoint at 40000 iterations
2023-11-11 23:43:35,052 - mmdet - INFO - Iter [40000/40000.0] lr: 2.000e-03, eta: 0:00:00, time: 4.103, data_time: 2.348, memory: 2148, loss_rpn_cls: 0.0000, loss_rpn_bbox: 0.0011, loss_cls: 0.0021, acc: 99.9512, loss_bbox: 0.0073, loss_global: 0.0002, recall_global: 69.3125, loss_block: 0.0016, recall_block: 11.1172, loss_clip_objects: 0.6160, loss_clip_global: 0.2377, loss_clip_blocks: 0.5239, loss_clip_block_relations: 0.0503, loss: 1.4403

I got an unrealistic result of 0.8614 mAP, so something must be wrong, but I have checked my process, data structure, and commands, and they all follow your steps. Does DRY_RUN cause this unrealistic result, and should I rerun without DRY_RUN? (By the way, the global, object, and block features extracted with DRY_RUN seem to be smaller than those extracted without it, so should I download them from the Baidu disk instead?)

Thanks for your attention and impressive work!

@LutingWang (Owner)

The DRY_RUN mode is intended for fast validation of the code's correctness. It does this by trimming down the number of samples used for training or validation. So, getting a 0.8614 result indicates that you've set everything up correctly and you're all set for a full run.
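
To make that concrete, here is a minimal, hypothetical sketch of how an environment-variable-gated trim like this is typically wired up. The names (maybe_trim, the cap of 16 samples) are illustrative and not OADP's actual implementation; the point is that metrics computed on such a tiny subset are essentially meaningless, which is why the DRY_RUN mAP can look implausibly high.

```python
import os

# Hypothetical sketch, not OADP's actual code: a truthy DRY_RUN environment
# variable caps the number of samples so a full train/val cycle finishes in
# minutes and only verifies that the pipeline runs end to end.
DRY_RUN = os.environ.get('DRY_RUN', '').lower() in ('1', 'true')

def maybe_trim(sample_indices: list, cap: int = 16) -> list:
    """Keep only the first `cap` samples when DRY_RUN is set."""
    return sample_indices[:cap] if DRY_RUN else sample_indices

# Example: a 5000-image validation set shrinks to 16 images under DRY_RUN,
# so mAP computed on it says nothing about real performance.
val_indices = list(range(5000))
print(len(maybe_trim(val_indices)))  # 16 if DRY_RUN is set, else 5000
```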

@Lukas-Ma1 (Author)

Thank you. There are two more questions I would like to ask:

  1. I have run another training process with TRAIN_WITH_VAL_DATASET=True torchrun --nproc_per_node=4 -m oadp.dp.train oadp_ov_coco configs/dp/oadp_ov_coco.py --override .validator.dataloader.dataset.ann_file::data/coco/annotations/instances_val2017.48.json, this time without DRY_RUN. However, the result still does not seem normal: it only reaches 0.1649 mAPN50 and does not reach 0.313. Should I remove all the optional parts and run 'torchrun --nproc_per_node=4 -m oadp.dp.train vild_ov_coco configs/dp/vild_ov_coco.py' instead?
  2. In your paper, the attention mask is an (N+1, N+1) matrix. Why is the value in the bottom-right corner 1?

@LutingWang (Owner)

  1. Apologies for any confusion. When TRAIN_WITH_VAL_DATASET=True is set in the environment, it activates a debug mode, wherein the training is performed using the validation split of the MS-COCO 2017 dataset.
  2. Please refer to eq. (9). The additional token is $x_\texttt{[OBJ]}$.
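
Regarding the second question, one plausible reading (an assumption, not a quote from the paper) is that the extra row and column of the (N+1, N+1) mask correspond to $x_\texttt{[OBJ]}$, and the bottom-right entry is 1 simply because that token attends to itself. A tiny illustrative sketch, with the patch-token entries left as placeholders rather than the exact pattern of eq. (9):

```python
import torch

# Illustrative only: N patch tokens plus one appended [OBJ] token give an
# (N + 1, N + 1) attention mask. Entry (i, j) = 1 means token i may attend
# to token j. The exact pattern for the patch tokens follows eq. (9) of the
# paper; here they are simply allowed to attend to themselves.
N = 4
mask = torch.eye(N + 1)   # every token attends to itself, so the
                          # bottom-right corner ([OBJ] -> [OBJ]) is 1
mask[N, :] = 1            # assumption: [OBJ] also attends to the patch tokens
print(mask.int())
```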
