MAE, pretrained on the ImageNet dataset.

Config | Backbone | Params (backbone/total) | Train memory (GB) | Flops | Inference time (V100, ms/img) | Epochs | Download |
---|---|---|---|---|---|---|---|
mae_vit_base_patch16_8xb64_400e | ViT-B/16 | 85M/111M | 9.5 | 9.8G | 8.03 | 400 | model |
mae_vit_base_patch16_8xb64_1600e | ViT-B/16 | 85M/111M | 9.5 | 9.8G | 8.03 | 1600 | model |
mae_vit_large_patch16_8xb32_1600e | ViT-L/16 | 303M/329M | 11.3 | 20.8G | 16.30 | 1600 | model |
Fast ConvMAE, pretrained on the ImageNet dataset.

Config | Backbone | Params (backbone/total) | Train memory (GB) | Flops | Inference time (V100, ms/img) | Total train time | Epochs | Download |
---|---|---|---|---|---|---|---|---|
fast_convmae_vit_base_patch16_8xb64_50e | ConvViT-B/16 | 88M/115M | 30.3 | 45.1G | 6.88 | 20h (8*A100) | 50 | model - log |
The FLOPs of Fast ConvMAE are about four times those of MAE. MAE's mask retains only 25% of the tokens in each forward pass, whereas Fast ConvMAE adopts a complementary masking strategy: the mask is divided into four complementary parts, each covering 25% of the tokens. Each forward pass therefore effectively learns from four masked views of the same image, achieving roughly four times the learning effect per pass.
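The complementary strategy can be sketched as follows. This is a minimal illustration of partitioning patch tokens into four disjoint 25% subsets, not the actual Fast ConvMAE implementation; the function name and token count are assumptions.

```python
import random

def complementary_masks(num_tokens, num_splits=4, rng=None):
    """Partition token indices into `num_splits` disjoint random subsets.

    Each subset keeps 1/num_splits of the tokens (25% for four splits),
    and together the subsets cover every token exactly once, so a single
    forward pass over all splits processes four masked views of the image.
    """
    rng = rng or random.Random()
    indices = list(range(num_tokens))
    rng.shuffle(indices)
    keep = num_tokens // num_splits
    return [sorted(indices[i * keep:(i + 1) * keep]) for i in range(num_splits)]

# 196 patch tokens for a 224x224 input with 16x16 patches
masks = complementary_masks(196, num_splits=4, rng=random.Random(0))
print([len(m) for m in masks])  # prints [49, 49, 49, 49]
```

Because the four subsets jointly cover every token, one pass through all four is equivalent in supervision to four independent 25%-masked samples, which is why throughput per epoch improves even though per-pass FLOPs grow.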
DINO, pretrained on the ImageNet dataset.

Config | Backbone | Params (backbone/total) | Train memory (GB) | Inference time (V100, ms/img) | Epochs | Download |
---|---|---|---|---|---|---|
dino_deit_small_p16_8xb32_100e | DeiT-S/16 | 21M/88M | 10.5 | 6.17 | 100 | model |
MoBY, pretrained on the ImageNet dataset.

Config | Backbone | Params (backbone/total) | Flops | Train memory (GB) | Inference time (V100, ms/img) | Epochs | Download |
---|---|---|---|---|---|---|---|
moby_deit_small_p16_4xb128_300e | DeiT-S/16 | 21M/26M | 18.6G | 21.4 | 6.17 | 300 | model - log |
moby_swin_tiny_8xb64_300e | Swin-T | 27M/33M | 18.1G | 16.1 | 9.74 | 300 | model - log |
MoCo-v2, pretrained on the ImageNet dataset.

Config | Backbone | Params (backbone/total) | Flops | Train memory (GB) | Inference time (V100, ms/img) | Epochs | Download |
---|---|---|---|---|---|---|---|
mocov2_resnet50_8xb32_200e | ResNet50 | 23M/28M | 8.2G | 5.4 | 8.59 | 200 | model |
SwAV, pretrained on the ImageNet dataset.

Config | Backbone | Params (backbone/total) | Flops | Train memory (GB) | Inference time (V100, ms/img) | Epochs | Download |
---|---|---|---|---|---|---|---|
swav_resnet50_8xb32_200e | ResNet50 | 23M/28M | 12.9G | 11.3 | 8.59 | 200 | model - log |
For detailed usage of the benchmark tools, please refer to the benchmark README.md.
Algorithm | Linear Eval Config | Pretrained Config | Top-1 (%) | Download |
---|---|---|---|---|
SwAV | swav_resnet50_8xb2048_20e_feature | swav_resnet50_8xb32_200e | 73.618 | log |
DINO | dino_deit_small_p16_8xb2048_20e_feature | dino_deit_small_p16_8xb32_100e | 71.248 | log |
MoBY | moby_deit_small_p16_8xb2048_30e_feature | moby_deit_small_p16_4xb128_300e | 72.214 | log |
MoCo-v2 | mocov2_resnet50_8xb2048_40e_feature | mocov2_resnet50_8xb32_200e | 66.8 | log |
Algorithm | Finetune Config | Pretrained Config | Top-1 (%) | Download |
---|---|---|---|---|
MAE | mae_vit_base_patch16_8xb64_100e_lrdecay075_fintune | mae_vit_base_patch16_8xb64_400e | 83.13 | finetune model - log |
MAE | mae_vit_base_patch16_8xb64_100e_lrdecay065_fintune | mae_vit_base_patch16_8xb64_1600e | 83.55 | finetune model - log |
MAE | mae_vit_large_patch16_8xb16_50e_lrdecay075_fintune | mae_vit_large_patch16_8xb32_1600e | 85.70 | finetune model - log |
Fast ConvMAE | fast_convmae_vit_base_patch16_8xb64_100e_fintune | fast_convmae_vit_base_patch16_8xb64_50e | 84.37 | finetune model - log |
Algorithm | Eval Config | Pretrained Config | mAP (Box) | mAP (Mask) | Download |
---|---|---|---|---|---|
Fast ConvMAE | mask_rcnn_conv_vitdet_50e_coco | fast_convmae_vit_base_patch16_8xb64_50e | 51.3 | 45.6 | eval model |
SwAV | mask_rcnn_r50_fpn_1x_coco | swav_resnet50_8xb32_200e | 40.38 | 36.48 | eval model - log |
MoCo-v2 | mask_rcnn_r50_fpn_1x_coco | mocov2_resnet50_8xb32_200e | 39.9 | 35.8 | eval model - log |
MoBY | mask_rcnn_swin_tiny_1x_coco | moby_swin_tiny_8xb64_300e | 43.11 | 39.37 | eval model - log |
Algorithm | Eval Config | Pretrained Config | mIoU | Download |
---|---|---|---|---|
SwAV | fcn_r50-d8_512x512_60e_voc12aug | swav_resnet50_8xb32_200e | 63.91 | eval model - log |
MoCo-v2 | fcn_r50-d8_512x512_60e_voc12aug | mocov2_resnet50_8xb32_200e | 68.49 | eval model - log |