This repo performs the image classification task.
Download Pre-trained model (Google's Official Checkpoint)
- Available models: ViT-B_16(85.8M), R50+ViT-B_16(97.96M), ViT-B_32(87.5M), ViT-L_16(303.4M), ViT-L_32(305.5M), ViT-H_14(630.8M)
  - imagenet21k pre-trained models
    - ViT-B_16, ViT-B_32, ViT-L_16, ViT-L_32, ViT-H_14
  - imagenet21k pre-trained + imagenet2012 fine-tuned models
    - ViT-B_16-224, ViT-B_16, ViT-B_32, ViT-L_16-224, ViT-L_16, ViT-L_32
  - Hybrid model (ResNet50 + Transformer)
    - R50-ViT-B_16
```bash
# imagenet21k pre-train
wget https://storage.googleapis.com/vit_models/imagenet21k/{MODEL_NAME}.npz

# imagenet21k pre-train + imagenet2012 fine-tuning
wget https://storage.googleapis.com/vit_models/imagenet21k+imagenet2012/{MODEL_NAME}.npz
```
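For example, to fetch the imagenet21k + imagenet2012 ViT-B_16 checkpoint into the checkpoint/ directory that the training command below expects:

```bash
# Download the fine-tuned ViT-B_16 weights into checkpoint/
mkdir -p checkpoint
wget -O checkpoint/ViT-B_16.npz https://storage.googleapis.com/vit_models/imagenet21k+imagenet2012/ViT-B_16.npz
```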
Train the model:

```bash
python3 train.py --name cifar10-100_500 --dataset cifar10 --model_type ViT-B_16 --pretrained_dir checkpoint/ViT-B_16.npz
```
CIFAR-10 and CIFAR-100 are downloaded automatically and used for training. For ImageNet, download the data yourself and organize it under vit/ILSVRC2012/train/ and vit/ILSVRC2012/val/ (see the sketch below).
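A minimal sketch of the expected layout, assuming one subdirectory per class as in a standard torchvision ImageFolder setup (the class folder names are placeholders, not taken from the repo):

```bash
# Create the ImageNet directories referenced above
mkdir -p vit/ILSVRC2012/train vit/ILSVRC2012/val
# Assumed per-class layout (illustrative only):
#   vit/ILSVRC2012/train/<class_name>/*.JPEG
#   vit/ILSVRC2012/val/<class_name>/*.JPEG
```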
The default batch size is 512. If GPU memory is insufficient, you can still train by increasing --gradient_accumulation_steps, as in the example below.
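For example, the following keeps the command above but accumulates gradients over 4 steps (the value 4 is illustrative; choose it based on your GPU memory):

```bash
python3 train.py --name cifar10-100_500 --dataset cifar10 --model_type ViT-B_16 --pretrained_dir checkpoint/ViT-B_16.npz --gradient_accumulation_steps 4
```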
You can also use Automatic Mixed Precision (AMP) to reduce memory usage and speed up training:

```bash
python3 train.py --name cifar10-100_500 --dataset cifar10 --model_type ViT-B_16 --pretrained_dir checkpoint/ViT-B_16.npz --fp16 --fp16_opt_level O2
```