This repository contains PyTorch code for the paper "Beyond Grids: Exploring Elastic Input Sampling for Vision Transformers".
The code is based on DeiT by Meta Research: https://github.com/facebookresearch/deit
Requirements: PyTorch 2.1+ (including torchvision), timm 0.9.12+, torchmetrics 1.2.1+, tqdm, matplotlib, numpy, scikit-image, scipy.
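If you want to verify the versioned requirements before launching a run, a small stdlib-only check could look like this (the minimum versions mirror the list above; the helper names are illustrative, not part of the repository):

```python
from importlib.metadata import version, PackageNotFoundError

# Minimum versions taken from the requirements list above.
REQUIRED = {
    "torch": "2.1",
    "timm": "0.9.12",
    "torchmetrics": "1.2.1",
}

def parse(v):
    # Compare dotted version strings numerically: "0.9.12" -> (0, 9, 12).
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

def check(required=REQUIRED):
    """Return a list of (package, problem) pairs for unmet requirements."""
    problems = []
    for pkg, minimum in required.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems.append((pkg, "not installed"))
            continue
        if parse(installed) < parse(minimum):
            problems.append((pkg, f"{installed} < {minimum}"))
    return problems
```

An empty return value means every listed package is present at a sufficient version.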
Recommended hardware: NVIDIA V100 or better (should also work with RTX series cards)
This repository is divided into branches:
- master = main code for the paper,
- coco = transfer learning code for MS COCO and PASCAL VOC,
- swin, pvt, mae = evaluation code for relevant baselines.
The code uses the same command-line options as the DeiT III implementation, with additional options for elastic input. Run python main.py -h to see all options.
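The elastic-input flags in the training command below (--random-patches, --random-patch-min-size / --random-patch-max-size, --grid-patch-size, --grid-to-random-ratio) control how input patches are sampled. The actual sampler lives in this repository; the following is only a minimal sketch of the idea — mixing a regular ViT-style grid with randomly sized, randomly placed patches — and all names in it are illustrative, not the repo's API:

```python
import random

def sample_patches(img_size=448, grid_patch=32, n_random=196,
                   min_size=16, max_size=96, grid_ratio=0.3):
    """Return (x, y, size) patch boxes.

    With probability `grid_ratio` fall back to a regular non-overlapping
    grid (as in a standard ViT); otherwise sample `n_random` elastic
    patches with sizes in [min_size, max_size] at random positions.
    Illustrative sketch only -- not the repository's actual sampler.
    """
    if random.random() < grid_ratio:
        return [(x, y, grid_patch)
                for y in range(0, img_size, grid_patch)
                for x in range(0, img_size, grid_patch)]
    patches = []
    for _ in range(n_random):
        s = random.randint(min_size, max_size)
        x = random.randint(0, img_size - s)
        y = random.randint(0, img_size - s)
        patches.append((x, y, s))
    return patches
```

With the defaults above, the grid branch yields 14 x 14 = 196 patches of size 32 for a 448-pixel input, so both branches feed the model the same token count.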
Example training command:
python run_with_submitit.py \
  --batch-size=256 --epochs=800 --bce-loss --unscale-lr \
  --model='deit_base_patch16_LS' --input-size=448 \
  --drop=0.0 --drop-path=0.2 \
  --model-ema --model-ema-decay=0.99996 \
  --opt='fusedlamb' --opt-eps=1e-08 --momentum=0.9 --weight-decay=0.05 \
  --sched='cosine' --lr=0.003 --lr-noise-pct=0.67 --lr-noise-std=1.0 \
  --warmup-lr=1e-06 --min-lr=1e-05 --decay-epochs=30 --warmup-epochs=5 \
  --cooldown-epochs=10 --patience-epochs=10 --decay-rate=0.1 \
  --color-jitter=0.3 --smoothing=0.0 --train-interpolation='bicubic' \
  --repeated-aug --train-mode --ThreeAugment \
  --reprob=0.0 --remode='pixel' --recount=1 \
  --mixup=0.8 --cutmix=1.0 --mixup-prob=1.0 --mixup-switch-prob=0.5 --mixup-mode='batch' \
  --data-path='IMAGENET_PATH' --data-set='IMNET' --eval-crop-ratio=1.0 \
  --num_workers=10 --pin-mem \
  --random-patches=196 --random-patch-min-size=16 --random-patch-max-size=96 \
  --grid-patch-size=32 \
  --ngpus=8 --nodes=1 --partition='' \
  --grid-to-random-ratio 0.3
Scripts for automated evaluation are in the scripts/ directory.