This repository contains the official implementation for the paper **Probing the Mid-level Vision Capabilities of Self-Supervised Learning**, which presents an analysis of the mid-level visual perception of pretrained self-supervised learning (SSL) models.
Xuweiyi Chen, Markus Marks, Zezhou Cheng
If you find this code useful, please consider citing:
```bibtex
@article{chen2024probingmidlevelvisioncapabilities,
  title={Probing the Mid-level Vision Capabilities of Self-Supervised Learning},
  author={Xuweiyi Chen and Markus Marks and Zezhou Cheng},
  year={2024},
  eprint={2411.17474},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.17474},
}
```
Model Name | Backbone | Dataset | Source Link |
---|---|---|---|
Jigsaw | ResNet-50 | ImageNet-1K | VISSL model zoo |
RotNet | ResNet-50 | ImageNet-1K | VISSL model zoo |
NPID | ResNet-50 | ImageNet-1K | VISSL model zoo |
SeLa-v2 | ResNet-50 | ImageNet-1K | SwAV repository |
NPID++ | ResNet-50 | ImageNet-1K | VISSL model zoo |
PIRL | ResNet-50 | ImageNet-1K | VISSL model zoo |
ClusterFit | ResNet-50 | ImageNet-1K | VISSL model zoo |
DeepCluster-v2 | ResNet-50 | ImageNet-1K | SwAV repository |
SwAV | ResNet-50 | ImageNet-1K | SwAV repository |
SimCLR | ResNet-50 | ImageNet-1K | VISSL model zoo |
MoCo v2 | ResNet-50 | ImageNet-1K | MoCo v2 repository |
SimSiam | ResNet-50 | ImageNet-1K | MMSelfSup model zoo |
BYOL | ResNet-50 | ImageNet-1K | Unofficial BYOL repo |
Barlow Twins | ResNet-50 | ImageNet-1K | MMSelfSup model zoo |
DenseCL | ResNet-50 | ImageNet-1K | DenseCL repository |
DINO | ResNet-50, ViT-B/16 | ImageNet-1K | DINO repository |
MoCo v3 | ResNet-50, ViT-B/16 | ImageNet-1K | MoCo v3 repository |
iBOT | ViT-B/16 | ImageNet-1K | iBOT repository |
MAE | ViT-B/16 | ImageNet-1K | MAE repository |
MaskFeat | ViT-B/16 | ImageNet-1K | MMSelfSup model zoo |
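As a quick illustration, the DINO checkpoints above can be pulled directly via `torch.hub`; a minimal sketch is below (these entry points come from the public `facebookresearch/dino` repository, while checkpoints from VISSL, SwAV, MMSelfSup, etc. follow the loading instructions of their respective source repositories):

```python
# Minimal sketch: load the DINO backbones listed above via torch.hub.
# Entry points are from the public facebookresearch/dino repo; the other
# checkpoints in the table have their own loaders (see the source links).
import torch

dino_vitb16 = torch.hub.load("facebookresearch/dino:main", "dino_vitb16")
dino_rn50 = torch.hub.load("facebookresearch/dino:main", "dino_resnet50")

dino_vitb16.eval()  # probes are trained on top of frozen features
```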
We recommend using Anaconda or Miniconda. To set up the environment, follow the instructions below.
```bash
conda create -n mid-probe python=3.9 --yes
conda activate mid-probe
conda install pytorch=2.2.1 torchvision=0.17.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
conda install -c conda-forge nb_conda_kernels=2.3.1
pip install -r requirements.txt
python setup.py develop
pip install protobuf==3.20.3
pre-commit install
```
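After installation, a quick sanity check (a minimal sketch) can confirm that the pinned versions resolved and that the GPU is visible:

```python
# Quick sanity check that the pinned dependencies resolved correctly.
import torch
import torchvision
import faiss  # faiss-gpu 1.8.0

print(torch.__version__, torchvision.__version__)  # expect 2.2.1 / 0.17.1
print("CUDA available:", torch.cuda.is_available())
print("FAISS GPUs visible:", faiss.get_num_gpus())
```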
Finally, please follow the dataset download and preprocessing instructions here.
We provide code to train the probes (depth, surface normals, and generic objectness) and to run the perception and correspondence evaluations. All experiments use Hydra configs, which can be found here; note that a `+` prefix in an override (e.g., `+backbone.return_multilayer=True`) is standard Hydra syntax for appending a key absent from the base config. Below are example commands for running the evaluations with the DINO ViT-B/16 backbone.
```bash
# Depth probe (NYU)
python train_depth.py backbone=dino_b16 +backbone.return_multilayer=True dataset=nyu
# Surface-normal probe (NYU)
python train_snorm.py backbone=dino_b16 +backbone.return_multilayer=True dataset=nyu
# Generic objectness probe (VOC 2012)
python train_generic_objectness.py backbone=dino_b16 dataset=voc12
# Mid-level perception evaluation (2AFC)
python evaluate_model_percepture.py backbone=dino_b16 experiment_model=dino_b16 system.random_seed=8 system.num_gpus=1 batch_size=8 dataset=twoafcdataset output_dir=<OUTPUT_PATH> backbone.return_cls=True
```
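For reference, 2AFC evaluations are conventionally scored by agreement with human judgments, as in LPIPS. A minimal illustrative sketch of that scoring rule follows (an assumption about the protocol, not necessarily this repo's exact evaluation code):

```python
# Illustrative sketch of the standard LPIPS-style 2AFC scoring rule
# (an assumption about the protocol, not this repo's exact code).
import torch

def two_afc_score(d0: torch.Tensor, d1: torch.Tensor, judge: torch.Tensor) -> torch.Tensor:
    """d0, d1: model feature distances from the reference to img0 / img1.
    judge: fraction of human raters who picked img1 as more similar.
    Returns the mean agreement between the model's choice and the humans'."""
    agree = (d0 < d1).float() * (1.0 - judge) \
          + (d1 < d0).float() * judge \
          + (d0 == d1).float() * 0.5
    return agree.mean()
```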
```bash
# Correspondence evaluation on NAVI and ScanNet
python evaluate_navi_correspondence.py +backbone=dino_b16
python evaluate_scannet_correspondence.py +backbone=dino_b16
```
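At its core, dense correspondence evaluation matches patch features between two views by nearest neighbor. A generic sketch of that recipe is below (a common formulation, not this repo's exact implementation):

```python
# Generic sketch: nearest-neighbor patch matching with cosine similarity
# (a common correspondence recipe; not this repo's exact implementation).
import torch
import torch.nn.functional as F

def match_patches(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    """feats_a: [Na, D], feats_b: [Nb, D] patch features from a frozen
    backbone. Returns, for each patch in A, its nearest patch index in B."""
    a = F.normalize(feats_a, dim=-1)
    b = F.normalize(feats_b, dim=-1)
    sim = a @ b.t()            # [Na, Nb] cosine similarity matrix
    return sim.argmax(dim=-1)  # index of the best match in B per A patch
```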
To evaluate a trained probe from a saved checkpoint, pass `is_eval=true` together with `ckpt_path`, e.g. for a BEiT v2 depth probe:

```bash
python train_depth.py backbone=beit_v2_vitb16 +backbone.return_multilayer=True experiment_model=depth_beitv2_vitb16 system.port=12345 system.random_seed=10 system.num_gpus=1 batch_size=8 is_eval=true ckpt_path=<PATH_TO_CKPT>
```
We would also like to acknowledge the following repositories and their authors for releasing valuable code and datasets:
- GeoNet for releasing the extracted surface normals for the full NYU dataset.
- Probe3D for releasing probing algorithms for 3D foundation models.
- Comparing evaluation protocols for self-supervised pre-training with image classification for releasing a collection of self-supervised learning methods and their usage.