ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [CVPR 2024 Highlight]
Chenshuang Zhang · Fei Pan · Junmo Kim · In So Kweon · Chengzhi Mao
We establish a novel benchmark, ImageNet-D, using generative models to test visual perception robustness, surpassing previous synthetic test sets with more varied and realistic synthetic images. Built with diffusion models, ImageNet-D causes a significant accuracy drop for a range of vision models, from the standard ResNet classifier to the latest foundation models such as CLIP and MiniGPT-4, reducing their accuracy by up to 60%.
The complete dataset is available on both Huggingface and Google Drive; either link gives you the entire dataset, so choose whichever method works best for you.
Download from the Google Drive link, then unpack the tar file with tar -xvf ImageNet-D.tar.
Or:
Download from Huggingface: git lfs clone https://huggingface.co/datasets/zcs15/ImageNet-D
We organize all images into three folders according to their nuisance attributes: background, texture, and material. The default dataset directory in the evaluation code is ./data/ImageNet-D/, and you may change it to your own directory. For evaluating large VQA models such as LLaVA, we provide the questions for each image in the questions folder. A quick sanity-check sketch follows the directory tree below.
├── ImageNet-D
├── background
├── texture
├── material
└── questions
├── background.csv
├── texture.csv
└── material.csv
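For orientation, here is a minimal sanity check of the extracted dataset. It assumes only the default ./data/ImageNet-D/ layout shown in the tree above; the printed CSV columns are whatever the dataset ships.

```python
import csv
import os

root = "./data/ImageNet-D"

# Count images per nuisance subset.
for subset in ("background", "texture", "material"):
    n = sum(len(files) for _, _, files in os.walk(os.path.join(root, subset)))
    print(f"{subset}: {n} images")

# Peek at the VQA question file for one subset.
with open(os.path.join(root, "questions", "background.csv"), newline="") as f:
    header = next(csv.reader(f))
print("question columns:", header)
```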
We show some image examples from ImageNet-D below. Each group of images is generated with the same object and nuisance, such as background, texture, or material.
conda create -n imagenet_d python=3.8.16 -y
conda activate imagenet_d
pip install -r requirements.txt
Run python evaluate_imagenet_models.py --model "vgg19" to evaluate a standard ImageNet classifier (here, VGG-19).
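For reference, this is a rough sketch of the top-1 evaluation such a script performs, assuming the images load with torchvision's ImageFolder. Note that ImageFolder assigns label indices alphabetically by folder name; mapping those to the 1000 ImageNet class indices is handled by the evaluation script, and the direct comparison below stands in for that mapping.

```python
import torch
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing for torchvision classifiers.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
subset = datasets.ImageFolder("./data/ImageNet-D/background", transform=preprocess)
loader = torch.utils.data.DataLoader(subset, batch_size=64, num_workers=4)

model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        # Placeholder: the real script maps ImageNet-D folder labels to
        # ImageNet class indices before comparing.
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"top-1 accuracy: {correct / total:.2%}")
```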
Run python evaluate_vlm.py --model "ViT-B/16" to evaluate a vision-language model such as CLIP with a ViT-B/16 backbone.
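For context, zero-shot CLIP evaluation typically looks like the following sketch, built on the official openai/CLIP package. The prompt template and the use of folder names as label text are assumptions here; see evaluate_vlm.py for the exact protocol.

```python
import clip  # pip install git+https://github.com/openai/CLIP.git
import torch
from torchvision import datasets

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

subset = datasets.ImageFolder("./data/ImageNet-D/background", transform=preprocess)
loader = torch.utils.data.DataLoader(subset, batch_size=64)

# One text prompt per class; classification is by cosine similarity.
prompts = clip.tokenize([f"a photo of a {c}" for c in subset.classes]).to(device)
with torch.no_grad():
    text_feats = model.encode_text(prompts)
    text_feats /= text_feats.norm(dim=-1, keepdim=True)

    correct = total = 0
    for images, labels in loader:
        img_feats = model.encode_image(images.to(device))
        img_feats /= img_feats.norm(dim=-1, keepdim=True)
        preds = (img_feats @ text_feats.T).argmax(dim=-1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"zero-shot top-1 accuracy: {correct / total:.2%}")
```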
Here, we provide the evaluation code for LLaVA as an example; other large VQA models can be evaluated in a similar way.
To evaluate LLaVA, first install the packages following the original LLaVA repository, as follows.
cd LLaVA
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip # enable PEP 660 support
pip install -e .
pip install flash-attn --no-build-isolation
To generate the answers with LLaVA, run the following command and specify the test subset.
python -m llava.serve.eval_imagenet_d \
    --model-path ./pretrained_weights/llava-v1.5-13b/ \
    --experiment_name 'background'
Compute the accuracy by running the following command.
python compute_accuracy.py
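The answer-file format that compute_accuracy.py consumes is not reproduced here; as a rough illustration, string-matching accuracy over a CSV of model answers might look like the sketch below. The file name and the prediction/answer column names are hypothetical.

```python
import csv

def accuracy(path: str) -> float:
    # Hypothetical format: one row per question, with "prediction" and
    # "answer" columns; exact match after lowercasing.
    correct = total = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            correct += row["prediction"].strip().lower() == row["answer"].strip().lower()
            total += 1
    return correct / total

print(f"accuracy: {accuracy('answers_background.csv'):.2%}")
```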
This repository is built upon the code of GenInt and LLaVA.
If you find our work useful, please consider citing as follows.
@inproceedings{zhang2024imagenet_d,
  author    = {Zhang, Chenshuang and Pan, Fei and Kim, Junmo and Kweon, In So and Mao, Chengzhi},
  title     = {ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object},
  booktitle = {CVPR},
  year      = {2024},
}