Scaling-up deep neural networks to improve their performance on ImageNet makes them more tolerant to adversarial attacks, but successful attacks on these models are misaligned with human perception.

Adversarial Alignment: breaking the trade-off between the strength of an attack and its relevance to human perception


Drew Linsley*, Pinyuan Feng*, Thibaut Boissin, Alekh Karkada Ashok, Thomas Fel, Stephanie Olaiya, Thomas Serre

Read our paper »

Website · Results · Model Info · Harmonization · ClickMe · Serre Lab @ Brown

Dataset

We ran our experiments on the ClickMe dataset, a large-scale effort to capture human feature importance maps that highlight which parts of an image are relevant or irrelevant for recognition. For our experiments we created a subset of ClickMe with one image per ImageNet category. To replicate our experiments, please place the TF-Record file in ./datasets.
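
The snippet below is a minimal sketch of how such a TF-Record subset could be inspected with TensorFlow. The file name and the feature keys ("image", "heatmap", "label") are assumptions for illustration only; check the parsing code in this repository for the actual record specification.

import tensorflow as tf

# Assumed feature layout of the ClickMe TF-Record subset (illustrative only).
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "heatmap": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    example = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_jpeg(example["image"], channels=3)
    heatmap = tf.io.decode_jpeg(example["heatmap"], channels=1)
    return image, heatmap, example["label"]

# "clickme_subset.tfrecords" is a placeholder name for the file placed in ./datasets.
dataset = tf.data.TFRecordDataset("./datasets/clickme_subset.tfrecords").map(parse_example)
for image, heatmap, label in dataset.take(1):
    print(image.shape, heatmap.shape, int(label))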

Environment Setup

conda create -n adv python=3.8 -y
conda activate adv
conda install pytorch==1.13.1 torchvision==0.14.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install tensorflow==2.12.0
pip install timm==0.8.10.dev0
pip install harmonization
pip install numpy matplotlib scipy tqdm pandas
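
After installation, a quick sanity check along these lines can confirm that the pinned versions are in place and that the GPU is visible (this snippet is a convenience for the reader, not part of the repository):

import timm
import torch
import tensorflow as tf

# Print installed versions and whether CUDA is usable by PyTorch.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("tensorflow:", tf.__version__)
print("timm:", timm.__version__)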

Implementations

  • Command line: enter the following command in a terminal (a sketch of the alignment measurement it refers to follows this list)
python main.py --model "resnet" --cuda 0 --spearman 1
  • Google Colab notebooks
    • If you run into installation issues, you can instead run the two .ipynb notebooks in the ./scripts folder.
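
The sketch below illustrates the kind of alignment measurement the --spearman flag refers to: correlate where an attack perturbs an image with where humans look according to ClickMe. The attack shown here is a single FGSM step used purely for illustration; it is not the attack implemented in main.py, and the model name is just an example.

import timm
import torch
from scipy.stats import spearmanr

model = timm.create_model("resnet50", pretrained=True).eval()

def attack_saliency(image, label, eps=1e-3):
    # image: (1, 3, H, W) float tensor; label: (1,) long tensor.
    image = image.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    perturbation = eps * image.grad.sign()
    # Collapse channels to get a spatial map of where the attack acts.
    return perturbation.abs().mean(dim=1).squeeze(0)

def alignment(attack_map, human_map):
    # Spearman rank correlation between the flattened attack map and
    # the flattened ClickMe human feature-importance map.
    rho, _ = spearmanr(attack_map.flatten().detach().cpu().numpy(),
                       human_map.flatten())
    return rho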

Images

  • There are 10 example images in ./images.
  • The images include ImageNet images, human feature importance maps from ClickMe, and adversarial attacks for a variety of DNNs.

Models

  • We tested 283 models in our experiments (a sketch of how the timm models can be loaded follows this list):
    • 125 PyTorch CNN models from the timm library
    • 121 PyTorch ViT models from the timm library
    • 15 PyTorch ViT/CNN hybrid architectures from the timm library
    • 14 TensorFlow harmonized models from the harmonization library
    • 4 baseline models
    • 4 models trained for robustness to adversarial examples
  • The top-1 ImageNet accuracy reported for each model is taken from the published Hugging Face results
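
The snippet below is a minimal sketch of how PyTorch models from timm can be instantiated for this kind of benchmark. The model name and the "vit_*" filter are illustrative; the exact list of 283 models is defined in this repository, not here.

import timm

# Load one pretrained CNN by name (an example model, not the full benchmark list).
cnn = timm.create_model("resnet50", pretrained=True).eval()

# List the pretrained ViT variants that timm exposes.
vit_names = timm.list_models("vit_*", pretrained=True)
print(len(vit_names), "pretrained ViT variants available in timm")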

Citation

If you use or build on our work in a scientific publication, please consider citing the paper:

@article{linsley2023adv,
  title={Adversarial Alignment: breaking the trade-off between the strength of an attack and its relevance to human perception},
  author={Linsley, Drew and Feng, Pinyuan and Boissin, Thibaut and Ashok, Alekh Karkada and Fel, Thomas and Olaiya, Stephanie and Serre, Thomas},
  year={2023}
}

If you have any questions about the paper, please contact Drew at [email protected].

Acknowledgement

This paper relies heavily on previous work from the Serre Lab, notably Harmonization and ClickMe.

@article{fel2022aligning,
  title={Harmonizing the object recognition strategies of deep neural networks with humans},
  author={Fel, Thomas and Felipe, Ivan and Linsley, Drew and Serre, Thomas},
  journal={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2022}
}

@article{linsley2018learning,
  title={Learning what and where to attend},
  author={Linsley, Drew and Shiebler, Dan and Eberhardt, Sven and Serre, Thomas},
  journal={International Conference on Learning Representations (ICLR)},
  year={2019}
}

License

The code is released under the MIT License.
