This is the official repository for the ICCV 2023 4th Workshop on e-Heritage paper: "Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage" by Dario Cioni, Lorenzo Berlincioni, Federico Becattini and Alberto Del Bimbo.
If you find our work useful, please consider citing it:
@InProceedings{Cioni_2023_ICCV,
author = {Cioni, Dario and Berlincioni, Lorenzo and Becattini, Federico and Del Bimbo, Alberto},
title = {Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {October},
year = {2023},
pages = {1707-1716}
}
Below is a description of the main files and folders of the project.
cultural-heritage-image2text/
│
├── main.py - main script for training and testing models
│
├── data_loader/ - anything about data loading goes here
│   └── artpedia.py - Artpedia Dataset and DataModule (see the sketch after this tree)
│
├── data/ - default directory for storing input data
│
├── model/ - models and metrics
│ ├── model.py - LightningModule wrapper for image captioning
│   └── metrics/ - directory with custom metrics
│
├── runs/
│ ├── cultural-heritage/ - trained models are saved here
│ └── wandb/ - local logdir for wandb and logging output
│
└── utils/
├── utils.py - small utility functions for training
└── download.py - utility to download images from Artpedia json metadata
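For orientation, here is a minimal sketch of what the dataset and data module in data_loader/artpedia.py might look like. The class names, the local image naming scheme and the exact JSON fields are assumptions for illustration, not the repository's actual API.

import json
from pathlib import Path

from lightning.pytorch import LightningDataModule
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class ArtpediaDataset(Dataset):
    """Pairs each Artpedia image with one of its visual captions (illustrative)."""

    def __init__(self, ann_file, img_dir, split="train"):
        with open(ann_file) as f:
            anns = json.load(f)
        self.img_dir = Path(img_dir)
        # 'split' and 'visual_sentences' are fields of the Artpedia JSON;
        # keying local files by the annotation id is an assumption.
        self.samples = [
            (key, caption)
            for key, entry in anns.items()
            if entry.get("split") == split
            for caption in entry.get("visual_sentences", [])
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        key, caption = self.samples[idx]
        image = Image.open(self.img_dir / f"{key}.jpg").convert("RGB")
        # In practice a processor/collate_fn turns (image, caption) into tensors.
        return image, caption


class ArtpediaDataModule(LightningDataModule):
    """Builds per-split dataloaders from the same annotation file (illustrative)."""

    def __init__(self, ann_file, img_dir, batch_size=2, num_workers=6):
        super().__init__()
        self.save_hyperparameters()

    def setup(self, stage=None):
        hp = self.hparams
        self.train_ds = ArtpediaDataset(hp.ann_file, hp.img_dir, "train")
        self.val_ds = ArtpediaDataset(hp.ann_file, hp.img_dir, "val")

    def train_dataloader(self):
        return DataLoader(self.train_ds, batch_size=self.hparams.batch_size,
                          num_workers=self.hparams.num_workers, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_ds, batch_size=self.hparams.batch_size,
                          num_workers=self.hparams.num_workers)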
Experiments were performed on the Artpedia and ArtCap datasets. Images were downloaded from Wikipedia using the download.py script. To download the images, run the following command, providing a valid contact email, the annotation file and the output directory.
python utils/download.py [email protected] --ann_file data/artpedia/artpedia.json --img_dir data/artpedia/images
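As a rough sketch, utils/download.py plausibly boils down to the loop below. The argument names mirror the command above, but the implementation details (User-Agent format, 'img_url' field, output file naming) are assumptions, not the repository's exact code.

import argparse
import json
from pathlib import Path

import requests


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("email", help="contact email sent in the User-Agent")
    parser.add_argument("--ann_file", required=True)
    parser.add_argument("--img_dir", required=True)
    args = parser.parse_args()

    out = Path(args.img_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Wikimedia asks clients to identify themselves via the User-Agent header.
    headers = {"User-Agent": f"artpedia-downloader ({args.email})"}

    with open(args.ann_file) as f:
        anns = json.load(f)

    for key, entry in anns.items():
        url = entry["img_url"]
        target = out / f"{key}{Path(url).suffix}"
        if target.exists():  # skip files already downloaded
            continue
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        target.write_bytes(resp.content)


if __name__ == "__main__":
    main()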
This project uses a modified version of pycocoevalcap. To install it, run the following command:
git submodule init
git submodule update --remote
cd pycocoevalcap
pip install -e .
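A quick way to check the install is to score a caption by hand. The scorers take {id: [caption, ...]} dicts of references and candidates; this tiny example is ours, not part of the repository.

from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# References (gts) and candidates (res) keyed by the same image ids.
gts = {
    "img1": ["a painting of a woman holding a child"],
    "img2": ["a portrait of a man wearing a hat"],
}
res = {
    "img1": ["a painting of a woman with a child"],
    "img2": ["a portrait of a man in a hat"],
}

bleu_scores, _ = Bleu(4).compute_score(gts, res)  # BLEU-1..4
cider_score, _ = Cider().compute_score(gts, res)
print("BLEU-1..4:", bleu_scores)
print("CIDEr:", cider_score)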
The command line interface is implemented using LightningCLI. The setup during training and validation is controlled by a configuration file: a YAML file with the following structure.
# lightning.pytorch==2.0.1.post0
seed_everything: int | bool
trainer:
  # list of trainer args
  logger:
    class_path: lightning.pytorch.loggers.WandbLogger
    init_args:
      # wandb logging args
  callbacks:
    class_path: callbacks.predictions.LogPredictionSamplesCallback
model:
  model_name_or_path: microsoft/git-base
  learning_rate: 5.0e-05
  warmup_steps: 500
  weight_decay: 0.0
  metrics:
    # add or remove metrics here
    - class_path: model.CocoScore
    - class_path: torchmetrics.text.BERTScore
      init_args:
        model_name_or_path: distilbert-base-uncased
        batch_size: 16
        lang: en
        max_length: 512
  generation:
    # generation args
data:
  img_dir: data/artpedia/
  ann_file: data/artpedia/artpedia_augmented.json
  batch_size: 2
  # Processor name for model
  model_name_or_path: microsoft/git-base
  num_workers: 6
ckpt_path: null # provide a path to a checkpoint to load
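For context, main.py presumably reduces to a LightningCLI entry point along these lines; the imported class names are assumptions, since the actual classes live in model/ and data_loader/.

from lightning.pytorch.cli import LightningCLI

from data_loader.artpedia import ArtpediaDataModule  # assumed name
from model.model import ImageCaptioningModule        # assumed name


def cli_main():
    # LightningCLI exposes the fit/validate/test subcommands and parses the
    # YAML above into trainer, model and data arguments automatically.
    LightningCLI(ImageCaptioningModule, ArtpediaDataModule)


if __name__ == "__main__":
    cli_main()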
Every configuration option can be overridden by passing a command line argument with the same name. For example, to override the batch_size parameter, you can run:
python main.py fit --config configs/config.yaml --data.batch_size 32
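Nested options follow the same dotted notation; for instance, the following (illustrative values, not a command from the repository) changes the learning rate and epoch budget in one run:

python main.py fit --config configs/config.yaml --model.learning_rate 1.0e-04 --trainer.max_epochs 10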
You can find a complete example of a configuration file in the configs/ folder.
The dataset augmentation is performed using the img2img.py script, which relies on the Automatic1111 web API; you need to provide the URL of a running instance. The script takes as input the path to the dataset annotation file, the path to the original dataset images and the output path for the augmented dataset.
For each image in the dataset, the script generates a new folder with the same name as the image, containing augmented images.
python img2img.py --api_url http://127.0.0.1:7860 --ann_file data/artpedia/artpedia.json --img_dir data/artpedia/images --out_dir data/artpedia/samples
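The core of such a script is a POST to the standard /sdapi/v1/img2img endpoint of a running AUTOMATIC1111 instance. The sketch below is our simplification of img2img.py: the wrapper function, prompt handling and chosen payload fields are assumptions.

import base64
import io

import requests
from PIL import Image

API_URL = "http://127.0.0.1:7860"


def augment(image_path, prompt, denoising_strength=0.4):
    # The API expects init images as base64-encoded strings.
    with open(image_path, "rb") as f:
        b64_image = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "init_images": [b64_image],
        "prompt": prompt,
        # A low denoising strength keeps the result close to the original artwork.
        "denoising_strength": denoising_strength,
    }
    resp = requests.post(f"{API_URL}/sdapi/v1/img2img", json=payload, timeout=300)
    resp.raise_for_status()
    # Generated images come back as base64 strings under the "images" key.
    return [
        Image.open(io.BytesIO(base64.b64decode(img)))
        for img in resp.json()["images"]
    ]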
Training is performed using the fit subcommand, followed by the path to the configuration file and other optional arguments.
python main.py fit -c configs/your_config.yaml
Testing is performed using the test subcommand, followed by the path to the configuration file and other optional arguments.
python main.py test -c configs/config.yaml --ckpt_path path/to/ckpt.ckpt
Below is the performance of the pretrained models on the Artpedia and ArtCap datasets. For additional results, please refer to the paper.