PyTorch implementation of the NeurIPS 2022 paper "A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval". It is built on top of SGRAF in PyTorch.
In the table below, the first five metric columns report image-to-text retrieval and the last five report text-to-image retrieval ("-" means not reported).

Dataset | Model (+DAA) | R@1 | R@5 | R@10 | PMRP | ASP | R@1 | R@5 | R@10 | PMRP | ASP
---|---|---|---|---|---|---|---|---|---|---|---
Flickr30K | SAF | 73.9 | 93.0 | 96.2 | - | 65.0 | 56.9 | 81.9 | 87.9 | - | 58.4
Flickr30K | SGR | 73.8 | 92.9 | 96.3 | - | 65.5 | 56.6 | 80.7 | 84.9 | - | 59.0
Flickr30K | SGRAF | 78.0 | 94.2 | 97.6 | - | 65.8 | 59.9 | 83.4 | 89.2 | - | 59.2
MSCOCO 1K | SAF | 78.0 | 95.6 | 98.4 | 47.1 | 67.2 | 62.8 | 89.8 | 95.2 | 48.7 | 61.6
MSCOCO 1K | SGR | 78.0 | 95.8 | 98.6 | 46.4 | 68.5 | 62.6 | 88.8 | 93.7 | 48.6 | 62.8
MSCOCO 1K | SGRAF | 80.2 | 96.4 | 98.8 | 48.1 | 68.3 | 65.0 | 90.7 | 95.8 | 49.6 | 62.7
MSCOCO 5K | SAF | 56.2 | 83.5 | 90.9 | 35.7 | 67.0 | 40.5 | 70.1 | 80.7 | 36.6 | 61.4
MSCOCO 5K | SGR | 56.5 | 84.1 | 91.1 | 35.3 | 68.4 | 40.8 | 70.2 | 80.4 | 36.9 | 62.6
MSCOCO 5K | SGRAF | 60.0 | 86.4 | 92.4 | 36.6 | 68.2 | 43.5 | 72.3 | 82.5 | 37.5 | 62.5
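R@K in the table counts the fraction of queries whose ground-truth match ranks within the top K retrieved items. As an illustration only (the repo's `evaluation.py` implements the full protocol, including the 5-captions-per-image layout), a minimal sketch in numpy:

```python
import numpy as np

def recall_at_k(sims, gt_index, ks=(1, 5, 10)):
    """Fraction of queries whose ground-truth item ranks in the top K.

    sims: (n_queries, n_items) similarity matrix, higher = more similar.
    gt_index: length-n_queries array of ground-truth item indices.
    """
    order = np.argsort(-sims, axis=1)  # items sorted by descending similarity
    # Rank (0 = best) of the ground-truth item for each query.
    ranks = np.argmax(order == np.asarray(gt_index)[:, None], axis=1)
    return {k: float(np.mean(ranks < k)) for k in ks}

# Toy example: 3 queries over 4 items, ground truth on the diagonal.
sims = np.array([[0.9, 0.1, 0.2, 0.0],
                 [0.2, 0.8, 0.1, 0.3],
                 [0.5, 0.4, 0.3, 0.6]])
print(recall_at_k(sims, gt_index=[0, 1, 2], ks=(1, 2)))  # R@1 = R@2 = 2/3
```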
We recommend the following dependencies:
- Python 3.8
- PyTorch 1.10.0
- NumPy (>1.19.5)
- TensorBoard
If you don't want to train from scratch, you can download the pretrained models from here (SGR for MS-COCO), here (SAF for MS-COCO), here (SGR for Flickr30K), and here (SAF for Flickr30K).
We follow SCAN to obtain image features and vocabularies, which can be downloaded with:

```
wget https://iudata.blob.core.windows.net/scan/data.zip
wget https://iudata.blob.core.windows.net/scan/vocab.zip
```
An alternative download link:
https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC
To speed up dataset loading, we convert these features from numpy arrays to an HDF5 file. Modify `data_path` in `np2h.py`, and then run:

```
python np2h.py
```
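The conversion lets individual feature rows be read lazily instead of loading the whole array into memory. The core idea is roughly the following sketch (the dataset key and function name here are illustrative; see `np2h.py` for the actual layout):

```python
import numpy as np
import h5py

def np_to_h5(npy_path, h5_path, key="features"):
    """Store a numpy feature array in an HDF5 file so single rows can be
    sliced on demand without loading the full array into RAM."""
    feats = np.load(npy_path)
    with h5py.File(h5_path, "w") as f:
        f.create_dataset(key, data=feats)

# Reading one item later is a cheap slice, not a full-file load:
# with h5py.File(h5_path, "r") as f:
#     img_feat = f["features"][i]
```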
Modify `data_path`, `vocab_path`, `model_name`, and `logger_name` in the `opts.py` file. Then run `train.py`:
For MSCOCO:

```
# For SGR
python train.py --data_name coco_precomp --num_epochs 30 --learning_rate 0.00015 --lr_update 20 --world_size 4 --module_name SGR --daa_weight 25
# For SAF
python train.py --data_name coco_precomp --num_epochs 30 --learning_rate 0.00015 --lr_update 20 --world_size 4 --module_name SAF --daa_weight 25
```

For Flickr30K:

```
# For SGR
python train.py --data_name f30k_precomp --num_epochs 40 --learning_rate 0.0006 --lr_update 30 --world_size 1 --module_name SGR --daa_weight 10
# For SAF
python train.py --data_name f30k_precomp --num_epochs 40 --learning_rate 0.0006 --lr_update 20 --world_size 1 --module_name SAF --daa_weight 10
```
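In SGRAF-style training, `--lr_update` marks the epoch interval after which the learning rate is decayed (typically by 10x). A sketch of that schedule, assuming a single-step 0.1 decay factor (check `train.py`/`opts.py` for the exact rule used here):

```python
def step_lr(base_lr, epoch, lr_update, gamma=0.1):
    """Step decay: scale the base rate by gamma once per completed
    lr_update-epoch interval. gamma=0.1 is an assumed default."""
    return base_lr * (gamma ** (epoch // lr_update))

# Flickr30K SGR settings from the command above:
print(step_lr(0.0006, 0, 30))   # 0.0006 for epochs 0..29
print(step_lr(0.0006, 30, 30))  # decayed 10x from epoch 30 onward
```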
To do cross-validation on MSCOCO, pass `fold5=True` with a model trained using `--data_name coco_precomp`:

```
python evaluation.py
```

To test on Flickr30K, pass `fold5=False` with a model trained using `--data_name f30k_precomp`:

```
python evaluation.py
```
PMRP is a metric that evaluates the diversity of a model's retrievals. More details can be found in PCME.
Before evaluation, download `captions_val2014.json` and `instances_val2014.json` from here, or find them here. Then put them in `pmrp_com/coco_ann`.
To compute the PMRP score on MSCOCO 1K, run:

```
python pmrp_evaluation.py --path1 ${SIM_MATRIX} --n_fold 5
```

To compute the PMRP score on MSCOCO 5K, run:

```
python pmrp_evaluation.py --path1 ${SIM_MATRIX} --n_fold 0
```
`${SIM_MATRIX}` is the path to a similarity matrix in `.npy` format with shape (5000, 25000), produced by the model. To compute the PMRP score of SGRAF (the integration of SGR and SAF), additionally pass `--path2 ${SIM_MATRIX}` pointing at the second model's predictions.
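`--n_fold 5` corresponds to the standard MSCOCO 1K protocol: the 5,000-image test set is split into five folds of 1,000 images (5,000 captions each) and scores are averaged, while `--n_fold 0` evaluates the full 5K set at once. A sketch of how the (5000, 25000) matrix splits into folds (illustrative; `pmrp_evaluation.py` holds the actual logic):

```python
import numpy as np

def iter_folds(sims, n_fold=5, caps_per_img=5):
    """Yield per-fold sub-matrices of an (n_img, n_img * caps_per_img)
    image-to-text similarity matrix; n_fold=0 means the full set."""
    n_img = sims.shape[0]
    if n_fold == 0:
        yield sims
        return
    fold_img = n_img // n_fold
    for i in range(n_fold):
        rows = slice(i * fold_img, (i + 1) * fold_img)
        cols = slice(i * fold_img * caps_per_img,
                     (i + 1) * fold_img * caps_per_img)
        yield sims[rows, cols]

# A (5000, 25000) matrix yields five (1000, 5000) folds:
sims = np.zeros((5000, 25000))
print([fold.shape for fold in iter_folds(sims, n_fold=5)])
```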
If you find this code useful, please cite the following paper:
```
@inproceedings{li2022differentiable,
  author    = {Hao Li and
               Jingkuan Song and
               Lianli Gao and
               Pengpeng Zeng and
               Haonan Zhang and
               Gongfu Li},
  title     = {A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval},
  booktitle = {NeurIPS},
  year      = {2022}
}
```