🔥🔥🔥 CCMB: A Large-scale Chinese Cross-modal Benchmark (ACM MM 2023)
This repo is the official implementation of CCMB and R2D2.
CCMB is available. It includes a pre-training dataset (Zero) and five downstream datasets. The detailed introduction and download URLs are at http://zero.so.com. The 250M pre-training data is available at https://pan.baidu.com/s/1gnNbjOdCQdqZ4bRNN1S-Vw?pwd=iau8.
R2D2 is a vision-language framework. We release the following code and models:
✅Pre-trained checkpoints.
✅Inference demo.
✅Fine-tuning code and checkpoints for Image-Text Retrieval and Image-Text Matching tasks.
We show the performance of R2D2ViT-L fine-tuned on the Flickr30k-CNA dataset. The output of R2D2 is a similarity score between 0 and 1.
| 中文 (English) | 乔丹投篮 (Jordan shooting) | 乔丹运球 (Jordan dribbling) | 詹姆斯投篮 (James shooting) |
|---|---|---|---|
| Similarity score | 0.99033021 | 0.91078649 | 0.61231128 |
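To make the output format concrete: each caption receives one score in [0, 1]. Below is a minimal, self-contained sketch of how such a score can be produced from normalized image and text embeddings plus a sigmoid. It is not the R2D2 code path (the embeddings are random stand-ins and the 0.07 temperature is an assumption); use the inference demos below for real scores.

```python
# Minimal sketch of producing one similarity score in [0, 1] per caption.
# NOT the R2D2 implementation -- embeddings here are random stand-ins; see
# r2d2_inference_demo.py for the real model and preprocessing.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image_emb = F.normalize(torch.randn(1, 768), dim=-1)   # stand-in for one image embedding
text_embs = F.normalize(torch.randn(3, 768), dim=-1)   # stand-ins for three caption embeddings

scores = torch.sigmoid(image_emb @ text_embs.t() / 0.07)  # temperature-scaled cosine similarity
print(scores.squeeze().tolist())  # three scores, each in (0, 1)
```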
pip install -r requirements.txt
| Pre-trained image-text pairs | R2D2ViT-L | PRD2ViT-L |
|---|---|---|
| 250M | Download | Download |
| 23M | Download | - |
| Dataset | R2D2ViT-B (23M) |
|---|---|
| Flickr30k-CNA | Download |
| IQR | Download |
| ICR | Download |
| IQM | Download |
| ICM | Download |
- To evaluate the pretrained R2D2 model on image-text pairs, run:
python r2d2_inference_demo.py
- To evaluate the pretrained PRD2 model on image-text pairs, run:
python prd2_inference_demo.py
- Download datasets and pretrained models.
For the ICR, IQR, ICM, and IQM tasks, after downloading you should see the following folder structure:

```
├── Flickr30k-images
├── IQR_IQM_ICR_ICM_images
├── IQR
│   ├── train
│   └── val
├── ICR
│   ├── train
│   └── val
├── IQM
│   ├── train
│   └── val
└── ICM
    ├── train
    └── val
```

For Flickr30k-CNA, after downloading you should see the following folder structure:

```
├── train
├── val
└── test
```

- In config/retrieval_*.yaml, set the dataset and pre-trained model paths (a path-patching sketch follows these steps).
- Run fine-tuning for the Image-Text Retrieval task.
sh train_r2d2_retrieval.sh
- Run fine-tuning for the Image-Text Matching task.
sh train_r2d2_matching.sh
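Rather than editing the YAML files by hand, you can patch the paths programmatically. The sketch below is a convenience under stated assumptions: the key names `image_root` and `pretrained`, as well as the file paths, are placeholders for illustration; check config/retrieval_*.yaml for the keys it actually defines.

```python
# Sketch: fill in dataset and checkpoint paths in the retrieval configs.
# Key names and paths below are ASSUMED placeholders -- check the actual YAML
# files for the real keys before running.
import glob
import yaml

IMAGE_ROOT = "/data/Flickr30k-images"       # assumed dataset location
PRETRAINED = "/ckpt/r2d2_vit-l_250m.ckpt"   # assumed checkpoint location

for cfg_path in glob.glob("config/retrieval_*.yaml"):
    with open(cfg_path) as f:
        cfg = yaml.safe_load(f)

    cfg["image_root"] = IMAGE_ROOT          # assumed key name
    cfg["pretrained"] = PRETRAINED          # assumed key name

    with open(cfg_path, "w") as f:
        yaml.safe_dump(cfg, f, allow_unicode=True, sort_keys=False)
```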
If you find this dataset and code useful for your research, please consider citing.
```
@inproceedings{xie2023ccmb,
  title={CCMB: A Large-scale Chinese Cross-modal Benchmark},
  author={Xie, Chunyu and Cai, Heng and Li, Jincheng and Kong, Fanjing and Wu, Xiaoyu and Song, Jianfei and Morimitsu, Henrique and Yao, Lin and Wang, Dexin and Zhang, Xiangzheng and others},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  pages={4219--4227},
  year={2023}
}
```