[ArXiv] | [🤗HuggingFace] | [Website]
🌟 Any contributions via PRs, issues, emails or other methods are greatly appreciated.
- 🎖️ Our work has been accepted to ACL 2024.
- 🔥 We have released the benchmark on [🤗HuggingFace].
- 🔥 The paper is also available on [ArXiv].
- 🔮 An interactive benchmark website and further exploration are available at [https://lightchen233.github.io/m3cot.github.io/].
Multi-modal Chain-of-Thought (MCoT) requires models to leverage knowledge from both textual and visual modalities for step-by-step reasoning, and it has attracted increasing attention. Nevertheless, current MCoT benchmarks still face three challenges: (1) absence of visual modal reasoning, (2) single-step visual modal reasoning, and (3) missing domains, all of which hinder the development of MCoT. Motivated by this, we introduce a novel benchmark (M3CoT) to address these challenges, advancing multi-domain, multi-step, and multi-modal CoT. We also conduct a thorough evaluation of a wide range of MCoT approaches on Vision Large Language Models (VLLMs). We find that current VLLMs still struggle to reason correctly on M3CoT, and that a large gap remains between existing VLLMs and human performance on M3CoT, despite their superior results on previous MCoT benchmarks. To our knowledge, this is the first meaningful step toward the multi-domain, multi-step, and multi-modal scenario in MCoT. We hope that M3CoT can serve as a valuable resource and a foundation for multi-domain, multi-step, multi-modal chain-of-thought research.
import datasets
dataset = datasets.load_dataset("LightChen2333/M3CoT")
Alternatively, download the dataset from here and place the unzipped content in the data folder.
import datasets
dataset = datasets.load_dataset("data/m3cot.py")
We also recommend using our M3CoT class to manage and analyze the data. The class supports two initialization formats:
import datasets
from utils.data import M3CoT
dataset = datasets.load_dataset("data/m3cot.py")
prepared_dataset = M3CoT(dataset=dataset)
or:
from utils.data import M3CoT
prepared_dataset = M3CoT(data_path="data")
M3CoT requires Python>=3.10 and torch>=2.0.
git clone https://github.com/LightChen233/M3CoT.git && cd M3CoT/
pip install -r requirements.txt
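After installation, the version requirements stated above can be checked with a small script. The following is a minimal sketch, not part of the repository: the helper names parse_version, meets_minimum, and check_environment are hypothetical, and it assumes torch is installed under the distribution name torch.

```python
import sys
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '2.1.0+cu121' into (2, 1, 0); local and non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(found: str, minimum: str) -> bool:
    """Return True when version `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2' and '2.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> list:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python {sys.version.split()[0]} found, >=3.10 required")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("torch is not installed, >=2.0 required")
    else:
        if not meets_minimum(torch_version, "2.0"):
            problems.append(f"torch {torch_version} found, >=2.0 required")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The script prints nothing when both requirements are satisfied.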
python evaluate.py --setting zero-shot \
--model gpt4v \
--prompt cot \
--metric_by topic
where --setting can be selected from [zero-shot, few-shot, tool-usage] and --metric_by can be selected from [topic, domain, all].
For the zero-shot setting:
- --model can be selected from [kosmos-2, cogvlm, gemini, gpt4v, instruct-blip-7b, instruct-blip-13b, llava-7b, llava-13b, openflamingo]
- --prompt can be selected from [direct, cot, ccot, dsp]
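When launching many runs, it can help to validate the option values listed above before calling evaluate.py. The sketch below is illustrative only: build_eval_command is a hypothetical helper, not part of the repository. It checks --model and --prompt only for the zero-shot setting, because the allowed values for the other settings are not listed here; for those settings the values are passed through unchecked (an assumption).

```python
import shlex

SETTINGS = ["zero-shot", "few-shot", "tool-usage"]
METRIC_BY = ["topic", "domain", "all"]
ZERO_SHOT_MODELS = [
    "kosmos-2", "cogvlm", "gemini", "gpt4v", "instruct-blip-7b",
    "instruct-blip-13b", "llava-7b", "llava-13b", "openflamingo",
]
ZERO_SHOT_PROMPTS = ["direct", "cot", "ccot", "dsp"]


def build_eval_command(setting: str, model: str, prompt: str, metric_by: str) -> list:
    """Return the argument list for one evaluate.py run, or raise ValueError."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting}")
    if metric_by not in METRIC_BY:
        raise ValueError(f"unknown metric_by: {metric_by}")
    # Only the zero-shot choices are documented, so only they are checked.
    if setting == "zero-shot":
        if model not in ZERO_SHOT_MODELS:
            raise ValueError(f"unknown zero-shot model: {model}")
        if prompt not in ZERO_SHOT_PROMPTS:
            raise ValueError(f"unknown zero-shot prompt: {prompt}")
    return [
        "python", "evaluate.py",
        "--setting", setting,
        "--model", model,
        "--prompt", prompt,
        "--metric_by", metric_by,
    ]


if __name__ == "__main__":
    command = build_eval_command("zero-shot", "gpt4v", "cot", "topic")
    # Print a shell-ready command line; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```

Returning a list rather than a string keeps the command safe to hand to subprocess.run without shell quoting issues.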
python evaluate.py --setting custom \
--metric_path [JSONL_PATH]
Each line of the JSONL file must follow this format:
{
  "id": "[ID]",
  "choices": ["[CHOICE1]", "[CHOICE2]", ...],
  "answer": "A/B/C/...",
  "domain": "[DOMAIN]",
  "topic": "[TOPIC]",
  "messages": [
    "[QUESTION]",
    "[ANSWER]"
  ]
}
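Before running the custom evaluation, a results file can be checked against the format above. The following is a rough sketch under stated assumptions, not the repository's own validation: validate_record and validate_jsonl_lines are hypothetical helpers, the answer is assumed to be a single uppercase letter indexing into choices (A for the first choice), and messages is assumed to be a list of at least two strings. The sample record uses made-up placeholder values, not benchmark data.

```python
import json

# Field names come from the format above; the expected types are inferred from it.
REQUIRED_FIELDS = {
    "id": str, "choices": list, "answer": str,
    "domain": str, "topic": str, "messages": list,
}


def validate_record(record: dict) -> list:
    """Return a list of problems found in one parsed JSONL record."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"field {name} should be {expected.__name__}")
    if errors:
        return errors
    # Assumption: 'A' maps to choices[0], 'B' to choices[1], and so on.
    letters = [chr(ord("A") + i) for i in range(len(record["choices"]))]
    if record["answer"] not in letters:
        errors.append(f"answer {record['answer']!r} does not match any choice")
    # Assumption: messages holds at least a question string and an answer string.
    messages = record["messages"]
    if len(messages) < 2 or not all(isinstance(m, str) for m in messages):
        errors.append("messages should contain at least two strings")
    return errors


def validate_jsonl_lines(lines) -> dict:
    """Map 1-based line numbers to their problems; blank lines are skipped."""
    problems = {}
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems[number] = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(record, dict):
            problems[number] = ["line is not a JSON object"]
            continue
        errors = validate_record(record)
        if errors:
            problems[number] = errors
    return problems


if __name__ == "__main__":
    # Placeholder record for illustration only.
    sample = {
        "id": "sample-0", "choices": ["first", "second"], "answer": "B",
        "domain": "example-domain", "topic": "example-topic",
        "messages": ["example question", "example answer"],
    }
    print(validate_jsonl_lines([json.dumps(sample)]))
```

To check a real file, pass an open file handle, for example validate_jsonl_lines(open(path, encoding="utf-8")), where path is the file given to --metric_path.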
root
├── data # data folder where the dataset is loaded
├── experiment # All experimental data
│ ├── zero-shot # Experimental results under zero-shot setting. Subfolders are for each model, and each model folder contains the results of three prompts.
│ ├── few-shot # Experimental results under few-shot setting.
│ └── tool-usage # Experimental results under tool-usage setting.
├── utils # Tool library folder
│ ├── common_tool.py # Some common utility functions
│ ├── data.py # Dataset loading class
│ ├── gemini_request.py # Gemini request tool
│ ├── image_tool.py # Image processing function.
│ └── metric.py # Indicator calculation tool.
├── scripts
│ ├── load_dataset.py # Example script to load a dataset
│ └── parse_to_sqa_format.py # Convert dataset to ScienceQA format
└── evaluate.py # Evaluation script
If you find this project useful for your research, please consider citing the following paper:
@inproceedings{chen-etal-2024-m3cot,
title = "M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought",
author = "Chen, Qiguang and
Qin, Libo and
Zhang, Jin and
Chen, Zhi and
Xu, Xiao and
Che, Wanxiang",
booktitle = "Proc. of ACL",
year = "2024",
}
If you have any questions or suggestions, please create a GitHub issue here or email Qiguang Chen.