🎯🎯 AIGCBench is a novel and comprehensive benchmark designed for evaluating the capabilities of state-of-the-art video generation algorithms. Official code for the paper:
AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI, BenchCouncil Transactions on Benchmarks, Standards and Evaluations (TBench).
Fanda Fan, Chunjie Luo, Wanling Gao, Jianfeng Zhan
Illustration of our AIGCBench. Our AIGCBench is divided into three modules: the evaluation dataset, the evaluation metrics, and the video generation models to be assessed.
Key Features of AIGCBench:
- Diverse Datasets: AIGCBench incorporates a variety of datasets, including real-world video-text pairs and image-text pairs, to ensure a broad and realistic evaluation spectrum. Additionally, it includes a newly generated dataset created through an innovative text-to-image generation pipeline, enhancing the diversity and representativeness of the benchmark.
- Extensive Evaluation Metrics: AIGCBench introduces a set of evaluation metrics that cover four crucial dimensions of video generation—control-video alignment, motion effects, temporal consistency, and video quality. Our evaluation metrics encompass both reference video-based metrics and video-free metrics.
- Validated by Human Judgment: The benchmark's evaluation criteria are thoroughly verified against human preferences to confirm their reliability and alignment with human judgments.
- In-Depth Analysis: Through extensive evaluations, AIGCBench reveals insightful findings about the current strengths and limitations of existing I2V models, offering valuable guidance for future advancements in the field.
- Future Expansion: AIGCBench is not only comprehensive and scalable in its current form but also designed with the vision to encompass a wider range of video generation tasks in the future. This will allow for a unified and in-depth benchmarking of various aspects of AI-generated content (AIGC), setting a new standard for the evaluation of video generation technologies.
- [01/24/2024] Our paper has been accepted by BenchCouncil Transactions on Benchmarks, Standards and Evaluations (Tbench)!
- [01/10/2024] The evaluation dataset and evaluation code have been released.
😄The Hugging Face link for our dataset.
This dataset is intended for the evaluation of video generation tasks. Our dataset includes image-text pairs and video-text pairs. The dataset comprises three parts:
Ours
- A custom generation of image-text samples.Webvid val
- A subset of 1000 video samples from the WebVid val dataset.Laion-aesthetics
- A subset of LAION dataset that includes 925 curated image-text samples.
Below are some images we generated, with the corresponding text:
We have encapsulated the evaluation metrics used in our paper in eval.py
; for more details, please refer to the paper. To use the code, please first download the clip model file and replace the 'path_to_dir' with the actual path.
Below is a simple example:
batch_video_path = os.path.join('path_to_videos', '*.mp4')
video_path_list = sorted(glob.glob(batch_video_path))
sum_res = 0
cnt = 0
for video_path in video_path_list:
res = compute_video_video_similarity(ref_video_path, video_path)
sum_res += res['clip']
cnt += res["state"]
print(sum_res / cnt)
Quantitative analysis for different Image-to-Video algorithms. An upward arrow indicates that higher values are better, while a downward arrow means lower values are preferable.
Dimensions | Metrics | VideoCrafter | I2VGen-XL | SVD | Pika | Gen2 |
---|---|---|---|---|---|---|
Control-video Alignment | MSE (First) ↓ | 3929.65 | 4491.90 | 640.75 | 155.30 | 235.53 |
SSIM (First) ↑ | 0.300 | 0.354 | 0.612 | 0.800 | 0.803 | |
Image-GenVideo Clip ↑ | 0.830 | 0.832 | 0.919 | 0.930 | 0.939 | |
GenVideo-Text Clip ↑ | 0.23 | 0.24 | - | 0.271 | 0.270 | |
GenVideo-RefVideo CliP (Keyframes) ↑ | 0.763 | 0.764 | - | 0.824 | 0.820 | |
Motion Effects | Flow-Square-Mean | 1.24 | 1.80 | 2.52 | 0.281 | 1.18 |
GenVideo-RefVideo CliP (Corresponding frames) ↑ | 0.764 | 0.764 | 0.796 | 0.823 | 0.818 | |
Temporal Consistency | GenVideo Clip (Adjacent frames) ↑ | 0.980 | 0.971 | 0.974 | 0.996 | 0.995 |
GenVideo-RefVideo CliP (Corresponding frames) ↑ | 0.764 | 0.764 | 0.796 | 0.823 | 0.818 | |
Video Quality | Frame Count ↑ | 16 | 32 | 25 | 72 | 96 |
DOVER ↑ | 0.518 | 0.510 | 0.623 | 0.715 | 0.775 | |
GenVideo-RefVideo SSIM ↑ | 0.367 | 0.304 | 0.507 | 0.560 | 0.504 |
To validate the alignment of our proposed evaluation standards with human preferences, we conducted a study. We randomly selected 30 generated results from each of the five methods. Then, we asked participants to vote on the best algorithm outcomes across four dimensions: Image Fidelity, Motion Effects, Temporal Consistency, and Video Quality. A total of 42 individuals participated in the voting process. The specific results of the study are presented below:
📧 If you have any questions, please feel free to contact us via email at [email protected] and [email protected].
If you find our work useful in your research, please consider citing our paper:
@article{fan2024aigcbench,
title={AIGCBench: Comprehensive evaluation of image-to-video content generated by AI},
author={Fan, Fanda and Luo, Chunjie and Gao, Wanling and Zhan, Jianfeng},
journal={BenchCouncil Transactions on Benchmarks, Standards and Evaluations},
pages={100152},
year={2024},
publisher={Elsevier}
}