
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

¹Nanjing University, ²ByteDance, ³Southwest University

🔆 Updates

  • 2025.01.09 The online demo of STAR is now live! Note that due to ZeroGPU's runtime limit, inference may exceed the allocated GPU duration; if you'd like to try it anyway, you can duplicate the demo and assign a paid GPU.

  • 2025.01.07 The pretrained STAR models (I2VGen-XL and CogVideoX-5B versions) and the inference code have been released.

📑 TODO

  • [x] Inference code
  • [x] Online demo
  • [ ] Training code

🔎 Method Overview

(Figure: overview of the STAR framework.)

📷 Results Display

(Figures: visual comparisons of STAR results.)

👀 More visual results can be found on our Project Page and in our Video Demo.

⚙️ Dependencies and Installation

## clone this repository
git clone https://github.com/NJU-PCALab/STAR.git
cd STAR

## create a conda environment and install dependencies
conda create -n star python=3.10
conda activate star
pip install -r requirements.txt
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
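An optional sanity check after installation (assuming a CUDA-capable GPU and that requirements.txt pulls in torch):

## verify ffmpeg is on PATH and PyTorch can see the GPU
ffmpeg -version
python -c "import torch; print(torch.cuda.is_available())"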

🚀 Inference

Model Weight

Base Model     Type                 URL
I2VGen-XL      Light Degradation    🔗
I2VGen-XL      Heavy Degradation    🔗
CogVideoX-5B   Heavy Degradation    🔗

1. I2VGen-XL-based

Step 1: Download the pretrained STAR model from HuggingFace.

We provide two versions of the I2VGen-XL-based model: heavy_deg.pt for heavily degraded videos and light_deg.pt for lightly degraded videos (e.g., low-resolution videos downloaded from video websites). You can also fetch the weights from the command line, as sketched below.

Put the weights into pretrained_weight/.
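A minimal command-line sketch using the huggingface_hub CLI; the repo id SherryX/STAR is an assumption, so substitute the actual HuggingFace repo linked above:

## assumption: weights are hosted in a HuggingFace repo named SherryX/STAR
pip install -U "huggingface_hub[cli]"
mkdir -p pretrained_weight
huggingface-cli download SherryX/STAR light_deg.pt --local-dir pretrained_weight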

Step 2: Prepare testing data

Put the testing videos in input/video/.

As for the prompt, there are three options: 1. no prompt; 2. automatically generate a prompt using Pllava; 3. manually write the prompt. Put the prompt txt file in input/text/. A layout sketch follows.
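A minimal sketch of the expected layout; the file names and the prompt-file format here are assumptions, so check the repo's example inputs:

## hypothetical layout -- file names and prompt format are assumptions
mkdir -p input/video input/text
cp /path/to/your_video.mp4 input/video/
echo "A clear, high-resolution video of a street scene." > input/text/prompt.txt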

Step 3: Change the paths

Change the paths in video_super_resolution/scripts/inference_sr.sh to your local paths, including video_folder_path, txt_file_path, model_path, and save_dir; an illustrative sketch follows.
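Illustrative values only; the exact variable syntax inside inference_sr.sh may differ, so treat this as a sketch:

## hypothetical values for the variables in inference_sr.sh
video_folder_path='./input/video'
txt_file_path='./input/text/prompt.txt'
model_path='./pretrained_weight/light_deg.pt'
save_dir='./results'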

Step 4: Running inference command

bash video_super_resolution/scripts/inference_sr.sh

If you encounter an out-of-memory (OOM) problem, set a smaller frame_length in inference_sr.sh, as sketched below.
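Lowering frame_length shrinks the number of frames processed per chunk, trading temporal context for GPU memory. A sketch; the value 16 is an assumption to tune for your GPU:

## assumption: frame_length is a plain shell variable in inference_sr.sh
sed -i 's/frame_length=[0-9]*/frame_length=16/' video_super_resolution/scripts/inference_sr.sh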

2. CogVideoX-based

Refer to these instructions for inference with the CogVideoX-5B-based model.

Please note that the CogVideoX-5B-based model supports only 720x480 input; a resizing sketch follows.
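If your clips are a different size, one option is to resize them with ffmpeg before inference. A sketch; note that plain scaling can change the aspect ratio, so crop or pad instead if that matters:

## force a clip to 720x480 before feeding the CogVideoX-5B-based model
ffmpeg -i input.mp4 -vf "scale=720:480" -c:a copy input_720x480.mp4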

❤️ Acknowledgments

This project is based on I2VGen-XL, VEnhancer, CogVideoX, and OpenVid-1M. Thanks for their awesome work.

🎓Citations

If our project helps your research or work, please consider citing our paper:

@misc{xie2025starspatialtemporalaugmentationtexttovideo,
      title={STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution}, 
      author={Rui Xie and Yinhong Liu and Penghao Zhou and Chen Zhao and Jun Zhou and Kai Zhang and Zhenyu Zhang and Jian Yang and Zhenheng Yang and Ying Tai},
      year={2025},
      eprint={2501.02976},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.02976}, 
}

📧 Contact

If you have any inquiries, please don't hesitate to reach out via email at [email protected].
