FastBlend is a model-free algorithm that smooths videos. It can remove the flicker in a video, or render a fluent video from a series of keyframes.
Use FastBlend in Stable-Diffusion-Webui
- Open the "Extensions" tab.
- Open the "Install from URL" tab inside it.
- Enter https://github.com/Artiprocher/sd-webui-fastblend.git in "URL for extension's git repository".
- Press the "Install" button.
- Wait about 5 seconds until you see the message "Installed into stable-diffusion-webui\extensions\sd-webui-fastblend. Use Installed tab to restart".
- Go to the "Installed" tab, click "Check for updates", and then click "Apply and restart UI".
- A FastBlend tab now appears in the webui.
- Enjoy the coherent and fluent videos!
Install the packages in your Python environment:
pip install gradio numpy imageio imageio[ffmpeg] opencv-python-headless tqdm cupy-cuda11x
If your CUDA version is not v11.2 ~ 11.8, please read this document and install the corresponding version of cupy.
Then run the following command to launch the independent webui:
python independent_webui.py
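The CuPy wheel has to match the local CUDA toolkit. As a minimal sketch, the choice between the `cupy-cuda11x` and `cupy-cuda12x` wheels can be written as a small lookup. The helper name `cupy_wheel_for` is hypothetical and not part of FastBlend; CUDA versions outside these two families are deliberately left unmapped, so check the CuPy installation guide for them.

```python
def cupy_wheel_for(cuda_version: str) -> str:
    """Return the CuPy wheel name for a CUDA version string such as "11.8".

    Hypothetical helper for illustration only. It covers the two wheel
    families for CUDA 11.2-11.8 and CUDA 12.x and nothing else.
    """
    parts = cuda_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major == 11 and 2 <= minor <= 8:
        return "cupy-cuda11x"
    if major == 12:
        return "cupy-cuda12x"
    raise ValueError(
        f"No wheel mapped for CUDA {cuda_version}; see the CuPy installation guide"
    )


# Print the package to pass to `pip install` for a CUDA 11.8 machine.
print(cupy_wheel_for("11.8"))
```

The CUDA version itself can usually be read from `nvcc --version` or `nvidia-smi`.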
Install the packages as we described above.
from FastBlend.api import smooth_video, interpolate_video

# Blend
smooth_video(
    video_guide = "guide_video.mp4",
    video_guide_folder = None,
    video_style = "style_video.mp4",
    video_style_folder = None,
    mode = "Fast",
    window_size = 15,
    batch_size = 16,
    tracking_window_size = 0,
    output_path = "output_folder",
    fps = None,
    minimum_patch_size = 5,
    num_iter = 5,
    guide_weight = 10.0,
    initialize = "identity"
)

# Interpolate
interpolate_video(
    frames_path = "frames_folder",
    keyframes_path = "keyframes_folder",
    output_path = "output_folder",
    fps = None,
    batch_size = 16,
    tracking_window_size = 1,
    minimum_patch_size = 15,
    num_iter = 5,
    guide_weight = 10.0,
    initialize = "identity"
)
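Because both functions take many keyword arguments, it can help to keep the settings in a dictionary and check them before starting a long run. The sketch below is illustrative: `make_blend_settings` is a hypothetical helper, and its checks (an odd patch size, one of the three documented modes) come from the parameter descriptions in this README, not from FastBlend's own validation.

```python
VALID_MODES = ("Fast", "Balanced", "Accurate")


def make_blend_settings(mode="Fast", window_size=15, batch_size=16,
                        tracking_window_size=0, minimum_patch_size=5,
                        num_iter=5, guide_weight=10.0):
    """Build keyword arguments for smooth_video and sanity-check them.

    Hypothetical helper; the file names below are placeholders.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if minimum_patch_size % 2 == 0:
        raise ValueError("minimum_patch_size must be an odd number")
    return {
        "video_guide": "guide_video.mp4",
        "video_guide_folder": None,
        "video_style": "style_video.mp4",
        "video_style_folder": None,
        "mode": mode,
        "window_size": window_size,
        "batch_size": batch_size,
        "tracking_window_size": tracking_window_size,
        "output_path": "output_folder",
        "fps": None,
        "minimum_patch_size": minimum_patch_size,
        "num_iter": num_iter,
        "guide_weight": guide_weight,
        "initialize": "identity",
    }


settings = make_blend_settings(mode="Accurate", batch_size=31, tracking_window_size=1)
# With FastBlend installed, the settings can be passed on directly:
# from FastBlend.api import smooth_video
# smooth_video(**settings)
```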
video_merge.mp4
- The original video is here. We only use the first 236 frames.
- Re-render each frame independently. The parameters are:
  - Prompt: masterpiece, best quality, anime screencap, cute, petite, long hair, black hair, blue eyes, hoodie, breasts, smile, short sleeves, hands, blue bowknot, wind, depth of field, forest, close-up,
  - Negative prompt: (worst quality, low quality:1.4), monochrome, zombie, (interlocked fingers:1.2), extra arms,
  - Steps: 20,
  - Sampler: DPM++ 2M Karras,
  - CFG scale: 7,
  - Seed: 3010302656,
  - Size: 768x512,
  - Model hash: 4c79dd451a,
  - Model: aingdiffusion_v90,
  - Denoising strength: 1,
  - Clip skip: 2,
  - ControlNet 0: "Module: tile_resample, Model: control_v11f1e_sd15_tile [a371b31b], Weight: 0.4, Resize Mode: Crop and Resize, Low Vram: False, Threshold A: 1, Guidance Start: 0, Guidance End: 1, Pixel Perfect: True, Control Mode: Balanced",
  - ControlNet 1: "Module: softedge_pidinet, Model: control_v11p_sd15_softedge [a8575a2a], Weight: 1, Resize Mode: Crop and Resize, Low Vram: False, Processor Res: 512, Guidance Start: 0, Guidance End: 1, Pixel Perfect: True, Control Mode: Balanced",
  - ControlNet 2: "Module: depth_midas, Model: control_v11f1p_sd15_depth [cfd03158], Weight: 1, Resize Mode: Crop and Resize, Low Vram: False, Processor Res: 512, Guidance Start: 0, Guidance End: 1, Pixel Perfect: True, Control Mode: Balanced",
  - Version: v1.6.0
- Open the "FastBlend" tab. Upload the original video to "Guide video". Upload the re-rendered video to "Style video". We use the following settings:
  - Inference mode: Fast
  - Sliding window size: 30
  - Batch size: 8
  - Minimum patch size (odd number): 5
  - Number of iterations: 5
  - Guide weight: 10.0
- Click "Run" and wait for it to finish. (I tested this extension on an Nvidia RTX 3060 laptop. It took 12 minutes.)
- Now you have obtained a fluent video. Go to "Extras" to upscale it using "R-ESRGAN 4+ Anime6B".
video_interpolate.mp4
- The original video is here. We only use the frames 1108-1458. Please resize the frames to 512*512.
- Re-render the keyframes (1108, 1140, 1172, 1204, 1236, 1268, 1300, 1332, 1364, 1396, 1428, 1458) independently. The parameters are:
  - Prompt: masterpiece, best quality, a woman, anime, flat, red hair, short hair, simple black background, bare shoulder
  - Negative prompt: easynegative
  - Steps: 20,
  - Sampler: DPM++ 2M Karras,
  - CFG scale: 7,
  - Seed: 1,
  - Size: 768x768,
  - Model hash: 4c79dd451a,
  - Model: aingdiffusion_v90,
  - Denoising strength: 0.7,
  - ControlNet 0: "Module: softedge_pidinet, Model: control_v11p_sd15_softedge [a8575a2a], Weight: 1, Resize Mode: Crop and Resize, Low Vram: False, Processor Res: 512, Guidance Start: 0, Guidance End: 1, Pixel Perfect: True, Control Mode: Balanced",
  - TI hashes: "easynegative: c74b4e810b03",
  - Version: v1.6.0
- Open the "FastBlend" tab. Click "Interpolate". Fill in the directories of the original frames and the rendered keyframes. We use the following settings:
  - Batch size: 8
  - Tracking window size: 0
  - Minimum patch size (odd number, larger is better): 15
  - Number of iterations: 5
  - Guide weight: 10
  - NNF initialization: identity
- Click "Run" and wait for it to finish. (I tested this extension on an Nvidia RTX 3060 laptop. It took 3 minutes.)
- Now you have obtained a fluent video. Go to "Extras" to upscale it using "R-ESRGAN 4+ Anime6B".
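The keyframe indices used in this example appear to follow a simple rule: every 32nd frame starting at 1108, plus the final frame 1458. A minimal sketch of that rule is shown below; `select_keyframes` is a hypothetical helper for preparing the keyframe folder and is not part of FastBlend.

```python
def select_keyframes(first: int, last: int, step: int) -> list:
    """Pick every `step`-th frame from `first`, always including `last`."""
    keyframes = list(range(first, last + 1, step))
    if keyframes[-1] != last:
        # The regular grid did not land on the final frame, so add it
        # to make sure the end of the clip is covered by a keyframe.
        keyframes.append(last)
    return keyframes


keyframes = select_keyframes(1108, 1458, 32)
print(keyframes)
```

With a step of 32 this reproduces the twelve keyframes listed above. A smaller step means more frames to re-render but shorter gaps for the interpolation to fill.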
video_codef.mp4
We found an interesting project called CoDeF, which uses only one keyframe to render a video. We collected some videos from their project page and compared FastBlend with CoDeF. For each video, we select one keyframe from the rendered video and use it to re-render the video in interpolation mode. The parameters are:
- Batch size: 48
- Tracking window size: 1
- Minimum patch size (odd number, larger is better): 25
- Number of iterations: 5
- Guide weight: 10
- NNF initialization: identity
As we can see, FastBlend is competitive with CoDeF. What's more, FastBlend is very efficient. We need only one minute to render a video clip!
- Output directory: the directory to save the video.
- Inference mode
| Mode | Time | Memory | Quality | Frame by frame output | Description |
|---|---|---|---|---|---|
| Fast | ■ | ■■■ | ■■ | No | Blend the frames using a tree-like data structure, which requires a lot of RAM but is fast. |
| Balanced | ■■ | ■ | ■■ | Yes | Blend the frames naively. |
| Accurate | ■■■ | ■ | ■■■ | Yes | Blend the frames and align them together for higher video quality. Performance is best when [batch size] >= [sliding window size] * 2 + 1. |
- Sliding window size: our algorithm blends the frames within a sliding window. If the size is n, each frame is blended with the previous n frames and the next n frames. A large sliding window can make the video more fluent but sometimes hazy.
- Batch size: a larger batch size makes the program faster but requires more VRAM.
- Tracking window size (only for accurate mode): the size of the window in which our algorithm tracks moving objects. Empirically, 1 is enough.
- Advanced settings
  - Minimum patch size (odd number): the minimum patch size used for patch matching. (Default: 5)
  - Number of iterations: the number of iterations of patch matching. (Default: 5)
  - Guide weight: a parameter that determines how strongly motion features are applied to the style video. (Default: 10)
  - NNF initialization: how to initialize the NNF (Nearest Neighbor Field). (Default: identity)
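To make the sliding window size concrete, the sketch below lists which frames would be blended with a given frame, and computes the smallest batch size that meets the rule quoted for Accurate mode. Both function names are hypothetical, and clipping the window at the start and end of the video is an assumption made here, not a documented FastBlend behavior.

```python
def window_indices(frame: int, window_size: int, num_frames: int) -> list:
    """Frames blended with `frame`: up to `window_size` neighbours per side.

    Assumption: the window is simply clipped at the video boundaries.
    """
    start = max(0, frame - window_size)
    stop = min(num_frames, frame + window_size + 1)
    return list(range(start, stop))


def smallest_accurate_batch_size(window_size: int) -> int:
    """Smallest batch size with batch_size >= window_size * 2 + 1."""
    return window_size * 2 + 1


print(window_indices(frame=5, window_size=2, num_frames=10))
print(smallest_accurate_batch_size(15))
```

A window size of n therefore touches at most 2n + 1 frames per output frame, which is why the recommended batch size for Accurate mode grows with the window.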
- Output directory: the directory to save the video.
- Batch size: a larger batch size makes the program faster but requires more VRAM.
- Tracking window size (only for accurate mode): the size of the window in which our algorithm tracks moving objects. Empirically, 1 is enough.
- Advanced settings
  - Minimum patch size (odd number): the minimum patch size used for patch matching. This parameter should be larger than that in blending. (Default: 15)
  - Number of iterations: the number of iterations of patch matching. (Default: 5)
  - Guide weight: a parameter that determines how strongly motion features are applied to the style video. (Default: 10)
  - NNF initialization: how to initialize the NNF (Nearest Neighbor Field). (Default: identity)
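In patch matching, a nearest neighbor field stores, for each pixel of one image, the coordinates of its current best-matching patch in another image. The sketch below shows what an "identity" starting point would look like under the assumption that it means every pixel initially points at its own coordinates. It is plain Python for readability and is not FastBlend's code; a real implementation would keep the field in an array.

```python
def identity_nnf(height: int, width: int) -> list:
    """Build an NNF in which every pixel (y, x) initially maps to itself.

    Illustrative only: rows are lists of (y, x) tuples.
    """
    return [[(y, x) for x in range(width)] for y in range(height)]


nnf = identity_nnf(2, 3)
# Each entry holds the matched coordinates for that pixel; with the
# identity start, the match of pixel (1, 2) is (1, 2) itself.
print(nnf[1][2])
```

Starting from the identity is a natural choice when neighbouring frames differ only slightly, because most patches are then already close to their best match before any iteration runs.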
@article{duan2023fastblend,
title={FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier},
author={Duan, Zhongjie and Wang, Chengyu and Chen, Cen and Qian, Weining and Huang, Jun and Jin, Mingyi},
journal={arXiv preprint arXiv:2311.09265},
year={2023}
}