
Frequency Augmented VAE (FA-VAE)

This is the original implementation for the paper "Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder" published in CVPR 2023.

FA-VAE reconstructs images with finer detail by improving the alignment between the frequency spectra of the original and reconstructed images.
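The core idea can be sketched in a few lines of NumPy (this is an illustration only; FA-VAE aligns spectra with learned modules and training losses, not this raw gap): compare the 2-D Fourier magnitude spectra of the original and the reconstruction.

```python
import numpy as np

def spectrum_gap(original, reconstruction):
    """Mean absolute gap between the magnitude spectra of two images.

    A reconstruction that loses high-frequency detail (fine textures,
    edges) shows a larger gap in the spectrum. Illustrative only.
    """
    f_orig = np.fft.fft2(original)
    f_rec = np.fft.fft2(reconstruction)
    return np.abs(np.abs(f_orig) - np.abs(f_rec)).mean()

# A blurred reconstruction loses high frequencies, widening the gap.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
blurred = (img + np.roll(img, 1, axis=0) + np.roll(img, 1, axis=1)) / 3.0
print(spectrum_gap(img, img))      # 0.0 -- identical spectra
print(spectrum_gap(img, blurred))  # positive -- detail was lost
```

A perfect reconstruction has zero gap; blurring, a typical failure mode of autoencoders, shows up as a positive gap.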


To-Do

We will be releasing the checkpoints shortly.

Requirements

The required packages are listed in environment.yaml for reference.

Checkpoints

  • FA-VAE on CelebA-HQ (Table 2 row 8, FCM (Res) + non pair-wise DSL): expe_5.pt
  • FA-VAE on FFHQ (Table 1 row 3): favae-ffhq.pt
  • FA-VAE on ImageNet (f=16) (Table 1 last row): favae-imagenet-f16.pt
  • FA-VAE on ImageNet (f=4) (Table 1 row 6): favae-imagenet-f4.pt
  • CAT on CelebA-HQ: cat_celeba.pt

Data Preparation

CelebA-HQ

  1. Download the dataset:

    • CelebA-HQ dataset can be downloaded from CelebA-Mask-HQ.

    • The train/test split is given in the file list_eval_partition.txt from CelebA, where "0" marks train, "1" eval, and "2" test.

    • Download the captions from MM-CelebA-HQ dataset for training T2I generation.

  2. Preprocess the data files into the pkl format:

    cd datasets
    python preprocess_celeba.py
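The split file from step 1 can be parsed with a few lines of Python (a minimal sketch; preprocess_celeba.py handles the actual conversion to pkl):

```python
from collections import defaultdict

def read_partition(path):
    """Group CelebA image filenames by split.

    Each line of list_eval_partition.txt is "<filename> <split_id>",
    where 0 = train, 1 = eval, and 2 = test.
    """
    names = {"0": "train", "1": "eval", "2": "test"}
    splits = defaultdict(list)
    with open(path) as f:
        for line in f:
            filename, split_id = line.split()
            splits[names[split_id]].append(filename)
    return splits
```

For example, `read_partition("list_eval_partition.txt")["train"]` yields the list of training filenames.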

FFHQ can be downloaded from FFHQ, and ImageNet can be downloaded from Kaggle.

Train FA-VAE

FA-VAE comes with different architectures for Frequency Complement Module (FCM) and different settings for the losses Spectrum Loss (SL) and Dynamic Spectrum Loss (DSL).
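A rough sketch of the SL/DSL distinction, using NumPy (the weighting below is a simplification for illustration, not the paper's exact formulation): SL penalizes the gap between the two spectra uniformly, while DSL re-weights each frequency by how badly it is reconstructed, focusing the loss on hard frequencies.

```python
import numpy as np

def spectrum_losses(original, reconstruction):
    """Uniform (SL-like) vs dynamically weighted (DSL-like) spectrum
    losses. Simplified illustration, not the paper's exact losses."""
    err = np.abs(np.fft.fft2(original) - np.fft.fft2(reconstruction))
    sl = err.mean()                    # uniform weight on every frequency
    weight = err / (err.max() + 1e-8)  # hard frequencies weigh more
    dsl = (weight * err).mean()        # dynamic, error-driven weighting
    return sl, dsl
```

Both losses vanish for a perfect reconstruction; the dynamic variant concentrates the penalty on the frequencies with the largest reconstruction error.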

  1. Training FA-VAE on CelebA-HQ with different settings of FCM and SL/DSL is covered by the script train_favae_celeba.sh. These settings correspond to Table 2.

    cd favae_scripts
    bash train_favae_celeba.sh
  2. FA-VAE on FFHQ and ImageNet can be trained with the script train_favae_other_datasets.sh:

    cd favae_scripts
    bash train_favae_other_datasets.sh

To resume training, pass the --resume flag together with a checkpoint path via --resume_path. For instance, to resume FA-VAE codebook training on ImageNet:

torchrun --nnodes=1 --nproc_per_node=2 train_vqgan_ddp.py --ds $OUTPUT --batch_size 2 --print_steps 5000 --img_steps 20000 --codebook_size 16384 --disc_start_epochs 1 --embed_dim 256 --use_lucid_quantizer --use_cosine_sim --with_fcm --ffl_weight 1.0 --use_same_conv_gauss --ffl_weight_features 0.01 --gaussian_kernel 9 --codebook_weight 1.0 --perceptual_weight 1.0 --disc_weight 0.75 --base_lr 2.0e-6 --train_file ../datasets/pkl_files/imagenet_train_wo_cap.pkl --val_file ../datasets/pkl_files/imagenet_test_wo_cap.pkl --resume --resume_path $RESUME_PATH
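The resume mechanism follows the usual checkpointing pattern; a minimal stdlib sketch of the idea (the real script presumably restores model, optimizer, and discriminator state from the .pt file with torch.load):

```python
import os
import pickle
import tempfile

def save_checkpoint(path, state):
    """Write training state (epoch counter, weights, ...) to disk."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(path):
    """Restore training state so training continues where it stopped."""
    with open(path, "rb") as f:
        return pickle.load(f)

# With --resume, training picks up the epoch counter from --resume_path.
resume_path = os.path.join(tempfile.gettempdir(), "favae_demo.ckpt")
save_checkpoint(resume_path, {"epoch": 7})
start_epoch = load_checkpoint(resume_path)["epoch"] + 1  # resumes at 8
```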

Train CAT Models

  1. CAT for T2I generation on CelebA:

    cd cat_scripts
    bash script_gpt_CA_celeba.sh

BibTeX

@inproceedings{favae2023cvpr,
  title={Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder},
  author={Lin, Xinmiao and Li, Yikang and Hsiao, Jenhao and Ho, Chiuman and Kong, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

License

See the LICENSE file for license rights and limitations (MIT).

Acknowledgments

The implementation of FA-VAE relies on resources from Clip-Gen, taming-transformers, CLIP, vector-quantize-pytorch, PerceptualSimilarity, and pytorch-fid. We thank the original authors for open-sourcing their work.
