
Frequency Augmented VAE (FA-VAE)

This is the original implementation for the paper "Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder" published in CVPR 2023.

FA-VAE reconstructs images with finer detail by improving the alignment between the frequency spectra of the original and reconstructed images.
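The core idea can be sketched in a few lines of NumPy (this is an illustration only; FA-VAE aligns spectra with learned modules and training losses, not this raw gap): compare the 2-D Fourier magnitude spectra of the original and the reconstruction.

```python
import numpy as np

def spectrum_gap(original, reconstruction):
    """Mean absolute gap between the magnitude spectra of two images.

    A reconstruction that loses high-frequency detail (fine textures,
    edges) shows a larger gap in the spectrum. Illustrative only.
    """
    f_orig = np.fft.fft2(original)
    f_rec = np.fft.fft2(reconstruction)
    return np.abs(np.abs(f_orig) - np.abs(f_rec)).mean()

# A blurred reconstruction loses high frequencies, widening the gap.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
blurred = (img + np.roll(img, 1, axis=0) + np.roll(img, 1, axis=1)) / 3.0
print(spectrum_gap(img, img))      # 0.0 -- identical spectra
print(spectrum_gap(img, blurred))  # positive -- detail was lost
```

A perfect reconstruction has zero gap; blurring, a typical failure mode of autoencoders, shows up as a positive gap.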


To-Do

We will be releasing the checkpoints shortly.

Requirements

The required packages are listed in environment.yaml for reference.

Checkpoints

  • FA-VAE on CelebA-HQ (Table 2 row 8, FCM (Res) + non pair-wise DSL): expe_5.pt
  • FA-VAE on FFHQ (Table 1 row 3): favae-ffhq.pt
  • FA-VAE on ImageNet (f=16) (Table 1 last row): favae-imagenet-f16.pt
  • FA-VAE on ImageNet (f=4) (Table 1 row 6): favae-imagenet-f4.pt
  • CAT on CelebA-HQ: cat_celeba.pt

Data Preparation

CelebA-HQ

  1. Download the dataset:

    • CelebA-HQ dataset can be downloaded from CelebA-Mask-HQ.

    • The train/test split is given in the file list_eval_partition.txt from CelebA, where "0" marks train, "1" eval, and "2" test.

    • Download the captions from MM-CelebA-HQ dataset for training T2I generation.

  2. Preprocess the data files into the pkl format:

    cd datasets
    python preprocess_celeba.py
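The split file from step 1 can be parsed with a few lines of Python (a minimal sketch; preprocess_celeba.py handles the actual conversion to pkl):

```python
from collections import defaultdict

def read_partition(path):
    """Group CelebA image filenames by split.

    Each line of list_eval_partition.txt is "<filename> <split_id>",
    where 0 = train, 1 = eval, and 2 = test.
    """
    names = {"0": "train", "1": "eval", "2": "test"}
    splits = defaultdict(list)
    with open(path) as f:
        for line in f:
            filename, split_id = line.split()
            splits[names[split_id]].append(filename)
    return splits
```

For example, `read_partition("list_eval_partition.txt")["train"]` yields the list of training filenames.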

FFHQ can be downloaded from FFHQ, and ImageNet can be downloaded from Kaggle.

Train FA-VAE

FA-VAE comes with different architectures for Frequency Complement Module (FCM) and different settings for the losses Spectrum Loss (SL) and Dynamic Spectrum Loss (DSL).
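A rough sketch of the SL/DSL distinction, using NumPy (the weighting below is a simplification for illustration, not the paper's exact formulation): SL penalizes the gap between the two spectra uniformly, while DSL re-weights each frequency by how badly it is reconstructed, focusing the loss on hard frequencies.

```python
import numpy as np

def spectrum_losses(original, reconstruction):
    """Uniform (SL-like) vs dynamically weighted (DSL-like) spectrum
    losses. Simplified illustration, not the paper's exact losses."""
    err = np.abs(np.fft.fft2(original) - np.fft.fft2(reconstruction))
    sl = err.mean()                    # uniform weight on every frequency
    weight = err / (err.max() + 1e-8)  # hard frequencies weigh more
    dsl = (weight * err).mean()        # dynamic, error-driven weighting
    return sl, dsl
```

Both losses vanish for a perfect reconstruction; the dynamic variant concentrates the penalty on the frequencies with the largest reconstruction error.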

  1. Training FA-VAE on CelebA-HQ with different settings of FCM and SL/DSL is covered by the script train_favae_celeba.sh. These settings correspond to Table 2.

    cd favae_scripts
    bash train_favae_celeba.sh
  2. FA-VAE on FFHQ and ImageNet can be trained with the script train_favae_other_datasets.sh:

    cd favae_scripts
    bash train_favae_other_datasets.sh

To resume training, pass the --resume flag together with a checkpoint path via --resume_path. For instance, to resume FA-VAE codebook training on ImageNet:

torchrun --nnodes=1 --nproc_per_node=2 train_vqgan_ddp.py --ds $OUTPUT --batch_size 2 --print_steps 5000 --img_steps 20000 --codebook_size 16384 --disc_start_epochs 1 --embed_dim 256 --use_lucid_quantizer --use_cosine_sim --with_fcm --ffl_weight 1.0 --use_same_conv_gauss --ffl_weight_features 0.01 --gaussian_kernel 9 --codebook_weight 1.0 --perceptual_weight 1.0 --disc_weight 0.75 --base_lr 2.0e-6 --train_file ../datasets/pkl_files/imagenet_train_wo_cap.pkl --val_file ../datasets/pkl_files/imagenet_test_wo_cap.pkl --resume --resume_path $RESUME_PATH
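The resume mechanism follows the usual checkpointing pattern; a minimal stdlib sketch of the idea (the real script presumably restores model, optimizer, and discriminator state from the .pt file with torch.load):

```python
import os
import pickle
import tempfile

def save_checkpoint(path, state):
    """Write training state (epoch counter, weights, ...) to disk."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(path):
    """Restore training state so training continues where it stopped."""
    with open(path, "rb") as f:
        return pickle.load(f)

# With --resume, training picks up the epoch counter from --resume_path.
resume_path = os.path.join(tempfile.gettempdir(), "favae_demo.ckpt")
save_checkpoint(resume_path, {"epoch": 7})
start_epoch = load_checkpoint(resume_path)["epoch"] + 1  # resumes at 8
```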

Train CAT Models

  1. CAT for T2I generation on CelebA:

    cd cat_scripts
    bash script_gpt_CA_celeba.sh

BibTeX

@inproceedings{favae2023cvpr,
  title={Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder},
  author={Lin, Xinmiao and Li, Yikang and Hsiao, Jenhao and Ho, Chiuman and Kong, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

License

See the LICENSE file for license rights and limitations (MIT).

Acknowledgments

The implementation of FA-VAE relies on resources from Clip-Gen, taming-transformers, CLIP, vector-quantize-pytorch, PerceptualSimilarity, and pytorch-fid. We thank the original authors for open-sourcing their work.
