Official repository for the paper "End-to-End Visual Editing with a Generatively Pre-Trained Artist", which will appear at ECCV 2022. We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change. Unlike prior work, we solve this problem by learning a conditional probability distribution of the edits, end-to-end.
We pair this end-to-end approach with state-of-the-art autoregressive image modelling with transformers to learn and sample from this distribution. To train the model end-to-end, we take a self-supervised approach that simulates edits by augmenting off-the-shelf images during training. This approach is shown in the Figure below.
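As a rough illustration of the self-supervised setup described above, the sketch below builds one simulated edit from a single image: a square region is cut out, an augmented copy of it plays the role of the driver image, and the image with that region blanked out plays the role of the masked source. This is not the repository's implementation. The function name `simulate_block_edit`, the horizontal flip used as the augmentation, and the zero fill for the mask are all assumptions chosen to keep the example small.

```python
import numpy as np


def simulate_block_edit(image, block_size, rng):
    """Build one simulated edit (hypothetical helper, not the repo's code).

    Returns (masked_source, driver, target, (top, left)).
    """
    height, width = image.shape[:2]
    # Pick the top-left corner of a square block that fits inside the image.
    top = int(rng.integers(0, height - block_size + 1))
    left = int(rng.integers(0, width - block_size + 1))

    # The region the model would have to regenerate.
    region = image[top:top + block_size, left:left + block_size]

    # Driver: an augmented copy of the region. A horizontal flip stands in
    # for whatever augmentations are actually used during training.
    driver = region[:, ::-1].copy()

    # Masked source: the original image with the edit region zeroed out.
    masked_source = image.copy()
    masked_source[top:top + block_size, left:left + block_size] = 0

    # Target: the original, unmodified image.
    return masked_source, driver, image, (top, left)


rng = np.random.default_rng(0)
# Random pixel values in [1, 255] so the zeroed block is easy to spot.
image = rng.integers(1, 256, size=(64, 64, 3), dtype=np.uint8)
masked_source, driver, target, (top, left) = simulate_block_edit(image, 16, rng)
print(masked_source.shape, driver.shape, (top, left))
```

A model trained on such triples sees the masked source and the driver as conditioning and is asked to reproduce the target, so no manually edited images are needed.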
This repository contains the code for:
- A demo using a pre-trained E2EVE model trained on LSUN Bedroom
- Example scripts for training the E2EVE transformer with both:
  - Block edits, where the model learns to edit a square region in a source image (see paper for details)
  - Random free form edits, where the model learns to edit randomly drawn regions in a source image
- Pre-training the visual tokenizers (i.e. E1,D1 and E2,D2 in the Figure above)
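To give a feel for what a "randomly drawn region" in the free form setting could look like, here is a minimal sketch that paints a brush stroke along a random walk. It is only an illustration: the function `random_free_form_mask`, its parameters, and the random-walk strategy are assumptions, and the masks used by the training scripts may be generated differently.

```python
import numpy as np


def random_free_form_mask(height, width, rng, num_steps=40, brush_radius=4):
    """Return a boolean mask drawn by a random-walk brush (hypothetical helper)."""
    mask = np.zeros((height, width), dtype=bool)
    # Open grids of row and column indices, used to stamp filled discs.
    rows, cols = np.ogrid[:height, :width]

    # Start the stroke at a random pixel.
    y = int(rng.integers(0, height))
    x = int(rng.integers(0, width))

    for _ in range(num_steps):
        # Stamp a disc of radius brush_radius at the current position.
        mask |= (rows - y) ** 2 + (cols - x) ** 2 <= brush_radius ** 2
        # Move the brush by a small random offset, staying inside the image.
        dy = int(rng.integers(-brush_radius, brush_radius + 1))
        dx = int(rng.integers(-brush_radius, brush_radius + 1))
        y = min(max(y + dy, 0), height - 1)
        x = min(max(x + dx, 0), width - 1)

    return mask


rng = np.random.default_rng(0)
mask = random_free_form_mask(64, 64, rng)
print(mask.shape, int(mask.sum()))
```

A block edit is the special case where the mask is a single axis-aligned square.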
- Clone this repository
- Install the dependencies:
pip install -r requirements.txt
- Download the metadata required for the demo and training, which includes the metadata for the E2EVE validation setup:
./utils/download_val_meta.sh
We provide a demo of the E2EVE model trained on LSUN Bedroom in an IPython notebook. To run the demo:
- Download (i) the metadata (see above) and (ii) the LSUN Bedroom weights:
./utils/download_LSUN_weights.sh
- Follow the notebook demo.ipynb. This will generate the examples shown below:
Example config files for training on FFHQ with block edits or random free form edits are found in configs/FFHQ/train_block_edit.json and configs/FFHQ/train_random_mask.json, respectively.
First, make sure to:
- include the paths to the pre-trained visual tokenizers
- download the data required for E2EVE model validation
- Then, to train with block edits, run:
python main.py --parameters configs/FFHQ/train_block_edit.json
- To train with random free form edits, run:
python main.py --parameters configs/FFHQ/train_random_mask.json
Training the visual generator requires pre-trained visual tokenizers. First, include the paths to the pre-trained visual tokenizers in the config files:
- point model.first_stage_config.ckpt_path to the pre-trained source/masked-source image visual tokenizer (i.e. E1, D1)
- point model.cond_stage_config.ckpt_path to the pre-trained driver image visual tokenizer (i.e. E2, D2)
- point model.cond_stage2_config.ckpt_path to the pre-trained source/masked-source image visual tokenizer (i.e. E1, D1)
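The three settings above can also be filled in programmatically. The sketch below assumes that each dotted name corresponds to nested objects in the JSON config (for example `model` → `first_stage_config` → `ckpt_path`); check this against the actual config files before relying on it. The helper `set_dotted` and the checkpoint paths are placeholders invented for this example.

```python
import json


def set_dotted(config, dotted_key, value):
    """Set config["a"]["b"]["c"] = value for dotted_key "a.b.c".

    Hypothetical helper: intermediate objects are created if missing.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


# Placeholder checkpoint locations; replace with your own files.
SOURCE_TOKENIZER = "/path/to/source_tokenizer.ckpt"  # E1, D1
DRIVER_TOKENIZER = "/path/to/driver_tokenizer.ckpt"  # E2, D2

# In practice, start from a loaded config instead of an empty dict, e.g.
# one read with json.load from configs/FFHQ/train_block_edit.json.
config = {}
set_dotted(config, "model.first_stage_config.ckpt_path", SOURCE_TOKENIZER)
set_dotted(config, "model.cond_stage_config.ckpt_path", DRIVER_TOKENIZER)
set_dotted(config, "model.cond_stage2_config.ckpt_path", SOURCE_TOKENIZER)

print(json.dumps(config, indent=2))
```

Note that the first and third entries point to the same source/masked-source tokenizer, while only the second uses the driver tokenizer.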
Training tokenizers from scratch: to train the visual tokenizers from scratch, see the tokenizer training instructions below.
Pre-trained tokenizers: for a fair comparison in the paper, we use the same visual tokenizer as released by the EdiBERT paper authors. In the paper we demonstrate the advantage of using two separate visual tokenizers for the driver and source/masked-source images. However, competitive results can still be obtained by using the source image tokenizer for both.
We implement a model validation process where metrics are computed over a set of samples generated by the model (see paper for details). This requires some prepared data, so that samples are computed over the same inputs each time and FID is compared against the same real image features. This data is included in the metadata download described in the Setup section.
Example config files for training the source/masked-source image or driver image visual tokenizers on FFHQ are found in configs/FFHQ/train_VQGAN_source.json and configs/FFHQ/train_VQGAN_driver.json, respectively.
- To train the source/masked-source image visual tokenizer, run:
python main.py --parameters configs/FFHQ/train_VQGAN_source.json
- To train the driver image visual tokenizer, run:
python main.py --parameters configs/FFHQ/train_VQGAN_driver.json
If you find this repository useful, please cite:
@InProceedings{Brown22,
author = "Andrew Brown and Cheng-Yang Fu and Omkar Parkhi and Tamara L. Berg and Andrea Vedaldi",
title = "End-to-End Visual Editing with a Generatively Pre-Trained Artist",
booktitle = "European Conference on Computer Vision (ECCV), 2022.",
year = "2022",
}
- code for running inference on given inputs
- code for constructing the meta-data required for running E2EVE validation on a new custom dataset