Demo code of achieving expression transfer of real images. #6

Open
zgxiangyang opened this issue Jun 30, 2020 · 5 comments

Comments

@zgxiangyang

Could you please provide demo code for transferring the expression of a real image to a generated image?

@zgxiangyang
Author

The extracted coefficient's shape is (257,), while the output shape of z_to_lambda_mapping is (254,). What is the relationship between them?

@YuDeng
Contributor

YuDeng commented Jul 1, 2020

In face image generation we eliminate the 3D world translation of a face (we assume it to be zero). So the input to our generator consists only of identity, expression, pose, and lighting coefficients, 254 dimensions in total, while the coefficients extracted by the 3D reconstruction network also contain translation (3 dimensions) and therefore have 257 dimensions.
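A minimal sketch of how the two representations line up, assuming the translation values occupy the last 3 entries of the R_Net output (the exact coefficient ordering should be checked against the reconstruction code):

```python
import numpy as np

# Hypothetical example: align a 257-dim R_Net coefficient vector with the
# 254-dim input expected by the generator by dropping the 3 translation values.
# Assumption: translation is the last 3 entries; the ordering of the remaining
# 254 entries may also differ between the two networks, so verify before use.
coef = np.random.randn(257).astype(np.float32)  # stand-in for an R_Net output
lam = coef[:254]                                # identity + expression + pose + lighting
assert lam.shape == (254,)
```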

@haibo-qiu

haibo-qiu commented Aug 4, 2020

Hi YuDeng

Impressive work!

I am trying to transfer the non-identity factors (expression, illumination, and pose) of real images, but I get unexpected results that do not preserve the identity information. Specifically, my goal is to manipulate the source image with the expression, illumination, and pose of the reference image while keeping the identity unchanged. Here are my source and reference images.
[image: source and reference images]

My process is fairly simple and naive (a rough code sketch is given after the list).

  1. Use R_Net to extract the 257-dim coefficients of both the source and the reference image, and discard the last three elements (leaving 254).
  2. Combine the identity coefficients of the source image with the expression, illumination, and pose coefficients of the reference image to form a new coefficient vector for face generation.
  3. Add random noise to the above coefficient vector and use truncate_generation to obtain the manipulated results.
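For concreteness, here is roughly what steps 1–2 could look like. The slice boundaries below are an assumption (the common Deep3DFaceReconstruction layout: id 0:80, exp 80:144, tex 144:224, angles 224:227, gamma 227:254, translation 254:257) and should be verified against the actual R_Net code:

```python
import numpy as np

# Hypothetical sketch of steps 1-2; slice boundaries are assumptions, not
# taken from the DiscoFaceGAN or R_Net source.

def combine_coefficients(src: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Keep identity (shape + texture) from the source, take expression,
    pose, and lighting from the reference, and drop translation."""
    src, ref = src[:254], ref[:254]   # step 1: discard the 3 translation values
    new = ref.copy()                  # expression / pose / lighting from reference
    new[0:80] = src[0:80]             # identity shape from source (assumed slice)
    new[144:224] = src[144:224]       # identity texture from source (assumed slice)
    return new

src_coef = np.random.randn(257).astype(np.float32)  # stand-in for R_Net(source)
ref_coef = np.random.randn(257).astype(np.float32)  # stand-in for R_Net(reference)
lam = combine_coefficients(src_coef, ref_coef)      # step 2: 254-dim generator input
```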

Results with different noises:
[image: generated results with different noises]

It looks like the generated images capture the expression, illumination, and pose of the reference image, but they do not preserve the identity of the source image.
Does the gap between the coefficients from R_Net and the coefficients learned by your network cause this problem?

Or do I have to do the manipulation in W+ space, as you mentioned in your paper? If so, how?
I have noticed your answer here (#9 (comment)). Does it mean that if I want to manipulate an image, I need to use backpropagation to obtain its latent code first, and then vary this code to achieve the manipulation? If so, do you have any convenient way to accelerate this process?

@YuDeng
Contributor

YuDeng commented Apr 12, 2021

@haibo-qiu Hi, sorry that I did not notice your comment until now. I hope my answer is not too late.

As you mentioned, there is a gap between the identity coefficients extracted by R_Net and the identity coefficients the generator receives during training. The generator is trained with coefficients sampled from a VAE, which does not faithfully capture the original distribution of real identity coefficients. As a result, if you extract the identity of a real image and feed it to the generator, it will most likely produce an image with a different identity.

To alleviate this problem, a better way is to embed a real image into the W+ space and modify it there, which is what we did in our paper. However, the optimization takes several minutes per image, which is quite slow. Besides, embedding a real image into W+ space is risky, because W+ space does not guarantee disentanglement between different semantic factors as well as the W and Z spaces do.
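For reference, this optimization-based embedding is conceptually just gradient descent on a per-image latent code. DiscoFaceGAN itself is implemented in TensorFlow, so the snippet below is only an illustrative PyTorch sketch with a stand-in generator and a plain pixel loss, not the repository's actual inversion code:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of optimization-based W+ embedding (GAN inversion).
# StandInGenerator is a placeholder, not DiscoFaceGAN's generator, and real
# inversions typically add perceptual / identity losses on top of pixel MSE.

class StandInGenerator(torch.nn.Module):
    """Maps a (num_layers, 512) W+ code to a small fake image for demonstration."""
    def __init__(self, num_layers=14, img_size=64):
        super().__init__()
        self.fc = torch.nn.Linear(num_layers * 512, 3 * img_size * img_size)
        self.img_size = img_size

    def forward(self, w_plus):
        x = self.fc(w_plus.flatten(1))
        return x.view(-1, 3, self.img_size, self.img_size)

G = StandInGenerator().eval()
target = torch.rand(1, 3, 64, 64)                      # the real image to embed

w_plus = torch.zeros(1, 14, 512, requires_grad=True)   # per-image latent code
opt = torch.optim.Adam([w_plus], lr=0.01)

for step in range(200):                                # real inversions run far longer
    opt.zero_grad()
    recon = G(w_plus)
    loss = F.mse_loss(recon, target)                   # + perceptual losses in practice
    loss.backward()
    opt.step()

# After optimization, w_plus can be edited (e.g. moved along semantic directions)
# and fed back through the generator to produce the manipulated image.
```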

Currently there are some methods that try to embed real images into a GAN's latent space in a way that faithfully reconstructs the input while still allowing reasonable semantic editing, for example In-Domain GAN Inversion for Real Image Editing. I think these papers might help in your case.

@haibo-qiu

Hi @YuDeng,

Thanks for your kind reply : )

When I realized the gap in the identity coefficients and the risk of the W+ space, I changed direction to explore using synthetically generated face images for face recognition.

With your DiscoFaceGAN model, I can control factors such as expression, illumination, and pose when generating face images and then study their impact on recognition performance. Besides, I also proposed identity mixup, which operates at the identity-coefficient level to alleviate the gap between models trained with natural images and those trained with synthetic images. Our work is named SynFace and was accepted at ICCV 2021.
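As a rough sketch of the general idea (this is not the SynFace implementation, just an assumption about how coefficient-level mixing could look), identity mixup can be thought of as blending two identity coefficient vectors before generation:

```python
import numpy as np

# Hypothetical sketch of identity mixup at the identity-coefficient level.
# The 160-dim size and the convex-combination form are assumptions for
# illustration, not taken from the SynFace code.
def identity_mixup(id_a: np.ndarray, id_b: np.ndarray, alpha: float) -> np.ndarray:
    """Convex combination of two identity coefficient vectors (alpha in [0, 1])."""
    return alpha * id_a + (1.0 - alpha) * id_b

id_a = np.random.randn(160).astype(np.float32)  # stand-in identity coefficients
id_b = np.random.randn(160).astype(np.float32)
mixed = identity_mixup(id_a, id_b, alpha=float(np.random.uniform(0.0, 1.0)))
```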

Many thanks for your work, which really inspired me 👍
