Yutong ZHOU at the Interaction Laboratory, Ritsumeikan University. ლ(╹◡╹ლ)
-
Over the last few decades, the fields of Computer Vision (CV) and Natural Language Processing (NLP) have seen several major breakthroughs in deep learning research. Recently, researchers have become increasingly interested in combining semantic and visual information across these traditionally independent fields. A number of studies have been conducted on text-to-image synthesis techniques, which translate an input textual description (keywords or sentences) into realistic images.
-
A-Text_to_Image-zoo: This is a survey on Text-to-Image Generation/Synthesis.
-
Papers, codes and datasets for the text-to-image task are available here.
-
Inception Score (IS) [Paper] [Python Code (Pytorch)] [Python Code (Tensorflow)]
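The Inception Score passes generated images through a pretrained Inception-v3 classifier and rewards both confident per-image predictions and diverse predictions overall: IS = exp(E_x[KL(p(y|x) ‖ p(y))]). A minimal NumPy sketch of that formula, assuming the classifier's softmax outputs have already been collected (the `inception_score` helper is illustrative, not taken from the linked codebases):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score from per-image class probabilities.

    probs: (N, C) array; each row is p(y|x) from a pretrained classifier
           (the original metric uses Inception-v3 softmax outputs).
    IS = exp( E_x[ KL(p(y|x) || p(y)) ] ), where p(y) is the marginal
    over all generated images.
    """
    probs = np.asarray(probs, dtype=np.float64)
    marginal = probs.mean(axis=0, keepdims=True)                # p(y)
    kl = probs * (np.log(probs + eps) - np.log(marginal + eps))  # per-class KL terms
    return float(np.exp(kl.sum(axis=1).mean()))
```

A uniform classifier output over C classes gives IS = 1 (the minimum), while perfectly confident predictions spread evenly over all C classes give IS = C (the maximum).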
-
Fréchet Inception Distance (FID) [Paper] [Python Code (Pytorch)] [Python Code (Tensorflow)]
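FID fits Gaussians to Inception-v3 features of real and generated images and measures the Fréchet distance between them: FID = ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^(1/2)). A NumPy-only sketch of this closed form, assuming the feature means and covariances are already estimated (the helper name and the eigenvalue route to the matrix square-root trace are my own choices; the linked implementations typically use `scipy.linalg.sqrtm` on pooled Inception features):

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    Tr((S1 S2)^(1/2)) is computed from the eigenvalues of S1 @ S2, which
    are real and non-negative when both covariances are PSD.
    """
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * trace_sqrt)
```

Identical distributions score 0; lower FID means the generated feature distribution is closer to the real one.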
-
R-precision [Paper]
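R-precision evaluates text-image alignment by retrieval: given a generated image, its ground-truth caption is ranked against a pool of randomly sampled mismatched captions by cosine similarity in a joint embedding space, and the attempt counts as a hit if the true caption ranks first. A hedged sketch of one retrieval attempt, assuming precomputed embeddings (`r_precision_hit` is an illustrative name, not an API from the linked paper):

```python
import numpy as np

def r_precision_hit(image_emb, caption_embs, true_idx=0):
    """Return 1 if the ground-truth caption is the nearest candidate by cosine similarity.

    image_emb:    (D,) embedding of one generated image.
    caption_embs: (K, D) caption embeddings; row `true_idx` is the matching
                  caption, the rest are randomly sampled mismatched ones.
    """
    sims = caption_embs @ image_emb / (
        np.linalg.norm(caption_embs, axis=1) * np.linalg.norm(image_emb)
    )
    return int(np.argmax(sims) == true_idx)
```

The reported R-precision is the mean hit rate over many generated images, each evaluated against its own candidate pool.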
-
L2 error [Paper]
-
Caltech-UCSD Birds (CUB)
Caltech-UCSD Birds-200-2011 (CUB-200-2011) is an extended version of the CUB-200 dataset, with roughly double the number of images per class and new part location annotations.
- Detailed information (Images): ⇒ [Paper] [Website]
- Number of different categories: 200 (Training: 150 categories. Testing: 50 categories.)
- Number of bird images: 11,788
- Annotations per image: 15 Part Locations, 312 Binary Attributes, 1 Bounding Box, Ground-truth Segmentation
- Detailed information (Text Descriptions): ⇒ [Paper] [Website]
- Descriptions per image: 10 Captions
-
Oxford-102 Flower
Oxford-102 Flower is a dataset of 102 flower categories, chosen from flowers commonly occurring in the United Kingdom. The images have large scale, pose, and light variations.
-
MS-COCO
COCO is a large-scale object detection, segmentation, and captioning dataset; each image is paired with five human-written captions.
-
2020
- (CVPR 2020) RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis From Prior Knowledge, Jun Cheng et al. [Paper]
- (CVPR 2020) ManiGAN: Text-Guided Image Manipulation, Bowen Li et al. [Paper] [Code]
- (CVPR 2020) CookGAN: Causality based Text-to-Image Synthesis, Bin Zhu et al. [Paper]
- (CVPR 2020) SegAttnGAN: Text to Image Generation with Segmentation Attention, Yuchuan Gou et al. [Paper]
- (TPAMI 2020) Semantic Object Accuracy for Generative Text-to-Image Synthesis, Tobias Hinz et al. [Paper] [Code]
- (ACM Trans 2020) End-to-End Text-to-Image Synthesis with Spatial Constrains, Min Wang et al. [Paper]
-
2019
- (AAAI 2019) Perceptual Pyramid Adversarial Networks for Text-to-Image Synthesis, Minfeng Zhu et al. [Web]
- (CVPR 2019) DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis, Minfeng Zhu et al. [Paper] [Code]
- (CVPR 2019) Object-driven Text-to-Image Synthesis via Adversarial Training, Wenbo Li et al. [Paper] [Code]
- (CVPR 2019) MirrorGAN: Learning Text-to-image Generation by Redescription, Tingting Qiao et al. [Paper] [Code]
- (CVPR 2019) Text2Scene: Generating Abstract Scenes from Textual Descriptions, Fuwen Tan et al. [Paper] [Code]
- (CVPR 2019) Semantics Disentangling for Text-to-Image Generation, Guojun Yin et al. [Paper] [Code]
- (ICCV 2019) Semantics-Enhanced Adversarial Nets for Text-to-Image Synthesis, Hongchen Tan et al. [Paper]
- (NIPS 2019) Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge, Tingting Qiao et al. [Paper] [Code]
-
2018
- (TPAMI 2018) StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks, Han Zhang et al. [Paper] [Code]
- (BMVC 2018) MC-GAN: Multi-conditional Generative Adversarial Network for Image Synthesis, Hyojin Park et al. [Paper] [Code]
- (CVPR 2018) AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks, Tao Xu et al. [Paper] [Code]
- (CVPR 2018) Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network, Zizhao Zhang et al. [Paper] [Code]
- (CVPR 2018) Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis, Seunghoon Hong et al. [Paper]
- (CVPR 2018) Image Generation from Scene Graphs, Justin Johnson et al. [Paper] [Code]
- (NIPS 2018) Text-adaptive generative adversarial networks: Manipulating images with natural language, Seonghyeon Nam et al. [Paper] [Code]
-
2017
-
2016
-
If you have any questions, please feel free to contact Yutong ZHOU (E-mail: [email protected]).