Partial and Gated Convolution

  • partial convolution [1]: hard gating with a single-channel, non-learnable mask-update rule

  • gated convolution [2]: soft gating with multi-channel, learnable gates
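The contrast between the two gating rules can be made concrete with a minimal numpy sketch operating on a single k x k window (function names and shapes are illustrative, not the papers' implementations):

```python
import numpy as np

def partial_conv_window(x, m, w, b=0.0):
    """Partial convolution [1] on one window: hard, rule-based gating.

    x: window values, m: binary mask (1 = valid), w: kernel weights.
    The output is renormalized by sum(1)/sum(m), and the mask update is
    hard: the output pixel becomes valid iff the window saw any valid input.
    """
    valid = m.sum()
    if valid == 0:
        return 0.0, 0.0                       # no information: output 0, mask stays 0
    y = (w * x * m).sum() * (m.size / valid) + b
    return y, 1.0                             # mask update: hole is (partially) filled

def gated_conv_window(x, w_feat, w_gate, b_feat=0.0, b_gate=0.0):
    """Gated convolution [2] on one window: soft, learned gating.

    Two kernels are learned per output channel, one for features and one
    for the gate; the sigmoid gate acts as a soft, learnable mask.
    """
    feat = (w_feat * x).sum() + b_feat
    gate = 1.0 / (1.0 + np.exp(-((w_gate * x).sum() + b_gate)))  # sigmoid in (0, 1)
    return feat * gate
```

In the partial convolution the mask update is a fixed rule, whereas the gated convolution learns both kernels end to end and its gate can take any value in (0, 1) per channel.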

Filling Priority

filling priority [3]: the priority of a pixel on the fill front is the product of a confidence term (a measure of the amount of reliable information surrounding the pixel) and a data term (a function of the strength of isophotes hitting the front). The patch to be filled next is selected by priority, similar to patch-based texture synthesis.

<img src="http://bcmi.sjtu.edu.cn/~niuli/github_images/bO5YXEQ.jpg" width="40%"> 
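A minimal sketch of the priority computation, assuming the data term has been precomputed per pixel (names and the patch radius are illustrative):

```python
import numpy as np

def confidence_term(conf, p, half=1):
    """C(p): mean confidence of the patch around front pixel p."""
    r, c = p
    patch = conf[r - half:r + half + 1, c - half:c + half + 1]
    return patch.sum() / patch.size

def select_patch(conf, data, front):
    """Pick the front pixel with the highest priority P(p) = C(p) * D(p) [3].

    conf: per-pixel confidence (1 in the known region, 0 in the hole);
    data: precomputed data term D(p) (isophote strength at the front);
    front: list of (row, col) pixels on the fill front.
    """
    priorities = [confidence_term(conf, p) * data[p] for p in front]
    return front[int(np.argmax(priorities))]
```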

Diverse image inpainting

  • random vector: sample a random vector to generate diverse and plausible outputs [6]

  • attribute vector: use target attribute values to guide image inpainting [7]

  • autoregressive model: [11] [12]

Auxiliary Information

  • Semantics

    • enforce the inpainted result to have the expected semantics [8]
    • first inpaint the semantic map and then use the complete semantic map as guidance [9]
    • guide feature learning in the decoder [10]
    • semantic-aware attention [13]
  • Edges

    • inpaint the edge map and use the complete edge map to help image inpainting [4] [5]

Frequency Domain

  • use a frequency map as network input [14]
  • Fourier convolution: LaMa [15]
  • wavelet decomposition: WaveFill [16]

Bridging Inpainting and Generation

  • co-modulation: modulate the decoder with both the encoded image feature and a stochastic style vector, bridging image-conditional generation and unconditional generation [17]

Transformer

[12] [18] [19]

Diffusion Model

[20] [21] [22] [23]

References

  1. Liu, Guilin, et al. “Image inpainting for irregular holes using partial convolutions.” ECCV, 2018.
  2. Yu, Jiahui, et al. “Free-form image inpainting with gated convolution.” ICCV, 2019.
  3. Criminisi, Antonio, Patrick Pérez, and Kentaro Toyama. “Region filling and object removal by exemplar-based image inpainting.” TIP, 2004.
  4. Nazeri, Kamyar, et al. “Edgeconnect: Generative image inpainting with adversarial edge learning.” arXiv preprint arXiv:1901.00212 (2019).
  5. Xiong, Wei, et al. “Foreground-aware image inpainting.” CVPR, 2019.
  6. Zheng, Chuanxia, Tat-Jen Cham, and Jianfei Cai. “Pluralistic image completion.” CVPR, 2019.
  7. Chen, Zeyuan, et al. “High resolution face completion with multiple controllable attributes via fully end-to-end progressive generative adversarial networks.” arXiv preprint arXiv:1801.07632 (2018).
  8. Li, Yijun, et al. “Generative face completion.” CVPR, 2017.
  9. Song, Yuhang, et al. “Spg-net: Segmentation prediction and guidance network for image inpainting.” arXiv preprint arXiv:1805.03356 (2018).
  10. Liao, Liang, et al. “Guidance and evaluation: Semantic-aware image inpainting for mixed scenes.” arXiv preprint arXiv:2003.06877 (2020).
  11. Peng, Jialun, et al. “Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE.” CVPR, 2021.
  12. Wan, Ziyu, et al. “High-Fidelity Pluralistic Image Completion with Transformers.” arXiv preprint arXiv:2103.14031 (2021).
  13. Liao, Liang, et al. “Image inpainting guided by coherence priors of semantics and textures.” CVPR, 2021.
  14. Roy, Hiya, et al. “Image inpainting using frequency domain priors.” arXiv preprint arXiv:2012.01832 (2020).
  15. Suvorov, Roman, et al. “Resolution-robust Large Mask Inpainting with Fourier Convolutions.” WACV (2021).
  16. Yu, Yingchen, et al. “WaveFill: A Wavelet-based Generation Network for Image Inpainting.” ICCV, 2021.
  17. Zhao, Shengyu, et al. “Large scale image completion via co-modulated generative adversarial networks.” ICLR (2021).
  18. Zheng, Chuanxia, et al. “Bridging global context interactions for high-fidelity image completion.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
  19. Li, Wenbo, et al. “Mat: Mask-aware transformer for large hole image inpainting.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
  20. Lugmayr, Andreas, et al. “Repaint: Inpainting using denoising diffusion probabilistic models.” CVPR, 2022.
  21. Rombach, Robin, et al. “High-resolution image synthesis with latent diffusion models.” CVPR, 2022.
  22. Li, Wenbo, et al. “SDM: Spatial Diffusion Model for Large Hole Image Inpainting.” arXiv preprint arXiv:2212.02963 (2022).
  23. Wang, Su, et al. “Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting.” arXiv preprint arXiv:2212.06909 (2022).

Fundamental

Image Statistics: illuminance, color temperature, saturation, local contrast, hue, texture, tone

Color spaces: RGB color space; CIELab color space (a luminance channel plus two chrominance channels, from which hue and saturation can be derived).

Image realism

  1. Predict realism using a discriminator learned from real and fake images [a]

  2. Predict realism based on global and local statistics: distance to neighboring realistic images, similarity between foreground and background [a]

Image harmonization

After pasting the foreground on the background, harmonize the foreground.

  • Traditional methods: match the foreground with the background; match the foreground with other semantically or statistically close realistic images.

One interesting problem in image harmonization is whether the decomposition into reflectance and illumination is unique. If we have strong prior knowledge about the object reflectance (e.g., a black-and-white zebra), the decomposition may be unique; likewise, if the object color is complex enough, which effectively adds enough constraints, the decomposition may be unique. Otherwise, if we have no strong prior on the object reflectance (e.g., a vase of arbitrary color) and the object color is simple (e.g., a single color), the decomposition is not unique.

Given a source image and a target image obtained by applying color transfer, we would like to know whether a valid path exists between the source image and the target image, and whether multiple valid paths exist between them.

Deep painterly harmonization

  • deep painterly harmonization [1]
  • style harmonization [2]
  • image blending [3]

Reference

[1] Luan, Fujun, et al. “Deep painterly harmonization.” Computer graphics forum. Vol. 37. No. 4. 2018.

[2] Peng, Hwai-Jin, Chia-Ming Wang, and Yu-Chiang Frank Wang. “Element-Embedded Style Transfer Networks for Style Harmonization.” BMVC. 2019.

[3] Zhang, Lingzhi, Tarmily Wen, and Jianbo Shi. “Deep image blending.” WACV. 2020.

Simply speaking, image composition means cut-and-paste: cutting one piece from one image and pasting it on another image. The obtained composite image may be unrealistic for the following reasons:

  • The foreground is not well segmented, so there is an evident and unnatural boundary between foreground and background.
  • The foreground and background may look incompatible due to different color and illumination statistics. For example, the foreground is captured in the daytime while the background is captured at night.
  • The foreground is placed at an unreasonable location. For example, a horse is placed in the sky.
  • The foreground needs to be geometrically transformed. For example, when pasting eye glasses on a face, the eye glasses should fit the eyes and ears on the face.
  • The pasted foreground may also affect the background. For example, the foreground may cast a shadow on the background.

Therefore, image composition is actually a combination of multiple subtasks.
Some works focus on only one subtask, such as harmonization or geometric transformation [1], while others attempt to solve all subtasks in a single framework [2] [3] [4] [5] [6].

Human matting+composition: [7]

Reference

[1] Lin, Chen-Hsuan, et al. “St-gan: Spatial transformer generative adversarial networks for image compositing.”, CVPR, 2018.

[2] Tan, Fuwen, et al. “Where and who? automatic semantic-aware person composition.” WACV, 2018.

[3] Chen, Bor-Chun, and Andrew Kae. “Toward Realistic Image Compositing with Adversarial Learning.” CVPR, 2019.

[4] Lingzhi Zhang, Tarmily Wen, Jianbo Shi: Deep Image Blending. WACV 2020: 231-240

[5] Weng, Shuchen, et al. “MISC: Multi-Condition Injection and Spatially-Adaptive Compositing for Conditional Person Image Synthesis.” CVPR, 2020.

[6] Zhan, Fangneng, et al. “Adversarial Image Composition with Auxiliary Illumination.” arXiv preprint arXiv:2009.08255 (2020).

[7] Zhang, He, et al. “Deep Image Compositing.” arXiv preprint arXiv:2011.02146 (2020).

The target is to cut the foreground from one image, paste it on another image, and then adjust the foreground. The prevalent traditional technique, Poisson blending [1] [2] (also called seamless cloning), matches gradients subject to boundary conditions by solving a Poisson equation. Note that in image harmonization, the original image containing the foreground may be unavailable.
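A 1-D numpy sketch of this idea, assuming a single contiguous paste region (the 2-D case solves the same kind of linear system over the hole's pixel graph):

```python
import numpy as np

def poisson_blend_1d(bg, src, i0, i1):
    """1-D seamless-cloning sketch: paste src[i0:i1] into bg, matching the
    source *gradients* inside the region and the *background values* at the
    two boundary pixels (the Poisson equation f'' = src'' with Dirichlet
    boundary conditions), solved directly as a small linear system.
    """
    n = i1 - i0                      # number of interior unknowns
    A = np.zeros((n, n))
    b = np.zeros(n)
    lap = np.convolve(src, [1.0, -2.0, 1.0], mode="same")  # source Laplacian
    for k in range(n):
        A[k, k] = -2.0
        if k > 0:
            A[k, k - 1] = 1.0
        if k < n - 1:
            A[k, k + 1] = 1.0
        b[k] = lap[i0 + k]
    b[0] -= bg[i0 - 1]               # boundary conditions from the background
    b[-1] -= bg[i1]
    out = bg.copy()
    out[i0:i1] = np.linalg.solve(A, b)
    return out
```

When the source gradients already agree with the background (e.g., the source is the background plus a constant offset), the blend reproduces the background exactly; in general, the boundary mismatch is diffused smoothly across the region.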

  • stacked generators from low-resolution to high-resolution: [4] [5] [6] [10]

  • low-resolution generator embedded in high-resolution generator, upsample low-resolution result and add residual: [1] [7] [8] [9] [12]

  • fuse low-resolution outputs: [3] [11]

  • shallow mapping from large-scale input to large-scale output: [2] (look-up table) [15] [16]

  • joint upsampling: given a high-resolution input and a low-resolution output, produce the high-resolution output. 1) append the high-resolution input [1] or the feature of the high-resolution input [10] to the refinement network; 2) guided filter [13]: use the high-resolution input as guidance and the coarse high-resolution output as filter input; 3) attentional upsampling [14]
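The guided-filter option (2) can be sketched in 1-D with plain numpy (the real filter [13] works in 2-D with box filters and learnable variants; this is only the core affine-fitting idea):

```python
import numpy as np

def box(x, r):
    """Mean filter with radius r (simple 1-D sliding window)."""
    return np.array([x[max(i - r, 0):i + r + 1].mean() for i in range(len(x))])

def guided_filter_1d(guide, target, r=2, eps=1e-4):
    """1-D guided filter sketch: express the output as a local affine
    function of the guide, q = a * guide + b, so the coarse `target`
    inherits the guide's edges (the idea behind guided joint upsampling).
    """
    mean_g = box(guide, r)
    mean_t = box(target, r)
    cov_gt = box(guide * target, r) - mean_g * mean_t
    var_g = box(guide * guide, r) - mean_g * mean_g
    a = cov_gt / (var_g + eps)       # eps regularizes flat regions
    b = mean_t - a * mean_g
    return box(a, r) * guide + box(b, r)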

Reference

[1] Wang, Ting-Chun, et al. “High-resolution image synthesis and semantic manipulation with conditional gans.” CVPR, 2018.

[2] Zeng, Hui, et al. “Learning Image-adaptive 3D Lookup Tables for High Performance Photo Enhancement in Real-time.” PAMI, 2020.

[3] Yu, Haichao, et al. “High-Resolution Deep Image Matting.” arXiv preprint arXiv:2009.06613 (2020).

[4] Denton, Emily L., Soumith Chintala, and Rob Fergus. “Deep generative image models using a laplacian pyramid of adversarial networks.” NIPS, 2015.

[5] Huang, Xun, et al. “Stacked generative adversarial networks.” CVPR, 2017.

[6] Zhang, Han, et al. “Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks.” ICCV, 2017.

[7] Andreini, Paolo, et al. “A two stage gan for high resolution retinal image generation and segmentation.” arXiv preprint arXiv:1907.12296 (2019).

[8] Hamada, K., Tachibana, K., Li, T., Honda, H., & Uchida, Y. (2018). Full-body high-resolution anime generation with progressive structure-conditional generative adversarial networks. ECCV, 2018.

[9] Karras, Tero, et al. “Progressive growing of gans for improved quality, stability, and variation.” arXiv preprint arXiv:1710.10196 (2017).

[10] Chen, Qifeng, and Vladlen Koltun. “Photographic image synthesis with cascaded refinement networks.” ICCV, 2017.

[11] Anokhin, Ivan, et al. “High-Resolution Daytime Translation Without Domain Labels.” CVPR, 2020.

[12] Yi, Zili, et al. “Contextual residual aggregation for ultra high-resolution image inpainting.” CVPR, 2020.

[13] Wu, Huikai, et al. “Fast end-to-end trainable guided filter.” CVPR, 2018.

[14] Kundu, Souvik, et al. “Attention-based Image Upsampling.” arXiv preprint arXiv:2012.09904 (2020).

[15] Cong, Wenyan, et al. “High-Resolution Image Harmonization via Collaborative Dual Transformations.” CVPR, 2022.

[16] Liang, Jingtang, Xiaodong Cun, and Chi-Man Pun. “Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization.” ECCV, 2022.

  • Only geometry: [1], [2]

  • geometry+appearance [3]

  • geometry+occlusion+appearance: [4] [5]

Reference

  1. Lin, Chen-Hsuan, et al. “St-gan: Spatial transformer generative adversarial networks for image compositing.” CVPR, 2018.

  2. Kikuchi, Kotaro, et al. “Regularized Adversarial Training for Single-shot Virtual Try-On.” ICCV Workshops. 2019.

  3. Zhan, Fangneng, Hongyuan Zhu, and Shijian Lu. “Spatial fusion gan for image synthesis.” CVPR, 2019.

  4. Azadi, Samaneh, et al. “Compositional gan: Learning image-conditional binary composition.” International Journal of Computer Vision 128.10 (2020): 2570-2585.

  5. Fangneng Zhan, Jiaxing Huang, Shijian Lu, “Hierarchy Composition GAN for High-fidelity Image Synthesis.” Transactions on Cybernetics, 2021.

Tutorial of generative models:

References

[1] Dhariwal, Prafulla, and Alex Nichol. “Diffusion models beat gans on image synthesis.” arXiv preprint arXiv:2105.05233 (2021).

[2] Nichol, Alexander, et al. “GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.” arXiv preprint arXiv:2112.10741 (2021).

[3] Karras, Tero, et al. “Elucidating the Design Space of Diffusion-Based Generative Models.” NeurIPS, 2022.

[4] Wang, Tengfei, et al. “Pretraining is All You Need for Image-to-Image Translation.” arXiv preprint arXiv:2205.12952 (2022).

Training tricks:

17 tricks for training GAN: https://github.com/soumith/ganhacks

  • soft label: replace 1 with 0.9 and 0 with 0.3

  • train discriminator more times (e.g., 2X) than generator

  • use labels: auxiliary tasks

  • normalize inputs to [-1, 1]

  • use tanh before output

  • use batchnorm (not for the first and last layer)

  • use spherical distribution instead of uniform distribution

  • leaky ReLU

  • stability tricks from RL
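The soft-label trick, for instance, amounts to changing the discriminator's binary cross-entropy targets (a minimal sketch using the 0.9/0.3 values quoted above):

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy on sigmoid outputs (clipped for stability)."""
    p = np.clip(pred, 1e-7, 1 - 1e-7)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

def d_loss_soft(d_real, d_fake, real_label=0.9, fake_label=0.3):
    """Discriminator loss with soft labels: targets 0.9 / 0.3 instead of
    1 / 0, which keeps the discriminator from becoming over-confident."""
    real_t = np.full_like(d_real, real_label)
    fake_t = np.full_like(d_fake, fake_label)
    return bce(d_real, real_t) + bce(d_fake, fake_t)
```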

Tricks from the BigGAN [1]

  • class-conditional BatchNorm

  • Spectral normalization

  • orthogonal initialization

  • truncated prior (truncation trick to seek the trade-off between fidelity and variety)

  • enforce orthogonality on weights to improve the model smoothness
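The truncation trick reduces to resampling out-of-range latent entries at test time; a minimal sketch, where the threshold is the fidelity/variety knob:

```python
import numpy as np

def truncated_normal(shape, threshold=0.5, rng=None):
    """Truncation trick (BigGAN [1]): resample latent entries whose magnitude
    exceeds `threshold`. Smaller thresholds trade variety for fidelity."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(shape)
    bad = np.abs(z) > threshold
    while bad.any():
        z[bad] = rng.standard_normal(bad.sum())
        bad = np.abs(z) > threshold
    return z
```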

More tricks

  • gradient penalty [8]

  • unrolling [9] and packing [10]

Famous GANs:

  • LSGAN: replace the cross-entropy loss with a least-squares loss

  • Wasserstein GAN: replace discriminator with a critic function

  • LAPGAN: coarse-to-fine generation using a Laplacian pyramid

  • SeqGAN: generate discrete sequences

  • E-GAN [2]: place GAN under the framework of genetic evolution

  • Dissection GAN [3]: use intervention for causality

  • CoGAN [4]: two generators and discriminators softly share parameters

  • DCGAN [5]

  • Progressive GAN [6]

  • Style-based GAN [7]

  • StackGAN [17]

  • self-attention GAN [18]

  • BigGAN [20]

  • LoGAN [19]

  • Conditioned on a label vector: conditional GAN [14], CVAE-GAN [16]

  • Conditioned on a single image: pix2pix [11]; high-resolution pix2pix [12] (adds a coarse-to-fine strategy); BicycleGAN [13] (a combination of cVAE-GAN and cLR-GAN); DAGAN [15]

  • StyleGAN-XL [23]

  • StyleGAN-T [22]

  • GigaGAN [21]

Measurement:

Results: besides qualitative results, there are quantitative metrics such as Inception Score (IS) and Fréchet Inception Distance (FID).

Stability: for the stability of generator and discriminator, refer to [1].
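As a sketch of what FID measures: it is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. Assuming diagonal covariances (the real metric uses a full-covariance matrix square root), it reduces to:

```python
import numpy as np

def fid_diagonal(feat_real, feat_fake):
    """FID sketch under a diagonal-covariance assumption: Frechet distance
    between two Gaussians fitted to feature sets (rows = samples).
    The full metric replaces the per-dimension sqrt with a matrix sqrt
    of the covariance product.
    """
    mu1, mu2 = feat_real.mean(0), feat_fake.mean(0)
    var1, var2 = feat_real.var(0), feat_fake.var(0)
    return float(((mu1 - mu2) ** 2).sum()
                 + (var1 + var2 - 2 * np.sqrt(var1 * var2)).sum())
```

Identical feature sets give a distance of 0, and shifting every feature by a constant adds exactly the squared mean difference.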

Tutorial and Survey:

References

[1] Brock A, Donahue J, Simonyan K. Large scale gan training for high fidelity natural image synthesis[J]. arXiv preprint arXiv:1809.11096, 2018.

[2] Wang C, Xu C, Yao X, et al. Evolutionary Generative Adversarial Networks[J]. arXiv preprint arXiv:1803.00657, 2018.

[3] Bau D, Zhu J Y, Strobelt H, et al. GAN Dissection: Visualizing and Understanding Generative Adversarial Networks[J]. arXiv preprint arXiv:1811.10597, 2018.

[4] Liu M Y, Tuzel O. Coupled generative adversarial networks[C]//Advances in neural information processing systems. 2016: 469-477.

[5] Radford, Alec, Luke Metz, and Soumith Chintala. “Unsupervised representation learning with deep convolutional generative adversarial networks.” arXiv preprint arXiv:1511.06434 (2015).

[6] Karras, Tero, et al. “Progressive growing of gans for improved quality, stability, and variation.” arXiv preprint arXiv:1710.10196 (2017).

[7] Karras, Tero, Samuli Laine, and Timo Aila. “A Style-Based Generator Architecture for Generative Adversarial Networks.” arXiv preprint arXiv:1812.04948 (2018).

[8] Gulrajani, Ishaan, et al. “Improved training of wasserstein gans.” Advances in Neural Information Processing Systems. 2017.

[9] Metz, Luke, et al. “Unrolled generative adversarial networks.” arXiv preprint arXiv:1611.02163 (2016).

[10] Lin, Zinan, et al. “PacGAN: The power of two samples in generative adversarial networks.” Advances in Neural Information Processing Systems. 2018.

[11] Isola, Phillip, et al. “Image-to-image translation with conditional adversarial networks.” CVPR, 2017

[12] Wang, Ting-Chun, et al. “High-resolution image synthesis and semantic manipulation with conditional gans.” CVPR, 2018.

[13] Zhu, Jun-Yan, et al. “Toward multimodal image-to-image translation.” NIPS, 2017.

[14] Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets.” arXiv preprint arXiv:1411.1784 (2014).

[15] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. “Data augmentation generative adversarial networks.” arXiv preprint arXiv:1711.04340 (2017).

[16] Bao, Jianmin, et al. “CVAE-GAN: fine-grained image generation through asymmetric training.” ICCV, 2017.

[17] Han Zhang, Tao Xu, Hongsheng Li, “StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks”, ICCV 2017

[18] Han Zhang, Ian J. Goodfellow, Dimitris N. Metaxas, Augustus Odena, “Self-Attention Generative Adversarial Networks”. CoRR abs/1805.08318 (2018)

[19] Wu, Yan, et al. “LOGAN: Latent Optimisation for Generative Adversarial Networks.” arXiv preprint arXiv:1912.00953 (2019).

[20] Brock, Andrew, Jeff Donahue, and Karen Simonyan. “Large scale gan training for high fidelity natural image synthesis.” arXiv preprint arXiv:1809.11096 (2018).

[21] Kang, Minguk, et al. “Scaling up GANs for Text-to-Image Synthesis.” arXiv preprint arXiv:2303.05511 (2023).

[22] Sauer, Axel, et al. “Stylegan-t: Unlocking the power of gans for fast large-scale text-to-image synthesis.” arXiv preprint arXiv:2301.09515 (2023).

[23] Sauer, Axel, Katja Schwarz, and Andreas Geiger. “Stylegan-xl: Scaling stylegan to large diverse datasets.” ACM SIGGRAPH 2022 conference proceedings. 2022.

  1. Transfer a large proportion of parameters and only update a few: [1] updates scaling and shifting parameters; [2] updates a miner network placed before the generator; [3] uses Fisher information to select the parameters to be updated. Empirically, the last layers tend to be frozen. Similarly, in [6], the last layers are frozen and scaling/shifting parameters are predicted.

  2. Transfer structure similarity from large dataset to small dataset: [4]

  3. Transfer parameter basis: [5] adapts the singular values of the pre-trained weights while freezing the corresponding singular vectors.
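Approach 3 can be sketched directly with an SVD (assuming a plain weight matrix; in [5] the per-singular-value scale is the learned quantity):

```python
import numpy as np

def adapt_singular_values(w_pretrained, scale):
    """Parameter-basis transfer sketch [5]: freeze the singular vectors of a
    pre-trained weight matrix and adapt only its singular values, here via a
    per-value multiplicative `scale` (the quantity that would be learned)."""
    u, s, vt = np.linalg.svd(w_pretrained, full_matrices=False)
    return u @ np.diag(s * scale) @ vt
```

With a scale of all ones, the pre-trained weights are recovered exactly, so the adaptation starts from the source model.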

Reference

  1. Atsuhiro Noguchi, Tatsuya Harada: “Image generation from small datasets via batch statistics adaptation.” ICCV (2019)

  2. Yaxing Wang, Abel Gonzalez-Garcia, David Berga, Luis Herranz, Fahad Shahbaz Khan, Joost van de Weijer: “MineGAN: effective knowledge transfer from GANs to target domains with few images.” CVPR (2020).

  3. Yijun Li, Richard Zhang, Jingwan Lu, Eli Shechtman: “Few-shot Image Generation with Elastic Weight Consolidation.” NeurIPS (2020).

  4. Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, Richard Zhang: “Few-shot Image Generation via Cross-domain Correspondence.” CVPR (2021).

  5. Esther Robb, Wen-Sheng Chu, Abhishek Kumar, Jia-Bin Huang: “Few-Shot Adaptation of Generative Adversarial Networks.” arXiv (2020).

  6. Miaoyun Zhao, Yulai Cong, Lawrence Carin: “On Leveraging Pretrained GANs for Generation with Limited Data.” ICML (2020).

  1. Fusion-based methods: Generative Matching Network (GMN) [1] (a VAE with a matching network for the generator and recognizer). MatchingGAN [3] learns reasonable interpolation coefficients. F2GAN [5] first fuses high-level features and then fills in low-level details.

  2. Optimization-based methods: FIGR [2] is based on Reptile; DAWSON [4] is based on MAML.

  3. Transformation-based methods: DAGAN [6] samples random vectors to generate new images; DeltaGAN [7] learns a sample-specific delta.

Reference

[1] Bartunov, Sergey, and Dmitry Vetrov. “Few-shot generative modelling with generative matching networks.” AISTATS, 2018.

[2] Clouâtre, Louis, and Marc Demers. “FIGR: Few-shot Image Generation with Reptile.” arXiv preprint arXiv:1901.02199 (2019).

[3] Yan Hong, Li Niu, Jianfu Zhang, Liqing Zhang, “MatchingGAN: Matching-based Few-shot Image Generation”, ICME, 2020

[4] Weixin Liang, Zixuan Liu, Can Liu: “DAWSON: A Domain Adaptive Few Shot Generation Framework.” CoRR abs/2001.00576 (2020)

[5] Yan Hong, Li Niu, Jianfu Zhang, Weijie Zhao, Chen Fu, Liqing Zhang: “F2GAN: Fusing-and-Filling GAN for Few-shot Image Generation.” ACM MM (2020)

[6] Antreas Antoniou, Amos J. Storkey, Harrison Edwards: “Data Augmentation Generative Adversarial Networks.” stat (2018)

[7] Yan Hong, Li Niu, Jianfu Zhang, Jing Liang, Liqing Zhang: “DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta.” CoRR abs/2009.08753 (2020)
