- Gradient harmonization: [1]
[1] Gradient Harmonized Single-stage Detector, AAAI, 2019
Geometry feature generation based on landmarks detected in an unsupervised manner. [1]
Disentangle bottleneck features into category-invariant features and category-specific features. Category-invariant features encode the pose information.
Lin, Chen-Hsuan, et al. “St-gan: Spatial transformer generative adversarial networks for image compositing.” CVPR, 2018.
Kikuchi, Kotaro, et al. “Regularized Adversarial Training for Single-shot Virtual Try-On.” ICCV Workshops. 2019.
Zhan, Fangneng, Hongyuan Zhu, and Shijian Lu. “Spatial fusion gan for image synthesis.” CVPR, 2019.
Azadi, Samaneh, et al. “Compositional gan: Learning image-conditional binary composition.” International Journal of Computer Vision 128.10 (2020): 2570-2585.
Zhan, Fangneng, Jiaxing Huang, and Shijian Lu. “Hierarchy Composition GAN for High-fidelity Image Synthesis.” IEEE Transactions on Cybernetics, 2021.
Tutorials on generative models:
[1] Dhariwal, Prafulla, and Alex Nichol. “Diffusion models beat gans on image synthesis.” arXiv preprint arXiv:2105.05233 (2021).
[2] GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
[3] Elucidating the Design Space of Diffusion-Based Generative Models
[4] Wang, Tengfei, et al. “Pretraining is All You Need for Image-to-Image Translation.” arXiv preprint arXiv:2205.12952 (2022).
Corneal reflection-based methods
Appearance-based methods
Calibration: obtain the visual axis and kappa angle for each person.
Facial landmark detection
Head Pose Estimation
[MPIIGaze]: fine-grained annotation
[Eyediap]: RGB-D
17 tricks for training GANs: https://github.com/soumith/ganhacks
soft labels: replace the hard label 1 with, e.g., 0.9 and 0 with, e.g., 0.3 (see the sketch after this list)
train discriminator more times (e.g., 2X) than generator
use labels: auxiliary tasks
normalize inputs to [-1, 1]
use tanh before output
use batchnorm (not for the first and last layer)
sample z from a spherical (Gaussian) distribution instead of a uniform distribution
leaky relu
stability tricks from RL
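A minimal PyTorch sketch combining a few of the tricks above (soft labels, extra discriminator steps, a spherical Gaussian prior). `G`, `D`, their optimizers, and a batch of real images normalized to [-1, 1] are assumed to exist, and `D` is assumed to end in a sigmoid; the exact label values and step ratio are illustrative:

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, real, z_dim=128, d_steps=2):
    batch = real.size(0)
    # Soft labels: 0.9 for real and 0.1 for fake, instead of hard 1/0.
    real_label = torch.full((batch, 1), 0.9, device=real.device)
    fake_label = torch.full((batch, 1), 0.1, device=real.device)

    # Train the discriminator more often (e.g., 2x) than the generator.
    for _ in range(d_steps):
        z = torch.randn(batch, z_dim, device=real.device)  # spherical prior
        fake = G(z).detach()  # do not backprop into G on the D step
        loss_D = F.binary_cross_entropy(D(real), real_label) + \
                 F.binary_cross_entropy(D(fake), fake_label)
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # One generator step: push D toward labeling fakes as real.
    z = torch.randn(batch, z_dim, device=real.device)
    target = torch.ones(batch, 1, device=real.device)
    loss_G = F.binary_cross_entropy(D(G(z)), target)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```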
Tricks from the BigGAN [1]
class-conditional BatchNorm
Spectral normalization
orthogonal initialization
truncated prior (truncation trick to seek the trade-off between fidelity and variety)
enforce orthogonality on weights to improve the model smoothness
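A minimal PyTorch sketch of three of these components (spectral normalization, orthogonal initialization, and the truncation trick). Layer sizes and the truncation threshold are illustrative, not BigGAN's exact settings:

```python
import torch
import torch.nn as nn

# Spectral normalization on a conv layer (applied throughout G and D in BigGAN).
conv = nn.utils.spectral_norm(nn.Conv2d(64, 128, kernel_size=3, padding=1))

# Orthogonal initialization for all weight matrices.
def init_orthogonal(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.orthogonal_(m.weight)
# usage: model.apply(init_orthogonal)

# Truncation trick: resample latent entries whose magnitude exceeds a threshold;
# a smaller threshold trades variety for fidelity.
def truncated_z(batch, dim, threshold=0.5):
    z = torch.randn(batch, dim)
    mask = z.abs() > threshold
    while mask.any():
        z[mask] = torch.randn(int(mask.sum()))
        mask = z.abs() > threshold
    return z
```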
More tricks
LSGAN: replace the cross-entropy loss with a least-squares loss (see the sketch after this list)
Wasserstein GAN: replace discriminator with a critic function
LAPGAN: coarse-to-fine generation using a Laplacian pyramid
SeqGAN: generate discrete sequences
E-GAN [2]: place GAN training under the framework of genetic evolution
GAN Dissection [3]: use intervention for causality
CoGAN [4]: two generators and discriminators softly share parameters
DCGAN [5]
Progressive GAN [6]
Style-based GAN [7]
StackGAN [17]
Self-Attention GAN (SAGAN) [18]
BigGAN [20]
LOGAN [19]
Conditioned on label vector: conditional GAN [14], CVAE-GAN [16]
Conditioned on a single image: pix2pix [11]; high-resolution pix2pix [12] (add coarse-to-fine strategy); BicycleGAN [13] (combination of cVAE-GAN and cLR-GAN); DAGAN [15]
StyleGAN-XL [23]
StyleGAN-T [22]
GigaGAN [21]
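As a concrete instance of the LSGAN item above, a minimal sketch of the least-squares objectives; `d_real` and `d_fake` are assumed to be raw (un-sigmoided) discriminator outputs:

```python
import torch

# LSGAN: the discriminator regresses real samples toward 1 and fakes toward 0;
# the generator pushes its fakes toward 1. No sigmoid, no cross-entropy.
def d_loss_lsgan(d_real, d_fake):
    return 0.5 * ((d_real - 1.0) ** 2).mean() + 0.5 * (d_fake ** 2).mean()

def g_loss_lsgan(d_fake):
    return 0.5 * ((d_fake - 1.0) ** 2).mean()
```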
Results: besides qualitative results, there are quantitative metrics like the Inception Score (IS) and the Fréchet Inception Distance (FID); a minimal FID sketch follows below.
Stability: for the stability of generator and discriminator, refer to [1].
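A minimal NumPy/SciPy sketch of FID, assuming Inception features (e.g., 2048-d pool features) have already been extracted for both image sets; for reporting numbers, a maintained implementation such as pytorch-fid is preferable:

```python
import numpy as np
from scipy import linalg

# FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)) between the Gaussian
# fits of real and fake Inception features (arrays of shape [N, 2048]).
def fid(feats_real, feats_fake):
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical noise
    return float(np.sum((mu1 - mu2) ** 2) +
                 np.trace(sigma1 + sigma2 - 2 * covmean))
```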
The GAN zoo: https://github.com/hindupuravinash/the-gan-zoo
A good tutorial: https://github.com/mingyuliutw/cvpr2017_gan_tutorial/blob/master/gan_tutorial.pdf
Regularization Methods for Generative Adversarial Networks: An Overview of Recent Studies
Generative adversarial networks in computer vision: A survey and taxonomy
[1] Brock, Andrew, Jeff Donahue, and Karen Simonyan. “Large scale GAN training for high fidelity natural image synthesis.” arXiv preprint arXiv:1809.11096 (2018).
[2] Wang, Chaoyue, et al. “Evolutionary generative adversarial networks.” arXiv preprint arXiv:1803.00657 (2018).
[3] Bau, David, et al. “GAN Dissection: Visualizing and understanding generative adversarial networks.” arXiv preprint arXiv:1811.10597 (2018).
[4] Liu, Ming-Yu, and Oncel Tuzel. “Coupled generative adversarial networks.” Advances in Neural Information Processing Systems. 2016.
[5] Radford, Alec, Luke Metz, and Soumith Chintala. “Unsupervised representation learning with deep convolutional generative adversarial networks.” arXiv preprint arXiv:1511.06434 (2015).
[6] Karras, Tero, et al. “Progressive growing of gans for improved quality, stability, and variation.” arXiv preprint arXiv:1710.10196 (2017).
[7] Karras, Tero, Samuli Laine, and Timo Aila. “A Style-Based Generator Architecture for Generative Adversarial Networks.” arXiv preprint arXiv:1812.04948 (2018).
[8] Gulrajani, Ishaan, et al. “Improved training of wasserstein gans.” Advances in Neural Information Processing Systems. 2017.
[9] Metz, Luke, et al. “Unrolled generative adversarial networks.” arXiv preprint arXiv:1611.02163 (2016).
[10] Lin, Zinan, et al. “PacGAN: The power of two samples in generative adversarial networks.” Advances in Neural Information Processing Systems. 2018.
[11] Isola, Phillip, et al. “Image-to-image translation with conditional adversarial networks.” CVPR, 2017
[12] Wang, Ting-Chun, et al. “High-resolution image synthesis and semantic manipulation with conditional gans.” CVPR, 2018.
[13] Zhu, Jun-Yan, et al. “Toward multimodal image-to-image translation.” NIPS, 2017.
[14] Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets.” arXiv preprint arXiv:1411.1784 (2014).
[15] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. “Data augmentation generative adversarial networks.” arXiv preprint arXiv:1711.04340 (2017).
[16] Bao, Jianmin, et al. “CVAE-GAN: fine-grained image generation through asymmetric training.” ICCV, 2017.
[17] Zhang, Han, et al. “StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks.” ICCV, 2017.
[18] Zhang, Han, et al. “Self-Attention Generative Adversarial Networks.” arXiv preprint arXiv:1805.08318 (2018).
[19] Wu, Yan, et al. “LOGAN: Latent Optimisation for Generative Adversarial Networks.” arXiv preprint arXiv:1912.00953 (2019).
[20] Brock, Andrew, Jeff Donahue, and Karen Simonyan. “Large scale gan training for high fidelity natural image synthesis.” arXiv preprint arXiv:1809.11096 (2018).
[21] Kang, Minguk, et al. “Scaling up GANs for Text-to-Image Synthesis.” arXiv preprint arXiv:2303.05511 (2023).
[22] Sauer, Axel, et al. “Stylegan-t: Unlocking the power of gans for fast large-scale text-to-image synthesis.” arXiv preprint arXiv:2301.09515 (2023).
[23] Sauer, Axel, Katja Schwarz, and Andreas Geiger. “Stylegan-xl: Scaling stylegan to large diverse datasets.” ACM SIGGRAPH 2022 Conference Proceedings. 2022.
From layer $i$ to layer $i+1$, assume the parameters of layer $i$ are $s_i$ (stride), $p_i$ (padding), and $k_i$ (kernel filter size), and the width or height of layer $i$ is $r_i$. Then, by the standard convolution output-size formula, $r_{i+1} = \lfloor (r_i + 2p_i - k_i)/s_i \rfloor + 1$.
In the reverse process, $r_i = s_i r_{i+1} - s_i - 2p_i + k_i$, or $r_i = s_i r_{i+1} - s_i + k_i$ if the padded area is counted in.
Now consider mapping the point $x_i$ on the ROI to the point $x_{i+1}$ on the feature map, which can be transformed into the layer-size problem above. In particular, the receptive field spanned by the top-left corner and $x_i$ on the ROI maps to the region spanned by the top-left corner and $x_{i+1}$ on the feature map. Applying the reverse formula from the layer-size problem above (the only differences are that we include only the left and top padding, and subtract the kernel radius $(k_i-1)/2$ to land on the center of the receptive field), we get $x_i = s_i (x_{i+1} - 1) + \frac{k_i+1}{2} - p_i$.
The above coordinate system starts from 1. When the coordinate system starts from 0, the relation becomes $x_i + 1 = s_i x_{i+1} + \frac{k_i+1}{2} - p_i$,
which can be simplified as $x_i = s_i x_{i+1} + (\frac{k_i-1}{2} - p_i)$.
When $p_i = \lfloor k_i/2 \rfloor$, $x_i \approx s_i x_{i+1}$, which is the simplest case.
By applying $x_i = s_i x_{i+1} + (\frac{k_i-1}{2} - p_i)$ recursively, we arrive at the general solution $x_1 = \alpha_L x_L + \beta_L$,
in which $\alpha_L = \prod_{l=1}^{L-1} s_l$ and $\beta_L = \sum_{l=1}^{L-1} \left(\prod_{n=1}^{l-1} s_n\right)\left(\frac{k_l-1}{2} - p_l\right)$.
Given two corner points of an anchor box on the feature map, we can find their corresponding points on the original image, which determine the ROI.
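A small sketch of this mapping under the 0-based formula $x_i = s_i x_{i+1} + (\frac{k_i-1}{2} - p_i)$; the layer list and example values are illustrative:

```python
# Each layer is described by (stride, kernel_size, padding); coordinates are 0-based.
def feature_to_image_coord(x_L, layers):
    """Map a 0-based feature-map coordinate back to the input image
    using x_i = s_i * x_{i+1} + (k_i - 1) / 2 - p_i, applied recursively."""
    alpha, beta = 1, 0.0
    for s, k, p in layers:  # accumulate alpha_L and beta_L from layer 1 to L-1
        beta += alpha * ((k - 1) / 2 - p)
        alpha *= s
    return alpha * x_L + beta

# Example: three 3x3 convs with stride 2 and "same" padding 1.
# Since p = floor(k/2), the mapping is approximately x_1 = 8 * x_L.
print(feature_to_image_coord(5, [(2, 3, 1)] * 3))  # -> 40.0
```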
Distinguish generated fake images from real images in the frequency domain. [2]
An image can be composed of, or decomposed into, a low-frequency part and a high-frequency part [3] [8] [4] [10]; see the sketch after the references below.
Xu, Kai, et al. “Learning in the Frequency Domain.” CVPR, 2020.
Wang, Sheng-Yu, et al. “CNN-generated images are surprisingly easy to spot… for now.” arXiv preprint arXiv:1912.11035 (2019).
Bansal, Aayush, Yaser Sheikh, and Deva Ramanan. “PixelNN: Example-based Image Synthesis.” ICLR, 2018.
Yang, Yanchao, and Stefano Soatto. “FDA: Fourier Domain Adaptation for Semantic Segmentation.” CVPR, 2020.
Roy, Hiya, et al. “Image inpainting using frequency domain priors.” arXiv preprint arXiv:2012.01832 (2020).
Shen, Xing, et al. “DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation.” arXiv preprint arXiv:2011.09876 (2020).
Suvorov, Roman, et al. “Resolution-robust Large Mask Inpainting with Fourier Convolutions.” WACV (2021).
Yu, Yingchen, et al. “WaveFill: A Wavelet-based Generation Network for Image Inpainting.” ICCV, 2021.
Mardani, Morteza, et al. “Neural ffts for universal texture image synthesis.” NeurIPS (2020).
Cai, Mu, et al. “Frequency domain image translation: More photo-realistic, better identity-preserving.” ICCV, 2021.
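A minimal NumPy sketch of such a decomposition, assuming a grayscale image as a 2D float array and an illustrative cutoff radius: an ideal low-pass mask in the FFT domain gives the low-frequency part, and the residual is the high-frequency part, so the two parts sum back to the original image.

```python
import numpy as np

def frequency_split(img, radius=16):
    # Shift the spectrum so low frequencies sit at the center.
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    # Circular low-pass mask around the spectrum center.
    low_mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    high = img - low  # residual = high-frequency part
    return low, high
```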