a) When the domain labels are known:
b) When the domain labels are unknown:
Align the source and target feature maps: reduce the H-divergence of regional feature maps [4][8]; enforce cycle consistency [11]
Translate source-domain images to target-domain images: [5][6][10] (combined with target-domain pseudo labels).
Training with pseudo labels on the target domain: curriculum learning [1] (global image label distribution and landmark superpixel label distribution); self-training [7]; using multiple models to vote for pseudo labels [9]
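The pseudo-label self-training above can be sketched as confidence-based selection with a per-class quota, in the spirit of class-balanced self-training [7]. A minimal numpy sketch (the function name, `keep_frac`, and the toy probabilities are illustrative, not from the paper):

```python
import numpy as np

def class_balanced_pseudo_labels(probs, keep_frac=0.5):
    """Keep, per class, only the most confident target predictions as
    pseudo labels (in the spirit of class-balanced self-training).

    probs: (N, C) softmax outputs of a source-trained model on target data.
    keep_frac: fraction of each class's samples to keep.
    Returns (labels, mask): argmax labels and a boolean keep-mask.
    """
    labels = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    mask = np.zeros(len(probs), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        k = max(1, int(np.ceil(keep_frac * len(idx))))  # per-class quota
        mask[idx[np.argsort(-conf[idx])[:k]]] = True
    return labels, mask

# Toy example: 4 target samples, 2 classes.
probs = np.array([[0.90, 0.10],
                  [0.60, 0.40],
                  [0.20, 0.80],
                  [0.45, 0.55]])
labels, mask = class_balanced_pseudo_labels(probs, keep_frac=0.5)
# Only the most confident sample of each class is pseudo-labeled.
```

The per-class quota is what makes it class-balanced: a single global confidence threshold would let easy classes dominate the pseudo-label set.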
[1] Zhang, Yang, Philip David, and Boqing Gong. “Curriculum domain adaptation for semantic segmentation of urban scenes.” ICCV, 2017.
[2] Lee, Kuan-Hui, et al. “SPIGAN: Privileged Adversarial Learning from Simulation.” ICLR, 2019.
[3] Vu, Tuan-Hung, et al. “DADA: Depth-aware Domain Adaptation in Semantic Segmentation.” arXiv preprint arXiv:1904.01886 (2019).
[4] Chen, Yuhua, Wen Li, and Luc Van Gool. “Road: Reality oriented adaptation for semantic segmentation of urban scenes.” CVPR, 2018.
[5] Hoffman, Judy, et al. “Cycada: Cycle-consistent adversarial domain adaptation.” arXiv preprint arXiv:1711.03213 (2017).
[6] Sankaranarayanan, Swami, et al. “Learning from synthetic data: Addressing domain shift for semantic segmentation.” CVPR, 2018.
[7] Zou, Yang, et al. “Unsupervised domain adaptation for semantic segmentation via class-balanced self-training.” ECCV, 2018.
[8] Hong, Weixiang, et al. “Conditional generative adversarial network for structured domain adaptation.” CVPR, 2018.
[9] Zhang, Junting, Chen Liang, and C-C. Jay Kuo. “A fully convolutional tri-branch network (FCTN) for domain adaptation.” ICASSP, 2018.
[10] Li, Yunsheng, Lu Yuan, and Nuno Vasconcelos. “Bidirectional Learning for Domain Adaptation of Semantic Segmentation.” arXiv preprint arXiv:1904.10620 (2019).
[11] Kang, Guoliang, et al. “Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation.” Advances in Neural Information Processing Systems 33 (2020).
Methods
learn a projection matrix: minimize a mismatch measure F(PX_s, QX_t) between the projected source features PX_s and projected target features QX_t
sample selection: learn sample weights
domain-invariant and domain-specific components
low-rank reconstruction
pixel-level image-to-image translation
adversarial network [1]: classification plus domain confusion. The domain separation/confusion objective is a min-max problem, solvable GAN-style (alternating updates) or with the reverse-gradient (RevGrad) algorithm.
meta-learning
guided learning: a tutor guides the students and gets feedback from the students. ACM-MM18 paper
ensemble transfer learning: aggregate multiple transfer learning approaches [1]
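The min-max domain-confusion objective in the adversarial-network item above is commonly implemented with a gradient reversal layer: identity in the forward pass, sign-flipped gradient in the backward pass. A minimal numpy sketch of just that forward/backward behavior (`lam` is the usual trade-off weight; the autograd plumbing of a real framework is omitted):

```python
import numpy as np

def grl_forward(features):
    # Forward pass: the gradient reversal layer is the identity.
    return features

def grl_backward(grad_from_domain_loss, lam=1.0):
    # Backward pass: flip the sign (scaled by lam) before the gradient
    # reaches the feature extractor, so the extractor learns to confuse
    # the domain classifier while the classifier minimizes its own loss.
    return -lam * grad_from_domain_loss

g = grl_backward(np.array([2.0, -4.0]), lam=0.5)  # sign-flipped, scaled
```

With this trick a single backward pass trains both players of the min-max game, instead of alternating updates as in a GAN.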
Settings
open-set domain adaptation or partial transfer learning: [1][2][3]
distant domain adaptation (the two domains are too distant, so transfer between them relies on intermediate transition domains): transitive transfer learning, distant domain transfer learning
open compound domain adaptation [1]
Domain adaptation for diverse applications
Domain difference metric: To measure data-distribution mismatch, the most commonly used metric is maximum mean discrepancy (MMD) and its extensions, such as fast MMD, conditional MMD [1][2], and joint MMD. Other metrics include KL divergence, the HSIC criterion, Bregman divergence, the manifold criterion, and second-order statistics.
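MMD above compares the kernel mean embeddings of the two samples. A minimal numpy sketch of the standard (biased) RBF-kernel estimator; `gamma` and the toy data are illustrative:

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X (n, d) and
    Y (m, d) under the RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd_rbf(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)))
shifted = mmd_rbf(rng.normal(size=(100, 2)), rng.normal(loc=3.0, size=(100, 2)))
# MMD stays near zero for same-distribution samples and grows under shift.
```

Minimizing such an estimate between source and target features is the basic MMD-based adaptation recipe; the conditional/joint variants apply the same idea to class-conditional or joint distributions.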
Theories: A summary of related theories
Survey:
The goal of disentangled representation learning [4] is to extract the explanatory factors of variation in the input distribution and produce a more meaningful representation. Terminology varies: disentangled codes/encodings/representations/latent factors/latent variables. An attribute may be encoded in a single dimension or across multiple dimensions.
A math definition of disentangled representation [11]
A survey on disentangled representation learning [19]
Unsupervised disentanglement
InfoGAN [5] builds on the GAN framework and maximizes the mutual information between a subset of the latent variables and the generated output, learning disentangled representations in an unsupervised manner. Different latent variables are enforced to be independent under the independence assumption [6].
Supervised disentanglement
Swap attribute representations under the supervision of attribute annotations, e.g., Dual Swap GAN [7] (semi-supervised) and DNA-GAN [8].
Disentangled representation for domain adaptation: disentangle the representation into class/domain-invariant and class/domain-specific components [9][10][12][13]
Closed-form disentanglement [18]: after the model is trained, perform an eigendecomposition of the generator's weights to obtain orthogonal semantic directions.
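The closed-form idea in [18] can be sketched as: eigendecompose A^T A, where A is the first affine layer of a trained generator, and take the top eigenvectors as latent directions. A minimal numpy sketch; here A is random purely for illustration (in [18] it comes from a trained GAN):

```python
import numpy as np

# A stands in for the weight of a trained generator's first affine layer
# (latent dim 32 -> feature dim 256); random here purely for illustration.
rng = np.random.default_rng(0)
A = rng.normal(size=(256, 32))

# Closed-form factorization: eigenvectors of A^T A, sorted by decreasing
# eigenvalue, are the latent directions that change A @ z the most.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)   # eigvals in ascending order
directions = eigvecs[:, ::-1]                # largest eigenvalue first

# The recovered directions are mutually orthogonal by construction.
gram = directions.T @ directions
```

No extra training or sampling is needed, which is what makes the method "closed-form" compared with learned-direction approaches.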
[1] Higgins, Irina, et al. “beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework.” ICLR, 2017.
[2] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.
[4] Bengio, Yoshua, Aaron Courville, and Pascal Vincent. “Representation learning: A review and new perspectives.” TPAMI, 2013.
[5] Chen, Xi, et al. “InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets.” NIPS, 2016.
[6] Brakel, Philemon, and Yoshua Bengio. “Learning Independent Features with Adversarial Nets for Non-linear ICA.” arXiv preprint, 2017.
[7] Feng, Zunlei, et al. “Dual Swap Disentangling.” NeurIPS, 2018.
[8] Xiao, Taihong, et al. “DNA-GAN: Learning Disentangled Representations from Multi-Attribute Images.” ICLR Workshop, 2018.
[9] Gonzalez-Garcia, Abel, Joost van de Weijer, and Yoshua Bengio. “Image-to-image translation for cross-domain disentanglement.” NeurIPS, 2018.
[10] Lee, Hsin-Ying, et al. “Diverse Image-to-Image Translation via Disentangled Representations.” ECCV, 2018.
[11] Higgins, Irina, et al. “Towards a definition of disentangled representations.” arXiv preprint arXiv:1812.02230 (2018).
[12] Gabbay, Aviv, and Yedid Hoshen. “Demystifying Inter-Class Disentanglement.” arXiv preprint arXiv:1906.11796 (2019).
[13] Hadad, Naama, Lior Wolf, and Moni Shahar. “A two-step disentanglement method.” CVPR, 2018.
[14] Shen, Zhiqiang, et al. “Towards instance-level image-to-image translation.” CVPR, 2019.
[15] Mo, Sangwoo, Minsu Cho, and Jinwoo Shin. “InstaGAN: Instance-aware Image-to-Image Translation.” ICLR, 2019.
[16] Liu, Ming-Yu, et al. “Few-shot unsupervised image-to-image translation.” ICCV, 2019.
[17] Saito, Kuniaki, Kate Saenko, and Ming-Yu Liu. “COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder.” arXiv preprint arXiv:2007.07431 (2020).
[18] Shen, Yujun, and Bolei Zhou. “Closed-Form Factorization of Latent Semantics in GANs.” arXiv preprint arXiv:2007.06600 (2020).
[19] Xin Wang, Hong Chen, Siao Tang, Zihao Wu, and Wenwu Zhu. “Disentangled Representation Learning.”
Concatenate or add the latent code to the encoder layers [2]
Concatenate or add it to the bottleneck [3]
Add it to the decoder layers using AdaIN: [4] (without skip connections), [5] (with skip connections)
For concatenation or addition, spatially stack the latent codes [9]
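The AdaIN injection above re-normalizes the content features to the style code's per-channel statistics. A minimal numpy sketch (shapes and variable names are illustrative):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: re-normalize each channel of the
    content features to the per-channel mean/std of the style features.
    content, style: (C, H, W) feature maps."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / c_std + s_mean

rng = np.random.default_rng(0)
content = rng.normal(size=(3, 8, 8))
style = 2.0 + 0.5 * rng.normal(size=(3, 8, 8))
out = adain(content, style)  # carries the style's channel statistics
```

Unlike concatenation, AdaIN has no learned parameters at the injection point: the style code acts purely through the per-channel scale and shift.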
[1] Choi, Yunjey, et al. “StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation.” CVPR, 2018.
[2] Zhu, Jun-Yan, et al. “Toward Multimodal Image-to-Image Translation.” NIPS, 2017.
[3] Zheng, Chuanxia, Tat-Jen Cham, and Jianfei Cai. “Pluralistic image completion.” CVPR, 2019.
[4] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.
[5] Anokhin, Ivan, et al. “High-Resolution Daytime Translation Without Domain Labels.” CVPR, 2020.
[6] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. “Data augmentation generative adversarial networks.” arXiv preprint arXiv:1711.04340 (2017).
[7] Shaham, Tamar Rott, Tali Dekel, and Tomer Michaeli. “SinGAN: Learning a Generative Model from a Single Natural Image.” ICCV, 2019.
[8] Lee, Hsin-Ying, et al. “Diverse image-to-image translation via disentangled representations.” ECCV, 2018.
[9] Yazeed Alharbi, Peter Wonka: Disentangled Image Generation Through Structured Noise Injection. CVPR, 2020.
[10] Park, Taesung, et al. “Semantic image synthesis with spatially-adaptive normalization.” CVPR, 2019.
[11] Sushko, Vadim, et al. “You only need adversarial supervision for semantic image synthesis.” ICLR, 2021.
Ready-made DevBox:
Assemble your own: cheap, but no warranty
Nvidia
microarchitecture: Maxwell -> Pascal -> Volta
GPU cloud
Some related papers: [1][2][3][4]
[1] Pun, Chi Seng, Kelin Xia, and Si Xian Lee. “Persistent-Homology-based Machine Learning and its Applications—A Survey.” arXiv preprint arXiv:1811.00252 (2018).
[2] Carlsson, Gunnar, and Rickard Brüel Gabrielsson. “Topological approaches to deep learning.” arXiv preprint arXiv:1811.01122 (2018).
[3] Gabrielsson, Rickard Brüel, and Gunnar Carlsson. “Exposition and interpretation of the topology of neural networks.” ICMLA, 2019.
[4] Bergomi, Mattia G., et al. “Towards a topological–geometrical theory of group equivariant non-expansive operators for data analysis and machine learning.” Nature Machine Intelligence 1.9 (2019): 423-433.
Learning from Massive Noisy Labeled Data for Image Classification: the hidden variable is the label-noise type
Expectation-Maximization Attention Networks for Semantic Segmentation: the hidden variables are the dictionary bases
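Both papers above fit a latent-variable model with EM. As a generic illustration of the E-step/M-step alternation (not either paper's actual model), a two-component 1-D Gaussian mixture where the hidden variable is each point's component assignment:

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture; the hidden variable
    is the component assignment of each point."""
    mu = np.array([x.min(), x.max()])          # crude initialization
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component per point.
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return mu, sigma, pi, resp

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(6.0, 1.0, 200)])
mu, sigma, pi, resp = em_gmm_1d(x)  # mu recovers the two cluster centers
```

The papers swap in their own latent variables (noise type, dictionary basis) and their own E/M updates, but the alternation has this same shape.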