a) When the domain labels are known:
b) When the domain labels are unknown:
Align the source and target feature maps: reduce the H-divergence of regional feature maps [4][8]; enforce cycle consistency [11]
Translate source-domain images to target-domain images: [5][6][10] (combined with target-domain pseudo labels).
Training with pseudo labels on the target domain: curriculum learning [1] (global image label distribution and landmark superpixel label distribution); self-training [7]; using multiple models to vote for pseudo labels [9]
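The pseudo-label self-training above can be sketched as confidence-based selection with a per-class quota, in the spirit of class-balanced self-training [7]. A minimal numpy sketch (the function name, `keep_frac`, and the toy probabilities are illustrative, not from the paper):

```python
import numpy as np

def class_balanced_pseudo_labels(probs, keep_frac=0.5):
    """Keep, per class, only the most confident target predictions as
    pseudo labels (in the spirit of class-balanced self-training).

    probs: (N, C) softmax outputs of a source-trained model on target data.
    keep_frac: fraction of each class's samples to keep.
    Returns (labels, mask): argmax labels and a boolean keep-mask.
    """
    labels = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    mask = np.zeros(len(probs), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        k = max(1, int(np.ceil(keep_frac * len(idx))))  # per-class quota
        mask[idx[np.argsort(-conf[idx])[:k]]] = True
    return labels, mask

# Toy example: 4 target samples, 2 classes.
probs = np.array([[0.90, 0.10],
                  [0.60, 0.40],
                  [0.20, 0.80],
                  [0.45, 0.55]])
labels, mask = class_balanced_pseudo_labels(probs, keep_frac=0.5)
# Only the most confident sample of each class is pseudo-labeled.
```

The per-class quota is what makes it class-balanced: a single global confidence threshold would let easy classes dominate the pseudo-label set.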
[1] Zhang, Yang, Philip David, and Boqing Gong. “Curriculum domain adaptation for semantic segmentation of urban scenes.” ICCV, 2017.
[2] Lee, Kuan-Hui, et al. “SPIGAN: Privileged Adversarial Learning from Simulation.” ICLR, 2019.
[3] Vu, Tuan-Hung, et al. “DADA: Depth-aware Domain Adaptation in Semantic Segmentation.” arXiv preprint arXiv:1904.01886 (2019).
[4] Chen, Yuhua, Wen Li, and Luc Van Gool. “Road: Reality oriented adaptation for semantic segmentation of urban scenes.” CVPR, 2018.
[5] Hoffman, Judy, et al. “Cycada: Cycle-consistent adversarial domain adaptation.” arXiv preprint arXiv:1711.03213 (2017).
[6] Sankaranarayanan, Swami, et al. “Learning from synthetic data: Addressing domain shift for semantic segmentation.” CVPR, 2018.
[7] Zou, Yang, et al. “Unsupervised domain adaptation for semantic segmentation via class-balanced self-training.” ECCV, 2018.
[8] Hong, Weixiang, et al. “Conditional generative adversarial network for structured domain adaptation.” CVPR, 2018.
[9] Zhang, Junting, Chen Liang, and C-C. Jay Kuo. “A fully convolutional tri-branch network (FCTN) for domain adaptation.” ICASSP, 2018.
[10] Li, Yunsheng, Lu Yuan, and Nuno Vasconcelos. “Bidirectional Learning for Domain Adaptation of Semantic Segmentation.” arXiv preprint arXiv:1904.10620 (2019).
[11] Kang, Guoliang, et al. “Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation.” Advances in Neural Information Processing Systems 33 (2020).
Methods
learn a projection matrix: minimize a mismatch measure F(PX_s, QX_t) between the projected source features PX_s and projected target features QX_t
sample selection: learn sample weights
domain-invariant and domain-specific components
low-rank reconstruction
pixel-level image-to-image translation
adversarial network [1]: classification plus domain confusion. The domain separation/confusion objective is a min-max problem, solvable GAN-style (alternating updates) or with the reverse-gradient (RevGrad) algorithm.
meta-learning
guided learning: a tutor guides the students and gets feedback from the students. ACM-MM18 paper
ensemble transfer learning: aggregate multiple transfer learning approaches [1]
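The min-max domain-confusion objective in the adversarial-network item above is commonly implemented with a gradient reversal layer: identity in the forward pass, sign-flipped gradient in the backward pass. A minimal numpy sketch of just that forward/backward behavior (`lam` is the usual trade-off weight; the autograd plumbing of a real framework is omitted):

```python
import numpy as np

def grl_forward(features):
    # Forward pass: the gradient reversal layer is the identity.
    return features

def grl_backward(grad_from_domain_loss, lam=1.0):
    # Backward pass: flip the sign (scaled by lam) before the gradient
    # reaches the feature extractor, so the extractor learns to confuse
    # the domain classifier while the classifier minimizes its own loss.
    return -lam * grad_from_domain_loss

g = grl_backward(np.array([2.0, -4.0]), lam=0.5)  # sign-flipped, scaled
```

With this trick a single backward pass trains both players of the min-max game, instead of alternating updates as in a GAN.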
Settings
open-set domain adaptation or partial transfer learning: [1][2][3]
distant domain adaptation (the two domains are too distant, so transfer between them relies on intermediate transition domains): transitive transfer learning, distant domain transfer learning
open compound domain adaptation [1]
Domain adaptation for diverse applications
Domain difference metric: To measure data-distribution mismatch, the most commonly used metric is maximum mean discrepancy (MMD) and its extensions, such as fast MMD, conditional MMD [1][2], and joint MMD. Other metrics include KL divergence, the HSIC criterion, Bregman divergence, the manifold criterion, and second-order statistics.
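MMD above compares the kernel mean embeddings of the two samples. A minimal numpy sketch of the standard (biased) RBF-kernel estimator; `gamma` and the toy data are illustrative:

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X (n, d) and
    Y (m, d) under the RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd_rbf(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)))
shifted = mmd_rbf(rng.normal(size=(100, 2)), rng.normal(loc=3.0, size=(100, 2)))
# MMD stays near zero for same-distribution samples and grows under shift.
```

Minimizing such an estimate between source and target features is the basic MMD-based adaptation recipe; the conditional/joint variants apply the same idea to class-conditional or joint distributions.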
Theories: A summary of related theories
Survey:
The goal of disentangled representation learning [4] is to extract the explanatory factors of variation in the input distribution and produce a more meaningful representation. Terminology varies: disentangled codes/encodings/representations/latent factors/latent variables. An attribute may be encoded in a single dimension or across multiple dimensions.
A math definition of disentangled representation [11]
A survey on disentangled representation learning [19]
Unsupervised disentanglement
InfoGAN [5] builds on the GAN framework and maximizes the mutual information between a subset of the latent variables and the generated output, learning disentangled representations in an unsupervised manner. Different latent variables are enforced to be independent under the independence assumption [6].
Supervised disentanglement
Swap attribute representations under the supervision of attribute annotations, e.g., Dual Swap GAN [7] (semi-supervised) and DNA-GAN [8].
Disentangled representation for domain adaptation: disentangle the representation into class/domain-invariant and class/domain-specific components [9][10][12][13]
Closed-form disentanglement [18]: after the model is trained, perform an eigendecomposition of the generator's weights to obtain orthogonal semantic directions.
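The closed-form idea in [18] can be sketched as: eigendecompose A^T A, where A is the first affine layer of a trained generator, and take the top eigenvectors as latent directions. A minimal numpy sketch; here A is random purely for illustration (in [18] it comes from a trained GAN):

```python
import numpy as np

# A stands in for the weight of a trained generator's first affine layer
# (latent dim 32 -> feature dim 256); random here purely for illustration.
rng = np.random.default_rng(0)
A = rng.normal(size=(256, 32))

# Closed-form factorization: eigenvectors of A^T A, sorted by decreasing
# eigenvalue, are the latent directions that change A @ z the most.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)   # eigvals in ascending order
directions = eigvecs[:, ::-1]                # largest eigenvalue first

# The recovered directions are mutually orthogonal by construction.
gram = directions.T @ directions
```

No extra training or sampling is needed, which is what makes the method "closed-form" compared with learned-direction approaches.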
[1] Higgins, Irina, et al. “beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework.” ICLR, 2017.
[2] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.
[4] Bengio, Yoshua, Aaron Courville, and Pascal Vincent. “Representation learning: A review and new perspectives.” TPAMI, 2013.
[5] Chen, Xi, et al. “InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets.” NIPS, 2016.
[6] Brakel, Philemon, and Yoshua Bengio. “Learning Independent Features with Adversarial Nets for Non-linear ICA.” arXiv preprint, 2017.
[7] Feng, Zunlei, et al. “Dual Swap Disentangling.” NeurIPS, 2018.
[8] Xiao, Taihong, et al. “DNA-GAN: Learning Disentangled Representations from Multi-Attribute Images.” ICLR Workshop, 2018.
[9] Gonzalez-Garcia, Abel, Joost van de Weijer, and Yoshua Bengio. “Image-to-image translation for cross-domain disentanglement.” NeurIPS, 2018.
[10] Lee, Hsin-Ying, et al. “Diverse Image-to-Image Translation via Disentangled Representations.” ECCV, 2018.
[11] Higgins, Irina, et al. “Towards a definition of disentangled representations.” arXiv preprint arXiv:1812.02230 (2018).
[12] Gabbay, Aviv, and Yedid Hoshen. “Demystifying Inter-Class Disentanglement.” arXiv preprint arXiv:1906.11796 (2019).
[13] Hadad, Naama, Lior Wolf, and Moni Shahar. “A two-step disentanglement method.” CVPR, 2018.
[14] Shen, Zhiqiang, et al. “Towards instance-level image-to-image translation.” CVPR, 2019.
[15] Mo, Sangwoo, Minsu Cho, and Jinwoo Shin. “InstaGAN: Instance-aware Image-to-Image Translation.” ICLR, 2019.
[16] Liu, Ming-Yu, et al. “Few-shot unsupervised image-to-image translation.” ICCV, 2019.
[17] Saito, Kuniaki, Kate Saenko, and Ming-Yu Liu. “COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder.” arXiv preprint arXiv:2007.07431 (2020).
[18] Shen, Yujun, and Bolei Zhou. “Closed-Form Factorization of Latent Semantics in GANs.” arXiv preprint arXiv:2007.06600 (2020).
[19] Xin Wang, Hong Chen, Siao Tang, Zihao Wu, and Wenwu Zhu. “Disentangled Representation Learning.”
Concatenate or add the latent code to the encoder layers [2]
Concatenate or add it to the bottleneck [3]
Add it to the decoder layers using AdaIN: [4] (without skip connections), [5] (with skip connections)
For concatenation or addition, spatially stack the latent codes [9]
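The AdaIN injection above re-normalizes the content features to the style code's per-channel statistics. A minimal numpy sketch (shapes and variable names are illustrative):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: re-normalize each channel of the
    content features to the per-channel mean/std of the style features.
    content, style: (C, H, W) feature maps."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / c_std + s_mean

rng = np.random.default_rng(0)
content = rng.normal(size=(3, 8, 8))
style = 2.0 + 0.5 * rng.normal(size=(3, 8, 8))
out = adain(content, style)  # carries the style's channel statistics
```

Unlike concatenation, AdaIN has no learned parameters at the injection point: the style code acts purely through the per-channel scale and shift.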
[1] Choi, Yunjey, et al. “StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation.” CVPR, 2018.
[2] Zhu, Jun-Yan, et al. “Toward Multimodal Image-to-Image Translation.” NIPS, 2017.
[3] Zheng, Chuanxia, Tat-Jen Cham, and Jianfei Cai. “Pluralistic image completion.” CVPR, 2019.
[4] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.
[5] Anokhin, Ivan, et al. “High-Resolution Daytime Translation Without Domain Labels.” CVPR, 2020.
[6] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. “Data augmentation generative adversarial networks.” arXiv preprint arXiv:1711.04340 (2017).
[7] Shaham, Tamar Rott, Tali Dekel, and Tomer Michaeli. “SinGAN: Learning a Generative Model from a Single Natural Image.” ICCV, 2019.
[8] Lee, Hsin-Ying, et al. “Diverse image-to-image translation via disentangled representations.” ECCV, 2018.
[9] Yazeed Alharbi, Peter Wonka: Disentangled Image Generation Through Structured Noise Injection. CVPR, 2020.
[10] Park, Taesung, et al. “Semantic image synthesis with spatially-adaptive normalization.” CVPR, 2019.
[11] Sushko, Vadim, et al. “You only need adversarial supervision for semantic image synthesis.” ICLR, 2021.
Ready-made DevBox:
Assemble your own: cheap, but no warranty
Nvidia
microarchitecture: Maxwell -> Pascal -> Volta
GPU cloud
Some related papers: [1][2][3][4]
[1] Pun, Chi Seng, Kelin Xia, and Si Xian Lee. “Persistent-Homology-based Machine Learning and its Applications—A Survey.” arXiv preprint arXiv:1811.00252 (2018).
[2] Carlsson, Gunnar, and Rickard Brüel Gabrielsson. “Topological approaches to deep learning.” arXiv preprint arXiv:1811.01122 (2018).
[3] Gabrielsson, Rickard Brüel, and Gunnar Carlsson. “Exposition and interpretation of the topology of neural networks.” ICMLA, 2019.
[4] Bergomi, Mattia G., et al. “Towards a topological–geometrical theory of group equivariant non-expansive operators for data analysis and machine learning.” Nature Machine Intelligence 1.9 (2019): 423-433.
Learning from Massive Noisy Labeled Data for Image Classification: the hidden variable is the label-noise type
Expectation-Maximization Attention Networks for Semantic Segmentation: the hidden variables are the dictionary bases
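Both papers above fit a latent-variable model with EM. As a generic illustration of the E-step/M-step alternation (not either paper's actual model), a two-component 1-D Gaussian mixture where the hidden variable is each point's component assignment:

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture; the hidden variable
    is the component assignment of each point."""
    mu = np.array([x.min(), x.max()])          # crude initialization
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component per point.
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return mu, sigma, pi, resp

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(6.0, 1.0, 200)])
mu, sigma, pi, resp = em_gmm_1d(x)  # mu recovers the two cluster centers
```

The papers swap in their own latent variables (noise type, dictionary basis) and their own E/M updates, but the alternation has this same shape.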