Methods:

The goal of disentangled representation learning [4] is to extract the explanatory factors of variation underlying the input data distribution and to produce a more meaningful representation. The disentangled quantities are variously called codes, encodings, representations, latent factors, or latent variables. An attribute may be encoded in a single dimension or in multiple dimensions.

A mathematical definition of disentangled representation [11]

A survey on disentangled representation learning [19]

  • Unsupervised disentanglement

    InfoGAN [5] uses the GAN framework and maximizes the mutual information between a subset of the latent variables and the generated observation to learn disentangled representations in an unsupervised manner. Different latent variables are enforced to be independent, following the independence assumption [6].

  • Supervised disentanglement

    Swap attribute representations between samples under the supervision of attribute annotations, e.g. Dual Swap GAN [7] (semi-supervised) and DNA-GAN [8].

  • Disentangled representation for domain adaptation, splitting the representation into class/domain-invariant and class/domain-specific parts: [9], [10], [12], [13]

  • Instance-level disentanglement: [14], [15], FUNIT [16], COCO-FUNIT [17]

  • Closed-form disentanglement [18]: after the model is trained, perform an eigendecomposition of the learned weights to obtain orthogonal latent directions.
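
The closed-form idea in the last bullet can be sketched in a few lines. This is a minimal sketch, assuming the directions are taken as the top eigenvectors of W^T W, where W is the weight matrix of the first layer that acts on the latent code. The function name `closed_form_directions` and the random `W` are hypothetical stand-ins, not code from [18].

```python
import numpy as np

def closed_form_directions(weight, k):
    """Return the k orthonormal latent directions with the largest eigenvalues of W^T W.

    weight: (out_features, latent_dim) matrix of the layer that consumes the latent code.
    """
    gram = weight.T @ weight                  # symmetric PSD, shape (latent_dim, latent_dim)
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigenvalues ascending, eigenvectors in columns
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    return eigvecs[:, order].T, eigvals[order]

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 8))                  # hypothetical stand-in for trained weights
directions, strengths = closed_form_directions(W, k=3)

z = rng.normal(size=8)
z_edited = z + 2.0 * directions[0]            # move the latent code along the first direction
```

Because `np.linalg.eigh` returns orthonormal eigenvectors for a symmetric matrix, the recovered directions are mutually orthogonal by construction; no extra training is involved.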

Disentanglement metric:

  • disentanglement metric score [1]
  • perceptual path length, linear separability [2]

Reference

[1] Higgins, Irina, et al. “beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework.” ICLR, 2017.

[2] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.

[4] Representation learning: A review and new perspectives

[5] InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets

[6] Learning Independent Features with adversarial Nets for Non-linear ICA

[7] Dual Swap Disentangling

[8] DNA-GAN: Learning Disentangled Representations from Multi-Attribute Images

[9] Image-to-image translation for cross-domain disentanglement

[10] Diverse Image-to-Image Translation via Disentangled Representations

[11] Higgins, Irina, et al. “Towards a definition of disentangled representations.” arXiv preprint arXiv:1812.02230 (2018).

[12] Gabbay, Aviv, and Yedid Hoshen. “Demystifying Inter-Class Disentanglement.” arXiv preprint arXiv:1906.11796 (2019).

[13] Hadad, Naama, Lior Wolf, and Moni Shahar. “A two-step disentanglement method.” CVPR, 2018.

[14] Shen, Zhiqiang, et al. “Towards instance-level image-to-image translation.” CVPR, 2019.

[15] Mo, Sangwoo, Minsu Cho, and Jinwoo Shin. “InstaGAN: Instance-aware Image-to-Image Translation.” ICLR, 2019.

[16] Liu, Ming-Yu, et al. “Few-shot unsupervised image-to-image translation.” ICCV, 2019.

[17] Saito, Kuniaki, Kate Saenko, and Ming-Yu Liu. “COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder.” arXiv preprint arXiv:2007.07431 (2020).

[18] Shen, Yujun, and Bolei Zhou. “Closed-Form Factorization of Latent Semantics in GANs.” arXiv preprint arXiv:2007.06600 (2020).

[19] Xin Wang, Hong Chen, Siao Tang, Zihao Wu, and Wenwu Zhu. “Disentangled Representation Learning.”

  • CelebA [14] (dataset for human faces): [12, 2, 11, 17, 13, 8, 13, 18]
  • MNIST [10], MNIST-M [4] (digits): [16, 15, 12, 5, 2, 11, 9, 17, 6, 8, 13, 3]
  • Yosemite [19] (summer and winter scenes): [11]
  • Artworks [19] (Monet and Van Gogh): [11]
  • 2D Sprites (game characters): [15, 9, 6, 8, 3]
  • LineMod [7] (3D object): [9]
  • 11k Hands [1] (hand gestures): [17]

Reference

[1] M. Afifi. Gender recognition and biometric identification using a large dataset of hand images. arXiv preprint arXiv:1711.04322, 2017.

[2] E. Dupont. Learning disentangled joint continuous and discrete representations. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 708–718. Curran Associates, Inc., 2018.

[3] Z. Feng, X. Wang, C. Ke, A.-X. Zeng, D. Tao, and M. Song. Dual swap disentangling. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 5898–5908. Curran Associates, Inc., 2018.

[4] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.

[5] A. Gonzalez-Garcia, J. van de Weijer, and Y. Bengio. Image-to-image translation for cross-domain disentanglement. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 1294–1305. Curran Associates, Inc., 2018.

[6] N. Hadad, L. Wolf, and M. Shahar. A two-step disentanglement method. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[7] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, and N. Navab. Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. In Asian conference on computer vision, pages 548–562. Springer, 2012.

[8] Q. Hu, A. Szabó, T. Portenier, P. Favaro, and M. Zwicker. Disentangling factors of variation by mixing them. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[9] A. H. Jha, S. Anand, M. Singh, and V. Veeravasarapu. Disentangling factors of variation with cycle-consistent variational autoencoders. In The European Conference on Computer Vision (ECCV), September 2018.

[10] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[11] H.-Y. Lee, H.-Y. Tseng, J.-B. Huang, M. Singh, and M.-H. Yang. Diverse image-to-image translation via disentangled representations. In The European Conference on Computer Vision (ECCV), September 2018.

[12] A. H. Liu, Y.-C. Liu, Y.-Y. Yeh, and Y.-C. F. Wang. A unified feature disentangler for multi-domain image translation and manipulation. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 2595–2604. Curran Associates, Inc., 2018.

[13] Y. Liu, F. Wei, J. Shao, L. Sheng, J. Yan, and X. Wang. Exploring disentangled feature representation beyond face identification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[14] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pages 3730–3738, 2015.

[15] M. F. Mathieu, J. J. Zhao, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun. Disentangling factors of variation in deep representation using adversarial training. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 5040–5048. Curran Associates, Inc., 2016.

[16] S. Narayanaswamy, T. B. Paige, J.-W. van de Meent, A. Desmaison, N. Goodman, P. Kohli, F. Wood, and P. Torr. Learning disentangled representations with semi-supervised deep generative models. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5925–5935. Curran Associates, Inc., 2017.

[17] Z. Shu, M. Sahasrabudhe, R. Alp Guler, D. Samaras, N. Paragios, and I. Kokkinos. Deforming autoencoders: Unsupervised disentangling of shape and appearance. In The European Conference on Computer Vision (ECCV), September 2018.

[18] Z. Shu, E. Yumer, S. Hadap, K. Sunkavalli, E. Shechtman, and D. Samaras. Neural face editing with intrinsic image disentangling. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.

[19] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint, 2017.

Inject a latent vector into a network:

  1. Concatenate or add it to the input image [1] [7]

  2. Concatenate or add it to encoder layer [2]

  3. Concatenate or add it to the bottleneck [3]

  4. Concatenate or add it to decoder layer [6] [8]

  5. Inject into decoder layers using AdaIN: [4] (without skip connections), [5] (with skip connections)
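
Two of the injection styles listed above, channel-wise concatenation and AdaIN, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: feature maps are single samples of shape (C, H, W), and the AdaIN scale and shift are predicted from the latent vector by plain linear maps. The names `concat_latent`, `adain`, `w_scale`, and `w_shift` are hypothetical and not taken from any cited paper.

```python
import numpy as np

def concat_latent(feature_map, z):
    """Tile z over the spatial grid and concatenate it along the channel axis.

    feature_map: (C, H, W); z: (D,). Returns an array of shape (C + D, H, W).
    """
    _, h, w = feature_map.shape
    tiled = np.broadcast_to(z[:, None, None], (z.shape[0], h, w))
    return np.concatenate([feature_map, tiled], axis=0)

def adain(feature_map, z, w_scale, w_shift, eps=1e-5):
    """Instance-normalize each channel, then apply a scale and shift predicted from z.

    w_scale, w_shift: (C, D) linear maps from the latent vector to per-channel statistics.
    """
    mean = feature_map.mean(axis=(1, 2), keepdims=True)
    std = feature_map.std(axis=(1, 2), keepdims=True)
    normalized = (feature_map - mean) / (std + eps)
    scale = (w_scale @ z)[:, None, None]
    shift = (w_shift @ z)[:, None, None]
    return scale * normalized + shift

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5, 5))         # hypothetical decoder activations
latent = rng.normal(size=3)
stacked = concat_latent(features, latent)     # shape (7, 5, 5)
styled = adain(features, latent, rng.normal(size=(4, 3)), rng.normal(size=(4, 3)))
```

Concatenation lets later layers decide how to use the latent, whereas AdaIN overwrites the per-channel statistics directly, so the latent controls global style regardless of spatial content.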

Inject a latent map into a network:

  1. For concatenation or addition, spatially stack latent codes [9]

  2. Spatially adaptive AdaIN: SPADE [10], OASIS [11]
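
Both latent-map options above can be illustrated with a small sketch. It assumes a coarse grid of latent codes is upsampled by repetition into a (D, H, W) map, and that the per-pixel scale and shift come from 1x1 linear maps of that map; published methods such as SPADE use learned convolutions and batch statistics instead. `stack_codes`, `spatial_modulation`, `w_gamma`, and `w_beta` are hypothetical names.

```python
import numpy as np

def stack_codes(codes, height, width):
    """Spatially stack a grid of latent codes into a dense latent map.

    codes: (grid_h, grid_w, D); height and width must be multiples of the grid size.
    Returns an array of shape (D, height, width).
    """
    grid_h, grid_w, _ = codes.shape
    dense = np.repeat(codes, height // grid_h, axis=0)
    dense = np.repeat(dense, width // grid_w, axis=1)
    return dense.transpose(2, 0, 1)

def spatial_modulation(feature_map, latent_map, w_gamma, w_beta, eps=1e-5):
    """Normalize each channel, then modulate every pixel with its own scale and shift.

    feature_map: (C, H, W); latent_map: (D, H, W); w_gamma, w_beta: (C, D).
    """
    mean = feature_map.mean(axis=(1, 2), keepdims=True)
    std = feature_map.std(axis=(1, 2), keepdims=True)
    normalized = (feature_map - mean) / (std + eps)
    gamma = np.einsum("cd,dhw->chw", w_gamma, latent_map)   # per-pixel scale offset
    beta = np.einsum("cd,dhw->chw", w_beta, latent_map)     # per-pixel shift
    return normalized * (1.0 + gamma) + beta

rng = np.random.default_rng(0)
grid_codes = rng.normal(size=(2, 2, 3))       # one code per image quadrant
latent_map = stack_codes(grid_codes, 4, 6)
features = rng.normal(size=(5, 4, 6))
modulated = spatial_modulation(features, latent_map,
                               rng.normal(size=(5, 3)), rng.normal(size=(5, 3)))
```

Unlike vector-based AdaIN, the modulation parameters here vary across pixels, so different image regions can be driven by different codes.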

Reference

[1] StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

[2] Toward Multimodal Image-to-Image Translation

[3] Zheng, Chuanxia, Tat-Jen Cham, and Jianfei Cai. “Pluralistic image completion.” CVPR, 2019.

[4] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.

[5] High-Resolution Daytime Translation Without Domain Labels

[6] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. “Data augmentation generative adversarial networks.” arXiv preprint arXiv:1711.04340 (2017).

[7] Shaham, Tamar Rott, Tali Dekel, and Tomer Michaeli. “SinGAN: Learning a Generative Model from a Single Natural Image.” ICCV, 2019.

[8] Lee, Hsin-Ying, et al. “Diverse image-to-image translation via disentangled representations.” ECCV, 2018.

[9] Yazeed Alharbi, Peter Wonka: Disentangled Image Generation Through Structured Noise Injection. CVPR, 2020.

[10] Park, Taesung, et al. “Semantic image synthesis with spatially-adaptive normalization.” CVPR, 2019.

[11] Sushko, Vadim, et al. “You only need adversarial supervision for semantic image synthesis.” ICLR, 2021

Some related papers: [1][2][3][4]

Reference

  1. Pun, Chi Seng, Kelin Xia, and Si Xian Lee. “Persistent-Homology-based Machine Learning and its Applications—A Survey.” arXiv preprint arXiv:1811.00252 (2018).

  2. Carlsson, Gunnar, and Rickard Brüel Gabrielsson. “Topological approaches to deep learning.” arXiv preprint arXiv:1811.01122 (2018).

  3. Gabrielsson, Rickard Brüel, and Gunnar Carlsson. “Exposition and interpretation of the topology of neural networks.” 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019.

  4. Bergomi, Mattia G., et al. “Towards a topological–geometrical theory of group equivariant non-expansive operators for data analysis and machine learning.” Nature Machine Intelligence 1.9 (2019): 423-433.

a) data selection, coreset selection, dataset pruning: select a subset of training data

  • Some papers on dataset pruning for generative models:

      * Li, Yize, et al. "Pruning then reweighting: Towards data-efficient training of diffusion models." ICASSP, 2025.
    
      * Moser, Brian B., Federico Raue, and Andreas Dengel. "A study in dataset pruning for image super-resolution." International Conference on Artificial Neural Networks. Cham: Springer Nature Switzerland, 2024.
    
  • dataset quantization: divide the training set into different bins and select representative samples in each bin. It could be used as a data selection strategy.

  • data attribution: a survey on data attribution is available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5451054. Attribution scores could be used as a measure for data selection. Some papers on data attribution for generative models:

      * Georgiev, Kristian, et al. "The journey, not the destination: How data guides diffusion models." arXiv preprint arXiv:2312.06205 (2023).
      * Zheng, Xiaosen, et al. "Intriguing properties of data attribution on diffusion models." arXiv preprint arXiv:2311.00500 (2023).
      * Lin, Jinxu, et al. "Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models." arXiv preprint arXiv:2410.18639 (2024).
    

b) dataset distillation: optimize the training set itself; the optimized training images are not realistic images. To my knowledge, no work uses distilled images to train a generative model.
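
The binning idea behind dataset quantization in (a) can be sketched as follows. This is a simplified illustration, assuming samples are split into equal-frequency bins along a scalar score and the samples nearest each bin centroid are kept; it is not the procedure of any specific paper. `select_by_bins` and the random features are hypothetical.

```python
import numpy as np

def select_by_bins(features, scores, num_bins, per_bin):
    """Split samples into equal-frequency bins by score and keep representatives per bin.

    features: (N, D) sample embeddings; scores: (N,) scalar used only for binning.
    Returns the sorted indices of the selected samples.
    """
    order = np.argsort(scores)
    selected = []
    for idx in np.array_split(order, num_bins):
        centroid = features[idx].mean(axis=0)
        dist = np.linalg.norm(features[idx] - centroid, axis=1)
        nearest = idx[np.argsort(dist)[:per_bin]]   # samples closest to the bin centroid
        selected.extend(nearest.tolist())
    return sorted(selected)

rng = np.random.default_rng(0)
feats = rng.normal(size=(40, 2))              # hypothetical sample embeddings
subset = select_by_bins(feats, scores=feats[:, 0], num_bins=4, per_bin=3)
```

Selecting from every bin keeps coverage of the whole score range, which a single global ranking would not guarantee.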

a) clustering: select representative samples and remove outliers. Clustering can be based on loss, gradient, etc.

b) data contribution: measure the contribution of each sample

  • The performance difference between training with and without this sample (leave-one-out)

c) learn the weights of training samples: train with a weighted loss and evaluate on the validation set
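
The leave-one-out notion of data contribution in (b) can be made concrete with a toy example. This is a minimal sketch, assuming a ridge regression model as a cheap stand-in for the real model and validation mean squared error as the performance measure. All function names and data are hypothetical.

```python
import numpy as np

def fit_ridge(x, y, lam=1e-6):
    """Closed-form ridge regression weights for inputs x (N, D) and targets y (N,)."""
    dim = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(dim), x.T @ y)

def val_loss(w, x_val, y_val):
    """Mean squared error on the validation set."""
    return float(np.mean((x_val @ w - y_val) ** 2))

def leave_one_out_contribution(x, y, x_val, y_val):
    """Score each sample by how much validation loss rises when it is removed.

    Positive score: the sample helps. Negative score: removing it improves the model.
    """
    base = val_loss(fit_ridge(x, y), x_val, y_val)
    scores = np.empty(len(x))
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        scores[i] = val_loss(fit_ridge(x[keep], y[keep]), x_val, y_val) - base
    return scores

# Toy data following y = 2x, with the last training label deliberately corrupted.
x_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0.0, 2.0, 4.0, 6.0, -20.0])
x_val = np.array([[0.5], [1.5], [2.5]])
y_val = np.array([1.0, 3.0, 5.0])
contribution = leave_one_out_contribution(x_train, y_train, x_val, y_val)
```

In this toy setup the corrupted sample receives the lowest score, so ranking by contribution flags it for removal. Retraining once per sample is expensive for real models, which is why approximate attribution methods are of interest.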

Global color mapping:

  • 3D LUT: [1]; non-uniform LUT: [2]
  • curve function: [1]
  • linear transformation: [1]
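
The three global mappings above apply one transform to every pixel. The sketch below is illustrative only and makes several assumptions: images are float RGB arrays in [0, 1] of shape (H, W, 3), the curve is a simple quadratic x + alpha * x * (1 - x), and the LUT uses nearest-neighbor lookup rather than the trilinear interpolation common in practice. All function names are hypothetical.

```python
import numpy as np

def apply_linear(image, matrix, bias):
    """Global linear (affine) color transform: out = M @ rgb + b for every pixel."""
    return np.clip(image @ matrix.T + bias, 0.0, 1.0)

def apply_curve(image, alpha):
    """Global quadratic tone curve; alpha > 0 brightens, alpha < 0 darkens."""
    return image + alpha * image * (1.0 - image)

def identity_lut(size):
    """Build a (size, size, size, 3) LUT that maps every color to itself."""
    grid = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
    return np.stack([r, g, b], axis=-1)

def apply_lut(image, lut):
    """Look up each pixel's color in a 3D LUT using the nearest grid point."""
    size = lut.shape[0]
    idx = np.clip(np.rint(image * (size - 1)).astype(int), 0, size - 1)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))                   # hypothetical input image
brighter = apply_curve(img, alpha=0.5)
warmer = apply_linear(img, np.diag([1.1, 1.0, 0.9]), np.zeros(3))
looked_up = apply_lut(img, identity_lut(17))
```

A 3D LUT is the most expressive of the three because it can represent arbitrary color-to-color maps, while the curve and linear transform have only a handful of parameters.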

Local color mapping:

  • 3D LUT: [1]
  • curve function: [1], DCE [2]
  • linear transformation: HDRNet [1]
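
Local mapping differs from global mapping in that the transform parameters vary per pixel. The sketch below assumes the per-pixel parameters are already given as dense maps; how they are predicted (for example by a network) is out of scope. The quadratic curve form and the names `apply_local_affine` and `apply_local_curve` are assumptions for illustration, not the exact formulation of the cited methods.

```python
import numpy as np

def apply_local_affine(image, coeffs):
    """Apply a separate 3x4 affine color transform at every pixel.

    image: (H, W, 3); coeffs: (H, W, 3, 4), a 3x3 matrix plus a bias column per pixel.
    """
    ones = np.ones_like(image[..., :1])
    homogeneous = np.concatenate([image, ones], axis=-1)      # (H, W, 4)
    return np.einsum("hwij,hwj->hwi", coeffs, homogeneous)

def apply_local_curve(image, alpha_map):
    """Apply the quadratic curve x + alpha * x * (1 - x) with a per-pixel alpha.

    alpha_map: (H, W, 1) or (H, W, 3), broadcast against the image.
    """
    return image + alpha_map * image * (1.0 - image)

rng = np.random.default_rng(0)
img = rng.random((2, 3, 3))                   # hypothetical input image

identity = np.zeros((2, 3, 3, 4))
identity[..., :3] = np.eye(3)                 # identity matrix, zero bias at every pixel
unchanged = apply_local_affine(img, identity)

alpha_map = np.zeros((2, 3, 1))
alpha_map[0, 0, 0] = 0.8                      # brighten only the top-left pixel
locally_brightened = apply_local_curve(img, alpha_map)
```

With spatially constant parameter maps these functions reduce to the global case, so the global mappings are a special case of the local ones.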