Newly Blog

Domain Adaptation

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Methods

learn projection matrix: F(PXs, QXt)
- project to common subspace
  - TCA [pdf]
  - SA [pdf]
  - LSSA: extension of SA [pdf]
  - DIP [pdf]
  - CORAL [pdf]
  - deep CORAL [pdf]
  - other deep feature-based methods [1] [2]
- interpolation on the manifold
  - SGF [pdf]
  - GFK [pdf]
sample selection: learn sample weights
- KMM [pdf]
- STM [pdf]
- DASVM [pdf]
- weighted adversarial network [1][2]
domain-invariant and domain-specific components
- SDDL [pdf]
- Domain Separation Network [pdf]
- low-rank DL [pdf]
low-rank reconstruction
- LTSL [pdf]
- RDALR [pdf]
pixel-level image to image translation
- paired input: conditional GAN [pdf]
- unpaired input: CycleGAN [pdf], GAN with content-similarity loss [pdf], UNIT [pdf]
- combine with feature-based method: GraspGAN [pdf]
- A unified framework [pdf]
adversarial network [1]: classification plus domain confusion. The domain separation and confusion objective is a min-max problem, which can be solved with alternating updates as in GAN or with the gradient reversal (RevGrad) algorithm.
meta-learning
- gradients on two domains should be consistent [pdf]
domain alignment layer (batch normalization): [1] [2]
guided learning: a tutor guides the students and gets feedback from them. ACM-MM18 paper
ensemble transfer learning: aggregate multiple transfer learning approaches [1]
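
For the adversarial-network item above, the gradient reversal idea can be sketched in a few lines of NumPy. This is a toy illustration with hand-written gradients, not the RevGrad authors' code: the linear feature extractor `W`, the logistic domain classifier `v`, and the weight `lam` are all assumptions made for the example.

```python
import numpy as np


class GradReverse:
    """Identity in the forward pass; flips and scales the gradient in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical weight on the domain-confusion term

    def forward(self, features):
        return features

    def backward(self, grad_output):
        return -self.lam * grad_output


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Toy setup: linear feature extractor W, logistic domain classifier v.
rng = np.random.default_rng(0)
x = rng.normal(size=3)        # one input sample
W = rng.normal(size=(2, 3))   # feature extractor weights
v = rng.normal(size=2)        # domain classifier weights
domain_label = 1.0            # 1 = source, 0 = target (arbitrary convention)

grl = GradReverse(lam=0.5)
feat = grl.forward(W @ x)
p = sigmoid(v @ feat)

# Gradients of the binary cross-entropy domain loss.
grad_v = (p - domain_label) * feat             # classifier descends the domain loss
grad_feat = (p - domain_label) * v             # gradient arriving at the features
grad_W = np.outer(grl.backward(grad_feat), x)  # extractor receives the reversed gradient
```

With a plain gradient-descent step on both `v` and `W`, the classifier learns to separate the domains while the extractor is pushed to confuse it, so the min-max problem is handled in a single backward pass.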

Settings

open-set domain adaptation or partial transfer learning: [1][2][3]
distant domain adaptation (two domains are too distant, so the transfer between them relies on transition domains): Transitive transfer learning, distant domain transfer learning
open compound domain adaptation [1]

Domain adaptation for diverse applications

pose estimation [1]
person re-identification [1]
object detection [1]
segmentation [1]
VQA [1]

Domain difference metric: To measure data distribution mismatch, the most commonly used metric is MMD (maximum mean discrepancy) and its extensions such as fast MMD, conditional MMD [1][2], and joint MMD. Other metrics include KL divergence, the HSIC criterion, Bregman divergence, the manifold criterion, and second-order statistics.
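
As a rough illustration of the MMD metric mentioned above, the biased estimate of squared MMD with a Gaussian kernel can be computed as follows. The bandwidth `sigma` and the synthetic source/target samples are assumptions chosen for the example; in practice the bandwidth is often set by a heuristic or several kernels are combined.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd2_biased(xs, xt, sigma=1.0):
    """Biased estimate of squared MMD between source xs and target xt."""
    k_ss = rbf_kernel(xs, xs, sigma).mean()
    k_tt = rbf_kernel(xt, xt, sigma).mean()
    k_st = rbf_kernel(xs, xt, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st


rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 5))
target_same = rng.normal(0.0, 1.0, size=(200, 5))     # same distribution as source
target_shifted = rng.normal(2.0, 1.0, size=(200, 5))  # mean-shifted domain

mmd_same = mmd2_biased(source, target_same)
mmd_shifted = mmd2_biased(source, target_shifted)
```

The shifted target should yield a clearly larger value than the target drawn from the same distribution.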

Theories: A summary of related theories

Survey:

An old survey of transfer learning [pdf]
Recent advance on domain adaptation [pdf]
My survey of old deep learning domain adaptation methods [pdf]
A Chinese version of transfer learning tutorial [pdf]
Datasets and code: [1]
A Comprehensive Survey on Transfer Learning [pdf]

Disentangled Representation

Posted on 2026-03-17 Edited on 2023-06-13 In paper note

Methods:

The goal of disentangled representation learning [4] is to extract the explanatory factors of variation in the input distribution and produce a more meaningful representation. The quantities being disentangled are variously called codes, encodings, representations, latent factors, or latent variables. An attribute may be encoded in a single dimension or in multiple dimensions.

A math definition of disentangled representation [11]

A survey on disentangled representation learning [19]

Unsupervised disentanglement

InfoGAN [5] uses the GAN framework and maximizes the mutual information between a subset of the latent variables and the generated observations to learn disentangled representations in an unsupervised manner. Different latent variables can also be enforced to be independent, following the independence assumption [6].
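
For reference, the mutual-information term in InfoGAN [5] is optimized through a variational lower bound. In the notation of that paper (c is the structured latent code, z the remaining noise, Q an auxiliary network approximating the posterior over c, and λ a weighting hyperparameter), the objective can be written as:

```latex
\min_{G,Q}\,\max_{D}\; V_{\mathrm{InfoGAN}}(D,G,Q) = V(D,G) - \lambda\, L_I(G,Q),
\qquad
L_I(G,Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z,c)}\big[\log Q(c \mid x)\big] + H(c)
\;\le\; I\big(c;\, G(z,c)\big)
```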
Supervised disentanglement

Swapping attribute representations under the supervision of attribute annotations, e.g., Dual Swap GAN [7] (semi-supervised) and DNA-GAN [8].
Disentangled representation for domain adaptation: split the representation into class/domain-invariant and class/domain-specific parts [9][10][12][13]
instance-level disentanglement: [14] [15], FUNIT [16], COCO-FUNIT [17]
closed-form disentanglement [18]: after the model is trained, perform eigendecomposition to obtain orthogonal directions.
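
The closed-form item above can be sketched as follows, assuming the directions are taken as the top eigenvectors of AᵀA for the weight matrix A of the first layer acting on the latent code. The random matrix `A`, its shape, and the edit strength 3.0 are placeholders for illustration, not values from a trained model.

```python
import numpy as np


def closed_form_directions(weight, k):
    """Top-k unit directions n maximizing ||weight @ n||^2, i.e. eigenvectors of weight^T weight."""
    gram = weight.T @ weight
    eigvals, eigvecs = np.linalg.eigh(gram)   # orthonormal eigenvectors as columns
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    return eigvecs[:, order].T, eigvals[order]


rng = np.random.default_rng(0)
A = rng.normal(size=(64, 16))   # hypothetical weight: 16-dim latent -> 64 features
directions, strengths = closed_form_directions(A, k=4)

# Editing a latent code: move along one discovered direction.
z = rng.normal(size=16)
z_edited = z + 3.0 * directions[0]
```

Because AᵀA is symmetric, the returned directions are mutually orthogonal, which matches the "orthogonal directions" mentioned in the note.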

Disentanglement metric:

disentanglement metric score [1]
perceptual path length, linear separability [2]

Reference

[1] Higgins, Irina, et al. “beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework.” ICLR, 2017.

[2] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.

[4] Representation learning: A review and new perspectives

[5] Infogan: Interpretable representation learning by information maximizing generative adversarial nets

[6] Learning Independent Features with adversarial Nets for Non-linear ICA

[7] Dual Swap Disentangling

[8] DNA-GAN: Learning Disentangled Representations from Multi-Attribute Images

[9] Image-to-image translation for cross-domain disentanglement

[10] Diverse Image-to-Image Translation via Disentangled Representations

[11] Higgins, Irina, et al. “Towards a definition of disentangled representations.” arXiv preprint arXiv:1812.02230 (2018).

[12] Gabbay, Aviv, and Yedid Hoshen. “Demystifying Inter-Class Disentanglement.” arXiv preprint arXiv:1906.11796 (2019).

[13] Hadad, Naama, Lior Wolf, and Moni Shahar. “A two-step disentanglement method.” CVPR, 2018.

[14] Shen, Zhiqiang, et al. “Towards instance-level image-to-image translation.” CVPR, 2019.

[15] Sangwoo Mo, Minsu Cho, Jinwoo Shin: InstaGAN: Instance-aware Image-to-Image Translation. ICLR, 2019.

[16] Liu, Ming-Yu, et al. “Few-shot unsupervised image-to-image translation.” ICCV, 2019.

[17] Saito, Kuniaki, Kate Saenko, and Ming-Yu Liu. “COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder.” arXiv preprint arXiv:2007.07431 (2020).

[18] Shen, Yujun, and Bolei Zhou. “Closed-Form Factorization of Latent Semantics in GANs.” arXiv preprint arXiv:2007.06600 (2020).

[19] Xin Wang, Hong Chen, Siao Tang, Zihao Wu, and Wenwu Zhu. “Disentangled Representation Learning.”

Disentangle Datasets

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

CelebA [14] (dataset for human faces): [12, 2, 11, 17, 13, 8, 18]
MNIST [10], MNIST-M [4] (digits): [16, 15, 12, 5, 2, 11, 9, 17, 6, 8, 13, 3]
Yosemite [19] (summer and winter scenes): [11]
Artworks [19] (Monet and Van Gogh): [11]
2D Sprites (game characters): [15, 9, 6, 8, 3]
LineMod [7] (3D object): [9]
11k Hands [1] (hand gestures): [17]

Reference

[1] M. Afifi. Gender recognition and biometric identification using a large dataset of hand images. arXiv preprint arXiv:1711.04322, 2017.

[2] E. Dupont. Learning disentangled joint continuous and discrete representations. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 708–718. Curran Associates, Inc., 2018.

[3] Z. Feng, X. Wang, C. Ke, A.-X. Zeng, D. Tao, and M. Song. Dual swap disentangling. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 5898–5908. Curran Associates, Inc., 2018.

[4] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.

[5] A. Gonzalez-Garcia, J. van de Weijer, and Y. Bengio. Image-to-image translation for cross-domain disentanglement. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 1294–1305. Curran Associates, Inc., 2018.

[6] N. Hadad, L. Wolf, and M. Shahar. A two-step disentanglement method. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[7] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, and N. Navab. Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. In Asian conference on computer vision, pages 548–562. Springer, 2012.

[8] Q. Hu, A. Szabó, T. Portenier, P. Favaro, and M. Zwicker. Disentangling factors of variation by mixing them. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[9] A. H. Jha, S. Anand, M. Singh, and V. Veeravasarapu. Disentangling factors of variation with cycle-consistent variational autoencoders. In The European Conference on Computer Vision (ECCV), September 2018.

[10] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[11] H.-Y. Lee, H.-Y. Tseng, J.-B. Huang, M. Singh, and M.-H. Yang. Diverse image-to-image translation via disentangled representations. In The European Conference on Computer Vision (ECCV), September 2018.

[12] A. H. Liu, Y.-C. Liu, Y.-Y. Yeh, and Y.-C. F. Wang. A unified feature disentangler for multi-domain image translation and manipulation. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 2595–2604. Curran Associates, Inc., 2018.

[13] Y. Liu, F. Wei, J. Shao, L. Sheng, J. Yan, and X. Wang. Exploring disentangled feature representation beyond face identification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[14] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pages 3730–3738, 2015.

[15] M. F. Mathieu, J. J. Zhao, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun. Disentangling factors of variation in deep representation using adversarial training. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 5040–5048. Curran Associates, Inc., 2016.

[16] S. Narayanaswamy, T. B. Paige, J.-W. van de Meent, A. Desmaison, N. Goodman, P. Kohli, F. Wood, and P. Torr. Learning disentangled representations with semi-supervised deep generative models. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5925–5935. Curran Associates, Inc., 2017.

[17] Z. Shu, M. Sahasrabudhe, R. Alp Guler, D. Samaras, N. Paragios, and I. Kokkinos. Deforming autoencoders: Unsupervised disentangling of shape and appearance. In The European Conference on Computer Vision (ECCV), September 2018.

[18] Z. Shu, E. Yumer, S. Hadap, K. Sunkavalli, E. Shechtman, and D. Samaras. Neural face editing with intrinsic image disentangling. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.

[19] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint, 2017.

Differentiable Programming

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

The first work on differentiable programming used a network to approximate sparse coding. [pdf]

Introduction tutorials: [1] [2]

Different Ways of Injecting Latent Code

Posted on 2026-03-17 Edited on 2022-10-18 In paper note

Inject a latent vector into network:

Concatenate or add it to the input image [1] [7]
Concatenate or add it to encoder layer [2]
Concatenate or add it to the bottleneck [3]
Concatenate or add it to decoder layer [6] [8]
Add to decoder layers using AdaIN: [4] (without skip connection), [5] (with skip connection)
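
A minimal NumPy sketch of the AdaIN route in the list above: the feature map is normalized per channel, then rescaled and shifted by parameters predicted from the latent vector. The linear map `M`, the latent size, and the feature shape are hypothetical stand-ins for a learned mapping network and a real decoder layer.

```python
import numpy as np


def adain(content, gamma, beta, eps=1e-5):
    """content: (C, H, W) feature map; gamma, beta: (C,) parameters derived from the latent code."""
    mean = content.mean(axis=(1, 2), keepdims=True)
    std = content.std(axis=(1, 2), keepdims=True)
    normalized = (content - mean) / (std + eps)
    return gamma[:, None, None] * normalized + beta[:, None, None]


rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))      # decoder feature map with 8 channels
z = rng.normal(size=16)                # latent code to inject
M = rng.normal(size=(16, 16)) * 0.1    # hypothetical linear map: latent -> (gamma, beta)
style = M @ z
gamma, beta = 1.0 + style[:8], style[8:]

out = adain(feat, gamma, beta)
```

After the call, each output channel has mean `beta[c]` and standard deviation close to `|gamma[c]|`, so the latent code controls the channel statistics rather than being concatenated as extra channels.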

Inject a latent map into network:

For concatenation or addition, spatially stack latent codes [9]
Spatially adaptive AdaIN: SPADE [10], OASIS [11]
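
The spatially adaptive variant differs from AdaIN in that the scale and shift are maps rather than per-channel scalars. The sketch below is a simplification under stated assumptions: per-class lookup tables (`gamma_table`, `beta_table`) stand in for the small convolutional network that SPADE applies to the label map, and normalization is done per channel on a single sample rather than over a batch.

```python
import numpy as np


def spatially_adaptive_norm(feat, gamma_map, beta_map, eps=1e-5):
    """feat, gamma_map, beta_map: (C, H, W). Modulation parameters vary per pixel."""
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True)
    return gamma_map * (feat - mean) / (std + eps) + beta_map


rng = np.random.default_rng(0)
C, H, W, num_classes = 4, 6, 6, 3
feat = rng.normal(size=(C, H, W))
label_map = rng.integers(0, num_classes, size=(H, W))  # toy semantic layout

# Hypothetical per-class tables replacing the learned conv layers.
gamma_table = rng.normal(1.0, 0.1, size=(num_classes, C))
beta_table = rng.normal(0.0, 0.1, size=(num_classes, C))
gamma_map = gamma_table[label_map].transpose(2, 0, 1)  # (H, W, C) -> (C, H, W)
beta_map = beta_table[label_map].transpose(2, 0, 1)

out = spatially_adaptive_norm(feat, gamma_map, beta_map)
```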

Reference

[1] StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

[2] Toward Multimodal Image-to-Image Translation

[3] Zheng, Chuanxia, Tat-Jen Cham, and Jianfei Cai. “Pluralistic image completion.” CVPR, 2019.

[4] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.

[5] High-Resolution Daytime Translation Without Domain Labels

[6] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. “Data augmentation generative adversarial networks.” arXiv preprint arXiv:1711.04340 (2017).

[7] Tamar Rott Shaham, Tali Dekel, Tomer Michaeli, “SinGAN: Learning a Generative Model from a Single Natural Image”, ICCV2019

[8] Lee, Hsin-Ying, et al. “Diverse image-to-image translation via disentangled representations.” ECCV, 2018.

[9] Yazeed Alharbi, Peter Wonka: Disentangled Image Generation Through Structured Noise Injection. CVPR, 2020.

[10] Park, Taesung, et al. “Semantic image synthesis with spatially-adaptive normalization.” CVPR, 2019.

[11] Sushko, Vadim, et al. “You only need adversarial supervision for semantic image synthesis.” ICLR, 2021

Deep Learning Platform

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Ready-made DevBox:
- Dell Alienware: at most 2 GPUs
- newegg: 4 GPUs
- Lambda Labs: 4 GPUs
Assemble: cheap, but no warranty
- part list: most things are out of date. Tom's Hardware is a good website for comparison.
Nvidia
- microarchitecture: Maxwell -> Pascal -> Volta
- DGX-systems
GPU cloud

Deep Feature Invariance

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Some related papers: [1][2][3][4]

Reference

[1] Pun, Chi Seng, Kelin Xia, and Si Xian Lee. “Persistent-Homology-based Machine Learning and its Applications—A Survey.” arXiv preprint arXiv:1811.00252 (2018).
[2] Carlsson, Gunnar, and Rickard Brüel Gabrielsson. “Topological approaches to deep learning.” arXiv preprint arXiv:1811.01122 (2018).
[3] Gabrielsson, Rickard Brüel, and Gunnar Carlsson. “Exposition and interpretation of the topology of neural networks.” 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019.
[4] Bergomi, Mattia G., et al. “Towards a topological–geometrical theory of group equivariant non-expansive operators for data analysis and machine learning.” Nature Machine Intelligence 1.9 (2019): 423-433.

Deep EM

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Learning from Massive Noisy Labeled Data for Image Classification: hidden variable is the label noise type
Expectation-Maximization Attention Networks for Semantic Segmentation: hidden variable is dictionary basis

Dataset Pruning

Posted on 2026-03-17 Edited on 2025-12-28 In paper note

a) data selection, coreset selection, dataset pruning: select a subset of training data

survey on coreset selection: https://arxiv.org/pdf/2505.17799

Some papers on dataset pruning for generative models:

  * Li, Yize, et al. "Pruning then reweighting: Towards data-efficient training of diffusion models." ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025.

  * Moser, Brian B., Federico Raue, and Andreas Dengel. "A study in dataset pruning for image super-resolution." International Conference on Artificial Neural Networks. Cham: Springer Nature Switzerland, 2024.

dataset quantization: divide the training set into different bins and select representative samples in each bin. It could be used as a data selection strategy.

data attribution: survey on data attribution: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5451054. Attribution scores could be used as a measurement for data selection. Some papers on data attribution for generative models:

  * Georgiev, Kristian, et al. "The journey, not the destination: How data guides diffusion models." arXiv preprint arXiv:2312.06205 (2023).
  * Zheng, Xiaosen, et al. "Intriguing properties of data attribution on diffusion models." arXiv preprint arXiv:2311.00500 (2023).
  * Lin, Jinxu, et al. "Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models." arXiv preprint arXiv:2410.18639 (2024).

b) dataset distillation: optimize the training set itself; the optimized training images are not realistic images. As far as I know, there is no work that uses distilled images to train generative models.

Dataset Selection

Posted on 2026-03-17 Edited on 2025-12-28 In paper note

a) clustering: select representative samples and remove outliers. Clustering can be based on loss, gradient, etc.

b) data contribution: measure the contribution of each sample

The performance difference between training with and without this sample
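
The with/without comparison above is a leave-one-out estimate. A small sketch, assuming a ridge regression model so that retraining is cheap; the model, the synthetic data, and the deliberately corrupted first sample are all assumptions for illustration.

```python
import numpy as np


def fit_ridge(x, y, reg=1e-3):
    """Closed-form ridge regression weights."""
    d = x.shape[1]
    return np.linalg.solve(x.T @ x + reg * np.eye(d), x.T @ y)


def val_loss(w, x_val, y_val):
    return float(np.mean((x_val @ w - y_val) ** 2))


def leave_one_out_contribution(x_tr, y_tr, x_val, y_val):
    """scores[i] = validation loss without sample i minus validation loss with all samples."""
    base = val_loss(fit_ridge(x_tr, y_tr), x_val, y_val)
    scores = np.empty(len(x_tr))
    for i in range(len(x_tr)):
        keep = np.arange(len(x_tr)) != i
        w_without = fit_ridge(x_tr[keep], y_tr[keep])
        scores[i] = val_loss(w_without, x_val, y_val) - base
    return scores


rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
x_tr = rng.normal(size=(30, 2))
x_tr[0] = [2.0, 2.0]                 # give the corrupted sample a clear influence
y_tr = x_tr @ w_true + 0.05 * rng.normal(size=30)
y_tr[0] += 10.0                      # corrupt one label so the sample is harmful
x_val = rng.normal(size=(200, 2))
y_val = x_val @ w_true

scores = leave_one_out_contribution(x_tr, y_tr, x_val, y_val)
```

A positive score means the validation loss rises when the sample is removed (the sample helps); a negative score, as for the corrupted sample here, marks a candidate for pruning. Retraining once per sample is only practical for small models, which is why approximate attribution methods exist.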

c) learn the weights of training samples: train with a weighted loss and evaluate on the validation set