Different Ways of Injecting Latent Code
Inject a latent vector into the network:
- Concatenate or add it to an encoder layer [2]
- Concatenate or add it to the bottleneck [3]
- Add it to decoder layers via AdaIN: [4] (without skip connections), [5] (with skip connections)
Inject a latent map into the network:
- For concatenation or addition, spatially stack latent codes [9]
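A minimal NumPy sketch of the two injection mechanisms above, assuming a single feature map of shape (C, H, W) and a latent code already mapped to per-channel parameters; the function names and shapes are illustrative, not from any cited paper:

```python
import numpy as np

def adain_inject(feat, gamma, beta, eps=1e-5):
    """AdaIN-style injection: normalize each channel of `feat`,
    then scale/shift with latent-derived gamma/beta (each shape (C,))."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    return gamma[:, None, None] * (feat - mu) / (sigma + eps) + beta[:, None, None]

def concat_inject(feat, z):
    """Concatenation-style injection: tile the latent vector z (shape (D,))
    over the spatial grid and stack it onto the channel axis."""
    _, h, w = feat.shape
    z_map = np.broadcast_to(z[:, None, None], (z.shape[0], h, w))
    return np.concatenate([feat, z_map], axis=0)

feat = np.random.randn(64, 8, 8)
z = np.random.randn(16)
out = concat_inject(feat, z)   # shape (64 + 16, 8, 8)
styled = adain_inject(feat, np.ones(64), np.zeros(64))
```

Addition instead of concatenation works the same way, provided z is first projected to C channels so the shapes match.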
Reference
[1] Choi, Yunjey, et al. “StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation.” CVPR, 2018.
[2] Zhu, Jun-Yan, et al. “Toward multimodal image-to-image translation.” NeurIPS, 2017.
[3] Zheng, Chuanxia, Tat-Jen Cham, and Jianfei Cai. “Pluralistic image completion.” CVPR, 2019.
[4] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.
[5] Anokhin, Ivan, et al. “High-Resolution Daytime Translation Without Domain Labels.” CVPR, 2020.
[6] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. “Data augmentation generative adversarial networks.” arXiv preprint arXiv:1711.04340 (2017).
[7] Shaham, Tamar Rott, Tali Dekel, and Tomer Michaeli. “SinGAN: Learning a Generative Model from a Single Natural Image.” ICCV, 2019.
[8] Lee, Hsin-Ying, et al. “Diverse image-to-image translation via disentangled representations.” ECCV, 2018.
[9] Alharbi, Yazeed, and Peter Wonka. “Disentangled Image Generation Through Structured Noise Injection.” CVPR, 2020.
[10] Park, Taesung, et al. “Semantic image synthesis with spatially-adaptive normalization.” CVPR, 2019.
[11] Sushko, Vadim, et al. “You only need adversarial supervision for semantic image synthesis.” ICLR, 2021.
Deep Learning Platform
Ready-made DevBox:
- Dell Alienware: at most 2 GPUs
- Newegg: 4 GPUs
- Lambda Labs: 4 GPUs
Assemble it yourself: cheap, but no warranty
- part lists: most are out of date; Tom’s Hardware is a good site for component comparisons.
Nvidia
microarchitectures: Maxwell -> Pascal -> Volta
GPU cloud
Deep Feature Invariance
Some related papers: [1][2][3][4]
Reference
[1] Pun, Chi Seng, Kelin Xia, and Si Xian Lee. “Persistent-Homology-based Machine Learning and its Applications—A Survey.” arXiv preprint arXiv:1811.00252 (2018).
[2] Carlsson, Gunnar, and Rickard Brüel Gabrielsson. “Topological approaches to deep learning.” arXiv preprint arXiv:1811.01122 (2018).
[3] Gabrielsson, Rickard Brüel, and Gunnar Carlsson. “Exposition and interpretation of the topology of neural networks.” ICMLA, 2019.
[4] Bergomi, Mattia G., et al. “Towards a topological–geometrical theory of group equivariant non-expansive operators for data analysis and machine learning.” Nature Machine Intelligence 1.9 (2019): 423-433.
Deep EM
- Learning from Massive Noisy Labeled Data for Image Classification: the hidden variable is the label noise type
- Expectation-Maximization Attention Networks for Semantic Segmentation: the hidden variable is the dictionary basis
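Both papers instantiate the same EM loop: infer the hidden variable given the current parameters (E-step), then re-estimate the parameters given the inferred variable (M-step). A minimal NumPy sketch of that loop for a 1-D two-component Gaussian mixture, purely illustrative and not either paper’s model:

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """EM for a 1-D mixture of two unit-variance Gaussians.
    Hidden variable: which component generated each sample."""
    mu = np.array([x.min(), x.max()])   # crude initialization
    pi = np.array([0.5, 0.5])           # mixing weights
    for _ in range(iters):
        # E-step: posterior responsibility of each component per sample
        dens = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) * pi[None, :]
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate means and mixing weights from responsibilities
        mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
        pi = resp.mean(axis=0)
    return mu, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
mu, pi = em_gmm_1d(x)   # mu close to (-3, 3), pi close to (0.5, 0.5)
```

In the papers above, the responsibilities are replaced by the noise-type posterior or the attention weights over dictionary bases, but the alternating structure is the same.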
Dataset Pruning
a) data selection, coreset selection, dataset pruning: select a subset of training data
- survey on coreset selection: https://arxiv.org/pdf/2505.17799
some papers on dataset pruning for generative models:
* Li, Yize, et al. “Pruning then reweighting: Towards data-efficient training of diffusion models.” ICASSP, 2025.
* Moser, Brian B., Federico Raue, and Andreas Dengel. “A study in dataset pruning for image super-resolution.” ICANN, 2024.
- dataset quantization: divide the training set into different bins and select representative samples in each bin. It can be used as a data selection strategy.
- data attribution: survey on data attribution: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5451054. It can be used as a measure for data selection. Some papers on data attribution for generative models:
* Georgiev, Kristian, et al. “The journey, not the destination: How data guides diffusion models.” arXiv preprint arXiv:2312.06205 (2023).
* Zheng, Xiaosen, et al. “Intriguing properties of data attribution on diffusion models.” arXiv preprint arXiv:2311.00500 (2023).
* Lin, Jinxu, et al. “Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models.” arXiv preprint arXiv:2410.18639 (2024).
b) dataset distillation: optimize the training set itself; the optimized training images are not realistic images. There is no work using distilled images to train generative models.
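The dataset-quantization idea mentioned above (bin the training set, keep representatives per bin) can be sketched in NumPy, assuming each sample is summarized by a scalar score such as its training loss; the quantile binning and nearest-to-center selection are illustrative choices:

```python
import numpy as np

def quantize_dataset(scores, n_bins=10, per_bin=5):
    """Split samples into quantile bins of `scores` and keep the
    `per_bin` samples whose scores are closest to each bin's mean."""
    edges = np.quantile(scores, np.linspace(0, 1, n_bins + 1))
    keep = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = np.where((scores >= lo) & (scores <= hi))[0]
        if idx.size == 0:
            continue
        center = scores[idx].mean()
        # representatives: samples whose score is nearest the bin center
        order = np.argsort(np.abs(scores[idx] - center))
        keep.extend(idx[order[:per_bin]])
    return np.unique(keep)

rng = np.random.default_rng(1)
scores = rng.normal(size=1000)        # e.g. per-sample training loss
selected = quantize_dataset(scores)   # indices of the pruned subset
```

Binning before selection is what keeps the subset spread over the whole score range instead of collapsing onto the easiest (or hardest) samples.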
Dataset Selection
a) clustering: select representative samples and remove outliers; cluster based on loss, gradient, etc.
b) data contribution: measure the contribution of each sample
- the performance difference when training with vs. without this sample
c) learn the weights of training samples: train with a weighted loss and evaluate on the validation set
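Option (c) can be illustrated with weighted least squares: fit under per-sample weights and compare candidate weightings by validation error. Everything below (the linear model, the corruption pattern, the weight values) is an assumed toy setup, not from any cited paper:

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Linear regression under per-sample weights w:
    minimize sum_i w_i * (x_i @ theta - y_i)**2 via weighted normal equations."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=200)]
y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
y[:20] += 10.0                       # corrupt the first 20 training labels

X_val = np.c_[np.ones(50), rng.normal(size=50)]
y_val = 2.0 + 3.0 * X_val[:, 1]      # clean validation set

w_uniform = np.ones(200)
w_down = w_uniform.copy()
w_down[:20] = 0.01                   # candidate weights suppressing bad samples

err_uniform = np.mean((X_val @ weighted_least_squares(X, y, w_uniform) - y_val) ** 2)
err_down = np.mean((X_val @ weighted_least_squares(X, y, w_down) - y_val) ** 2)
```

Downweighting the corrupted samples lowers the validation error; in the full version of (c), the weights themselves would be optimized against that validation objective rather than set by hand.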
CLIP
Reference
[1] Radford, Alec, et al. “Learning transferable visual models from natural language supervision.” arXiv preprint arXiv:2103.00020 (2021).
[2] Zhou, Kaiyang, et al. “Learning to Prompt for Vision-Language Models.” arXiv preprint arXiv:2109.01134 (2021).
[3] Wang, Mengmeng, Jiazheng Xing, and Yong Liu. “ActionCLIP: A New Paradigm for Video Action Recognition.” arXiv preprint arXiv:2109.08472 (2021).
[4] Gu, Xiuye, et al. “Zero-Shot Detection via Vision and Language Knowledge Distillation.” arXiv preprint arXiv:2104.13921 (2021).
[5] Yao, Yuan, et al. “CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models.” arXiv preprint arXiv:2109.11797 (2021).
[6] Xie, Johnathan, and Shuai Zheng. “ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation.” arXiv preprint arXiv:2109.12066 (2021).
[7] Patashnik, Or, et al. “StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery.” ICCV, 2021.
[8] Xu, Mengde, et al. “A simple baseline for zero-shot semantic segmentation with pre-trained vision-language model.” arXiv preprint arXiv:2112.14757 (2021).
[9] Lüddecke, Timo, and Alexander Ecker. “Image Segmentation Using Text and Image Prompts.” CVPR, 2022.
Capsule Network
Reference
[1] Sabour, Sara, Nicholas Frosst, and Geoffrey E. Hinton. “Dynamic routing between capsules.” NeurIPS, 2017.
[2] Zhang, Liheng, Marzieh Edraki, and Guo-Jun Qi. “CapProNet: Deep feature learning via orthogonal projections onto capsule subspaces.” arXiv preprint arXiv:1805.07621 (2018).
[3] Gu, Jindong, Volker Tresp, and Han Hu. “Capsule Network is Not More Robust than Convolutional Network.” CVPR, 2021.