Inject a latent vector into a network:

  1. Concatenate or add it to the input image [1] [7]

  2. Concatenate or add it to encoder layer [2]

  3. Concatenate or add it to the bottleneck [3]

  4. Concatenate or add it to decoder layer [6] [8]

  5. Modulate decoder layers with AdaIN: [4] (without skip connections), [5] (with skip connections)
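A minimal NumPy sketch of the vector-injection options above. Shapes are illustrative assumptions; in real models the AdaIN scale/shift are predicted from the latent by learned layers (e.g. StyleGAN's mapping MLP), which are omitted here:

```python
import numpy as np

def inject_concat(feat, z):
    """Options 1-4: spatially broadcast a latent vector z and concatenate
    it to a feature map along the channel axis (addition works the same
    way when the channel widths match)."""
    c, h, w = feat.shape
    z_map = np.broadcast_to(z[:, None, None], (z.shape[0], h, w))
    return np.concatenate([feat, z_map], axis=0)  # (C + D, H, W)

def inject_adain(feat, gamma, beta, eps=1e-5):
    """Option 5: AdaIN -- instance-normalize each channel, then apply a
    per-channel scale/shift derived from the latent (the predicting MLP
    is omitted in this sketch)."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    return gamma[:, None, None] * (feat - mu) / (sigma + eps) + beta[:, None, None]

feat = np.random.randn(8, 4, 4)          # toy (C, H, W) feature map
print(inject_concat(feat, np.random.randn(3)).shape)  # (11, 4, 4)
```

Applying `inject_concat` at the input gives option 1; applying it at an encoder, bottleneck, or decoder layer gives options 2-4.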

Inject a latent map into a network:

  1. For concatenation or addition, spatially stack latent codes [9]

  2. Spatially-adaptive AdaIN: SPADE [10], OASIS [11]
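The spatially-adaptive variant can be sketched the same way; the difference is that the scale and shift are full spatial maps rather than per-channel scalars. In SPADE these maps are predicted by small convolutions from the semantic layout; this sketch assumes they are given directly:

```python
import numpy as np

def spade_modulate(feat, gamma_map, beta_map, eps=1e-5):
    """SPADE-style modulation: normalize the feature map with
    per-channel statistics, then scale/shift it element-wise with
    spatial maps of the same (C, H, W) shape."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    normalized = (feat - mu) / (sigma + eps)
    return gamma_map * normalized + beta_map

feat = np.random.randn(8, 4, 4)
out = spade_modulate(feat, np.ones((8, 4, 4)), np.zeros((8, 4, 4)))
print(out.shape)  # (8, 4, 4)
```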

Reference

[1] Choi, Yunjey, et al. “StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation.” CVPR, 2018.

[2] Zhu, Jun-Yan, et al. “Toward multimodal image-to-image translation.” NeurIPS, 2017.

[3] Zheng, Chuanxia, Tat-Jen Cham, and Jianfei Cai. “Pluralistic image completion.” CVPR, 2019.

[4] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.

[5] Anokhin, Ivan, et al. “High-resolution daytime translation without domain labels.” CVPR, 2020.

[6] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. “Data augmentation generative adversarial networks.” arXiv preprint arXiv:1711.04340 (2017).

[7] Shaham, Tamar Rott, Tali Dekel, and Tomer Michaeli. “SinGAN: Learning a generative model from a single natural image.” ICCV, 2019.

[8] Lee, Hsin-Ying, et al. “Diverse image-to-image translation via disentangled representations.” ECCV, 2018.

[9] Alharbi, Yazeed, and Peter Wonka. “Disentangled image generation through structured noise injection.” CVPR, 2020.

[10] Park, Taesung, et al. “Semantic image synthesis with spatially-adaptive normalization.” CVPR, 2019.

[11] Sushko, Vadim, et al. “You only need adversarial supervision for semantic image synthesis.” ICLR, 2021.

Some related papers: [1][2][3][4]

Reference

  1. Pun, Chi Seng, Kelin Xia, and Si Xian Lee. “Persistent-Homology-based Machine Learning and its Applications—A Survey.” arXiv preprint arXiv:1811.00252 (2018).

  2. Carlsson, Gunnar, and Rickard Brüel Gabrielsson. “Topological approaches to deep learning.” arXiv preprint arXiv:1811.01122 (2018).

  3. Gabrielsson, Rickard Brüel, and Gunnar Carlsson. “Exposition and interpretation of the topology of neural networks.” 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019.

  4. Bergomi, Mattia G., et al. “Towards a topological–geometrical theory of group equivariant non-expansive operators for data analysis and machine learning.” Nature Machine Intelligence 1.9 (2019): 423-433.

a) data selection, coreset selection, dataset pruning: select a subset of the training data

  • some papers on dataset pruning for generative models:

      * Li, Yize, et al. "Pruning then reweighting: Towards data-efficient training of diffusion models." ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025.
    
      * Moser, Brian B., Federico Raue, and Andreas Dengel. "A study in dataset pruning for image super-resolution." International Conference on Artificial Neural Networks. Cham: Springer Nature Switzerland, 2024.
    
  • dataset quantization: divide the training set into bins and select representative samples from each bin. It can serve as a data selection strategy.

  • data attribution: survey on data attribution: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5451054. Attribution scores can serve as a criterion for data selection. Some papers on data attribution for generative models:

      * Georgiev, Kristian, et al. "The journey, not the destination: How data guides diffusion models." arXiv preprint arXiv:2312.06205 (2023).
      * Zheng, Xiaosen, et al. "Intriguing properties of data attribution on diffusion models." arXiv preprint arXiv:2311.00500 (2023).
      * Lin, Jinxu, et al. "Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models." arXiv preprint arXiv:2410.18639 (2024).
    

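The dataset-quantization idea above (bin the training set, keep one representative per bin) can be sketched with plain k-means in NumPy. The feature space and the clustering choice here are illustrative assumptions, not any specific paper's recipe:

```python
import numpy as np

def quantize_dataset(features, n_bins, n_iter=10, seed=0):
    """Cluster samples into bins (plain k-means) and keep the sample
    nearest to each bin center as its representative.
    features: (N, D) array of per-sample features; returns selected indices."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), n_bins, replace=False)]
    for _ in range(n_iter):
        # assign each sample to its nearest bin center
        d = np.linalg.norm(features[:, None] - centers[None], axis=-1)
        assign = d.argmin(axis=1)
        for k in range(n_bins):
            if (assign == k).any():
                centers[k] = features[assign == k].mean(axis=0)
    # representative = sample nearest to each final center
    d = np.linalg.norm(features[:, None] - centers[None], axis=-1)
    return np.unique(d.argmin(axis=0))

feats = np.random.default_rng(0).standard_normal((100, 16))
idx = quantize_dataset(feats, n_bins=10)  # indices of the selected subset
```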
b) dataset distillation: optimize the training set itself; the optimized training images are not realistic images. There is no work yet that uses distilled images to train a generative model.

a) clustering: select representative samples and remove outliers; cluster based on loss, gradient, etc.

b) data contribution: measure the contribution of each sample

  • The performance difference between training with and without the sample

c) learn the weights of training samples: train with a weighted loss and evaluate on the validation set
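Point b) can be made concrete with a toy leave-one-out measurement: retrain a small model without each sample and record the change in validation loss. The ridge-regression model and regularizer below are illustrative assumptions:

```python
import numpy as np

def loo_contribution(X, y, X_val, y_val, reg=1e-3):
    """Leave-one-out contribution: validation-loss difference between
    training with all samples and training without sample i.
    A positive score means removing the sample hurts the model."""
    def fit_loss(mask):
        Xm, ym = X[mask], y[mask]
        w = np.linalg.solve(Xm.T @ Xm + reg * np.eye(X.shape[1]), Xm.T @ ym)
        return np.mean((X_val @ w - y_val) ** 2)
    full = fit_loss(np.ones(len(X), dtype=bool))
    scores = []
    for i in range(len(X)):
        mask = np.ones(len(X), dtype=bool)
        mask[i] = False
        scores.append(fit_loss(mask) - full)  # loss increase when i removed
    return np.array(scores)
```

Retraining per sample is the exact but expensive version; the data-attribution papers above approximate this signal without full retraining.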

Global color mapping:

  • 3D LUT: [1]; non-uniform LUT: [2]
  • curve function: [1]
  • linear transformation: [1]

Local color mapping:

  • 3D LUT: [1]
  • curve function: [1], DCE[2]
  • linear transformation: HDRNet[1]
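The three global mapping families can be sketched in a few lines of NumPy. Nearest-neighbor LUT lookup and a plain gamma curve are simplifying assumptions; production pipelines use trilinear interpolation and learned curves:

```python
import numpy as np

def apply_curve(img, gamma=2.2):
    """Global curve function: one monotone tone curve (a gamma curve
    here) applied identically to every pixel."""
    return np.clip(img, 0.0, 1.0) ** (1.0 / gamma)

def apply_linear(img, M, b):
    """Global linear (affine) color transform: every RGB pixel is
    mapped by the same 3x3 matrix M and offset b."""
    return img @ M.T + b

def apply_lut3d(img, lut):
    """Global 3D LUT: look each RGB pixel up in an (n, n, n, 3) table.
    Nearest-neighbor for brevity; non-uniform LUTs additionally place
    the grid points adaptively instead of on a regular lattice."""
    n = lut.shape[0]
    idx = np.clip(np.rint(img * (n - 1)).astype(int), 0, n - 1)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]
```

Local variants apply a different curve, matrix, or LUT per spatial region, e.g. per-pixel affine coefficients predicted by a network as in HDRNet.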

  • image classification: CLIP [1], learnable prompt [2]

  • video classification: ActionCLIP [3]

  • object detection: ViLD [4], ZSD-YOLO [6]

  • segmentation: [8] [9]

  • visual grounding: CPT [5]

  • image translation: StyleCLIP [7]
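The mechanism shared by these applications is CLIP's zero-shot matching: embed the image and one text prompt per class (e.g. "a photo of a {class}"), then softmax over the cosine similarities. A NumPy sketch with random stand-in embeddings in place of the CLIP encoders:

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    """CLIP-style zero-shot classification: L2-normalize the image
    embedding and the per-class text embeddings, then softmax over
    the scaled cosine similarities."""
    v = image_emb / np.linalg.norm(image_emb)
    T = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = T @ v / temperature
    p = np.exp(logits - logits.max())   # numerically stable softmax
    return p / p.sum()

rng = np.random.default_rng(0)
probs = zero_shot_classify(rng.standard_normal(512),   # stand-in image embedding
                           rng.standard_normal((3, 512)))  # 3 class prompts
```

Detection, segmentation, and grounding methods above reuse this matching at the region, pixel, or phrase level.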

Reference

[1] Radford, Alec, et al. “Learning transferable visual models from natural language supervision.” arXiv preprint arXiv:2103.00020 (2021).

[2] Zhou, Kaiyang, et al. “Learning to Prompt for Vision-Language Models.” arXiv preprint arXiv:2109.01134 (2021).

[3] Wang, Mengmeng, Jiazheng Xing, and Yong Liu. “ActionCLIP: A New Paradigm for Video Action Recognition.” arXiv preprint arXiv:2109.08472 (2021).

[4] Gu, Xiuye, et al. “Zero-Shot Detection via Vision and Language Knowledge Distillation.” arXiv preprint arXiv:2104.13921 (2021).

[5] Yao, Yuan, et al. “CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models.” arXiv preprint arXiv:2109.11797 (2021).

[6] Xie, Johnathan, and Shuai Zheng. “ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation.” arXiv preprint arXiv:2109.12066 (2021).

[7] Patashnik, Or, et al. “Styleclip: Text-driven manipulation of stylegan imagery.” ICCV, 2021.

[8] Xu, Mengde, et al. “A simple baseline for zero-shot semantic segmentation with pre-trained vision-language model.” arXiv preprint arXiv:2112.14757 (2021).

[9] Lüddecke, Timo, and Alexander Ecker. “Image Segmentation Using Text and Image Prompts.” CVPR, 2022.

  • Typical works: CapsNet [1], CapProNet [2] [code]

  • The robustness of Capsule network: [3]
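The routing-by-agreement step at the heart of CapsNet [1] can be sketched in NumPy. This is a single-example sketch; the learned transformation matrices that produce the prediction vectors are omitted:

```python
import numpy as np

def squash(s, eps=1e-8):
    """Capsule non-linearity: keep the vector's direction, squash its
    length into [0, 1)."""
    n2 = (s ** 2).sum(axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def dynamic_routing(u_hat, n_iter=3):
    """Routing-by-agreement.
    u_hat: (n_in, n_out, d) prediction vectors from lower-level capsules;
    returns the (n_out, d) output capsules."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                       # routing logits
    for _ in range(n_iter):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)          # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=0)        # weighted vote per output
        v = squash(s)                                 # output capsules
        b = b + (u_hat * v[None]).sum(axis=-1)        # agreement update
    return v

v = dynamic_routing(np.random.default_rng(0).standard_normal((6, 4, 8)))
```

The output capsule lengths stay below 1 by construction, which is what lets them be read as existence probabilities.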

Reference

[1] Sabour, Sara, Nicholas Frosst, and Geoffrey E. Hinton. “Dynamic routing between capsules.” NeurIPS, 2017.

[2] Zhang, Liheng, Marzieh Edraki, and Guo-Jun Qi. “CapProNet: Deep feature learning via orthogonal projections onto capsule subspaces.” arXiv preprint arXiv:1805.07621 (2018).

[3] Gu, Jindong, Volker Tresp, and Han Hu. “Capsule network is not more robust than convolutional network.” CVPR, 2021.
