  1. Traditional data augmentation

    • color, hue, illumination

    • flip, crop, shear, rotation, (piecewise) affine transformation, Cutout, Random Erasing, Hide-and-Seek, GridMask

  2. Mixtures: Mixup [1], CutMix [2] (mixture in the spatial domain), GridMask [6], FMix [3] (mixture in the frequency domain)

  3. Learning an optimal data augmentation strategy: [4] [5], AutoAugment, RandAugment, Fast AutoAugment, Faster AutoAugment, Greedy Augment.

  4. Semantic augmentation: [7]
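
As a rough illustration of the mixture family in item 2, mixup [1] trains on convex combinations of two samples and of their labels, with the mixing coefficient drawn from a Beta(α, α) distribution. The sketch below uses plain Python lists in place of image tensors; the function name `mixup` and the default `alpha=0.2` are assumptions made for this example, not values taken from the notes above.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two samples and their one-hot labels (hypothetical helper).

    x1, x2: flat lists of feature or pixel values with the same length.
    y1, y2: one-hot (or soft) label vectors with the same length.
    alpha:  Beta distribution parameter; 0.2 is an assumed default.
    """
    # Mixing coefficient lam ~ Beta(alpha, alpha), always in [0, 1].
    lam = random.betavariate(alpha, alpha)
    # The same coefficient is applied to the inputs and to the labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

CutMix [2] keeps the same label mixing but replaces the pixel-wise blend with pasting a rectangular patch of one image into the other, with the label weight set by the patch area.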

A summary of existing data augmentation methods [link]

Reference

[1] mixup: Beyond empirical risk minimization

[2] Cutmix: Regularization strategy to train strong classifiers with localizable features

[3] Understanding and Enhancing Mixed Sample Data Augmentation

[4] AutoAugment: Learning Augmentation Strategies from Data

[5] The Effectiveness of Data Augmentation in Image Classification using Deep Learning

[6] GridMask Data Augmentation

[7] Regularizing Deep Networks with Semantic Data Augmentation

  1. DeepFashion: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html (attribute, bounding box, landmark)

  2. Colorful-Fashion: https://sites.google.com/site/fashionparsing/home (pixel-level color-category label)

  3. CCP (Clothing Co-Parsing): https://github.com/bearpaw/clothing-co-parsing (parsing label)

  4. Fashionista: http://vision.is.tohoku.ac.jp/~kyamagu/research/clothing_parsing/ (parsing label)

  5. HPW (Human Parsing in the Wild): https://github.com/lemondan/HumanParsing-Dataset (parsing label)

  6. ModaNet: https://github.com/eBay/modanet (polygon annotations)

  1. re-sampling
  2. synthetic samples: generate more samples for minor classes
  3. re-weighting
  4. few-shot learning
  5. decoupling representation and classifier learning: use normal (instance-balanced) sampling in the feature learning stage and re-sampling in the classifier learning stage.
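
Items 1 and 3 of the list above can be sketched in a few lines. The example below assumes inverse-class-frequency weighting, which is only one common choice; the helper names `inverse_frequency_weights` and `class_balanced_sample` are hypothetical and not taken from any specific paper.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Re-weighting: weight each class by n / (k * count), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def class_balanced_sample(labels, num_samples, seed=0):
    """Re-sampling: draw indices with replacement so that classes appear roughly equally often."""
    rng = random.Random(seed)
    w = inverse_frequency_weights(labels)
    per_sample = [w[y] for y in labels]
    return rng.choices(range(len(labels)), weights=per_sample, k=num_samples)

# Toy long-tailed label set: 90 head-class samples, 10 tail-class samples.
labels = ["head"] * 90 + ["tail"] * 10
weights = inverse_frequency_weights(labels)
indices = class_balanced_sample(labels, 1000)
```

Under the decoupled scheme in item 5, a sampler like this would be used only when retraining the classifier, while the feature extractor is trained with ordinary sampling.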

Big Names: Judea Pearl [Tutorial] [slides] [textbook], James Robins [Textbook] [slides]

Tutorial:

Workshop: NIPS 2018 Workshop on Causal Learning, KDD 2020 Tutorial on Causal Inference Meets Machine Learning

Material: MILA Course

Causality and disentanglement: [5] [6]

Counterfactual and disentanglement: [7]

Reference

[1] Chalupka K, Perona P, Eberhardt F. Visual causal feature learning. arXiv preprint arXiv:1412.2309, 2014.

[2] Lopez-Paz D, Nishihara R, Chintala S, et al. Discovering causal signals in images. CVPR, 2017.

[3] Bau D, Zhu J Y, Strobelt H, et al. GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. arXiv preprint arXiv:1811.10597, 2018.

[4] Bernhard Schölkopf: CAUSALITY FOR MACHINE LEARNING. arXiv preprint arXiv:1911.10500, 2019.

[5] Kim, Hyemi, et al. “Counterfactual Fairness with Disentangled Causal Effect Variational Autoencoder.” arXiv preprint arXiv:2011.11878 (2020).

[6] Shen, Xinwei, et al. “Disentangled Generative Causal Representation Learning.” arXiv preprint arXiv:2010.02637 (2020).

[7] Yue, Zhongqi, et al. “Counterfactual Zero-Shot and Open-Set Visual Recognition.” arXiv preprint arXiv:2103.00887 (2021).

[8] Schölkopf, Bernhard, et al. “Towards causal representation learning.” arXiv preprint arXiv:2102.11107 (2021).

  • watermark removal: ICA [4], inpainting [5]

  • watermarks consistent across a collection of images: multi-image matting and reconstruction [3]

  • Survey papers on watermarking: [1] [2]
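
For the visible-watermark case, a commonly used image formation model treats each watermarked pixel as an alpha blend of the watermark and the original image, J = αW + (1 − α)I. The sketch below only illustrates that model and its inversion when W and α are already known; estimating W and α from an image collection, which is the hard part addressed in [3], is not shown. The function names are hypothetical, and images are flat lists of intensities in [0, 1].

```python
def composite(image, watermark, alpha):
    """Overlay a watermark: J = alpha * W + (1 - alpha) * I, per pixel."""
    return [a * w + (1.0 - a) * i for i, w, a in zip(image, watermark, alpha)]

def remove_watermark(watermarked, watermark, alpha):
    """Invert the blend: I = (J - alpha * W) / (1 - alpha), per pixel.

    Assumes alpha < 1 everywhere; a fully opaque pixel keeps no trace of
    the original image and cannot be recovered this way.
    """
    if any(a >= 1.0 for a in alpha):
        raise ValueError("alpha must be < 1 for every pixel")
    return [(j - a * w) / (1.0 - a) for j, w, a in zip(watermarked, watermark, alpha)]
```

Because the same W and α are shared by every image in the collection, they can be estimated jointly across images, which is what makes consistent watermarks removable.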

Reference

  1. Podilchuk, Christine I., and Edward J. Delp. “Digital watermarking: algorithms and applications.” IEEE Signal Processing Magazine 18.4 (2001): 33-46.
  2. Potdar, Vidyasagar M., Song Han, and Elizabeth Chang. “A survey of digital image watermarking techniques.” INDIN’05. 2005 3rd IEEE International Conference on Industrial Informatics, 2005. IEEE, 2005.
  3. Dekel, Tali, et al. “On the effectiveness of visible watermarks.” CVPR, 2017.

First deep learning approach for video harmonization [1]

[1] Haozhi Huang, Senzhe Xu, Junxiong Cai, Wei Liu, Shimin Hu, “Temporally Coherent Video Harmonization Using Adversarial Networks”, arXiv, 2018.

Advanced VAE

  1. VQ-VAE [1], VQ-VAE-2 [2]. Accelerating autoregressive sampling: [4] [5]

  2. NVAE [3]
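
The core operation in VQ-VAE [1] is vector quantization: each encoder output vector is replaced by its nearest entry in a learned codebook, and the entry's index becomes a discrete token. The sketch below shows only that nearest-neighbour lookup on plain Python lists; the function name `quantize` and the toy codebook are assumptions for illustration, and the training-time details (codebook and commitment losses, straight-through gradient) are omitted.

```python
def quantize(z, codebook):
    """Return (index, code) of the codebook entry closest to z in squared L2 distance."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    idx = min(range(len(codebook)), key=lambda k: sqdist(z, codebook[k]))
    return idx, codebook[idx]

# Toy codebook with three 2-D entries (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
index, code = quantize([0.9, 1.2], codebook)
```

The resulting grid of indices is what the autoregressive prior models; the methods in [4] [5] speed up sampling by predicting many of these tokens in parallel instead of one at a time.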

References

[1] Oord, Aaron van den, Oriol Vinyals, and Koray Kavukcuoglu. “Neural discrete representation learning.” arXiv preprint arXiv:1711.00937 (2017).
[2] Razavi, Ali, Aaron van den Oord, and Oriol Vinyals. “Generating diverse high-fidelity images with VQ-VAE-2.” Advances in neural information processing systems. 2019.
[3] Vahdat, Arash, and Jan Kautz. “NVAE: A deep hierarchical variational autoencoder.” arXiv preprint arXiv:2007.03898 (2020).
[4] Bond-Taylor, Sam, et al. “Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes.” arXiv preprint arXiv:2111.12701 (2021).
[5] Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman, “MaskGIT: Masked Generative Image Transformer”, arXiv preprint arXiv:2202.04200.

  • StyleGAN of all trades [1]
  • StyleGANv1 [5]
  • StyleGANv2 [6]: removes the blob-shaped artifacts that resemble water droplets.
  • StyleGANv3 [2]: solves the aliasing (texture sticking) issue, in which details appear glued to image coordinates instead of to the surfaces of the depicted objects.
  • StyleGAN-XL [3]: extends StyleGAN to large, diverse datasets.
  • 3D StyleGAN [4]

Image editing using StyleGAN

InsetGAN [7]

Reference

[1] Chong, Min Jin, Hsin-Ying Lee, and David Forsyth. “StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN.” arXiv preprint arXiv:2111.01619 (2021).

[2] Karras, Tero, et al. “Alias-free generative adversarial networks.” Thirty-Fifth Conference on Neural Information Processing Systems. 2021.

[3] Sauer, Axel, Katja Schwarz, and Andreas Geiger. “StyleGAN-XL: Scaling StyleGAN to large diverse datasets.” arXiv preprint arXiv:2202.00273 (2022).

[4] Xiaoming Zhao, Fangchang Ma, David Güera, Zhile Ren, Alexander G. Schwing, Alex Colburn. “Generative Multiplane Images: Making a 2D GAN 3D-Aware”.

[5] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.

[6] Karras, Tero, et al. “Analyzing and improving the image quality of StyleGAN.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

[7] Frühstück, Anna, et al. “InsetGAN for full-body image generation.” CVPR, 2022.

  • SAM [1]

  • FastSAM [2]: first generates mask proposals for all instances, then selects the proposals that match the prompt

  • High-quality SAM [3]

  • Semantic-SAM [4]: also assigns semantic labels to the segmented masks

Reference

[1] Kirillov, Alexander, et al. “Segment anything.” arXiv preprint arXiv:2304.02643 (2023).

[2] Zhao, Xu, et al. “Fast Segment Anything.” arXiv preprint arXiv:2306.12156 (2023).

[3] Ke, Lei, et al. “Segment Anything in High Quality.” arXiv preprint arXiv:2306.01567 (2023).

[4] Li, Feng, et al. “Semantic-SAM: Segment and Recognize Anything at Any Granularity.” arXiv preprint arXiv:2307.04767 (2023).

Translate one or multiple instances in an image: [1]

Reference

[1] Mo, Sangwoo, Minsu Cho, and Jinwoo Shin. “InstaGAN: Instance-aware image-to-image translation.” arXiv preprint arXiv:1812.10889 (2018).
