Newly Blog

Distillation

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

knowledge/model distillation [1]
data distillation [2] [4] [5]

A survey of knowledge distillation [3]
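
As a rough illustration of the knowledge distillation objective in [1], the sketch below softens the teacher and student logits with a temperature and takes the cross-entropy between the two distributions. It is a minimal pure-Python sketch, not the paper's code: the function names and the default temperature of 2.0 are assumptions made for illustration, and in practice this term is usually combined with the ordinary hard-label loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions.

    The temperature default is an illustrative assumption, not a recommended value.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    ce = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    # [1] suggests scaling the soft-target term by T^2 so that its gradient
    # magnitude stays comparable to the hard-label term.
    return temperature ** 2 * ce
```

The loss is smallest when the student reproduces the teacher's softened distribution, which is what pushes the student towards the teacher's relative class similarities rather than only its top prediction.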

Reference

[1] Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. “Distilling the knowledge in a neural network.” arXiv preprint arXiv:1503.02531 (2015).

[2] Radosavovic, Ilija, et al. “Data distillation: Towards omni-supervised learning.” CVPR, 2018.

[3] Wang, Lin, and Kuk-Jin Yoon. “Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks.” arXiv preprint arXiv:2004.05937 (2020).

[4] Nguyen, Timothy, Zhourong Chen, and Jaehoon Lee. “Dataset Meta-Learning from Kernel Ridge-Regression.” arXiv preprint arXiv:2011.00050 (2020).

[5] Nguyen, Timothy, et al. “Dataset distillation with infinitely wide convolutional networks.” Advances in Neural Information Processing Systems 34 (2021).

Data Augmentation

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Traditional data augmentation
- color, hue, illumination
- flip, crop, shear, rotation, (piecewise) affine transformation, Cutout, RandErasing, HideAndSeek, GridMask
Mixtures: Mixup [1], CutMix [2] (mixture in the spatial domain), GridMask [6], FMix [3] (mixture in the frequency domain)
Learn optimal data augmentation strategy: [4] [5], AutoAugment, RandAugment, Fast AutoAugment, Faster AutoAugment, Greedy Augment.
Semantic augmentation: [7]
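
To make the mixture idea concrete, here is a minimal sketch of Mixup [1]: two samples and their one-hot labels are blended with a coefficient drawn from a Beta(alpha, alpha) distribution. The function name, the flat-list representation of inputs, and the default alpha of 0.2 are assumptions for illustration only. CutMix [2] follows the same label-mixing rule but pastes a rectangular patch instead of blending pixel values.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels with a Beta-distributed weight.

    x1, x2: flat lists of feature values (for example, flattened pixels).
    y1, y2: one-hot label lists of equal length.
    alpha:  Beta distribution parameter (illustrative default).
    rng:    the random module or a random.Random instance.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Usage: mix a sample of class 0 with a sample of class 1.
x, y, lam = mixup([0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0],
                  rng=random.Random(0))
```

The mixed label remains a valid probability vector, so it can be fed to a standard cross-entropy loss with soft targets.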

A summary of existing data augmentation methods [link]

Reference

[1] mixup: Beyond empirical risk minimization

[2] Cutmix: Regularization strategy to train strong classifiers with localizable features

[3] Understanding and Enhancing Mixed Sample Data Augmentation

[4] AutoAugment: Learning Augmentation Strategies from Data

[5] The Effectiveness of Data Augmentation in Image Classification using Deep Learning

[6] GridMask Data Augmentation

[7] Regularizing Deep Networks with Semantic Data Augmentation

Clothes Dataset

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

DeepFashion: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html (attribute, bounding box, landmark)
Colorful-Fashion: https://sites.google.com/site/fashionparsing/home (pixel-level color-category label)
CCP (Clothing Co-Parsing): https://github.com/bearpaw/clothing-co-parsing (parsing label)
Fashionista: http://vision.is.tohoku.ac.jp/~kyamagu/research/clothing_parsing/ (parsing label)
HPW (Human Parsing in the Wild): https://github.com/lemondan/HumanParsing-Dataset (parsing label)
ModaNet: https://github.com/eBay/modanet (polygon annotations)

Class Imbalance

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

re-sampling
synthetic samples: generate more samples for minority classes
re-weighting
few-shot learning
decoupling representation and classifier learning: use normal (instance-balanced) sampling in the feature learning stage and class-balanced re-sampling in the classifier learning stage.
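
The re-weighting and re-sampling items above can be sketched in a few lines. This is a hedged, standard-library-only illustration: the helper names are hypothetical, and inverse class frequency is just one common weighting choice among several.

```python
from collections import Counter
import random


def inverse_frequency_weights(labels):
    """Re-weighting: per-class loss weights proportional to 1 / class frequency.

    Normalised so that a perfectly balanced dataset gives every class weight 1.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def class_balanced_sample(labels, num_samples, rng=random):
    """Re-sampling: draw sample indices so each class is equally likely.

    Each sample's probability is proportional to 1 / (size of its class).
    """
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


# Usage: 9 samples of class "a" and 1 sample of class "b".
labels = ["a"] * 9 + ["b"]
class_weights = inverse_frequency_weights(labels)
indices = class_balanced_sample(labels, 1000, rng=random.Random(0))
```

Under the decoupling scheme, the feature learning stage would iterate over the data uniformly, and only the classifier learning stage would draw its batches from something like `class_balanced_sample`.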

Causal Inference

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Big Names: Judea Pearl [Tutorial] [slides] [textbook], James Robins [Textbook] [slides]

Tutorial:

Causality for machine learning [4]
Towards Causal Representation Learning [8]
A briefing on causal inference written by myself

Workshop: NIPS2018 workshop on causal learning, KDD2020 Tutorial on Causal Inference Meets Machine Learning

Material: MILA Course

Causality and disentanglement: [5] [6]

Counterfactual and disentanglement: [7]

Reference

[1] Chalupka K, Perona P, Eberhardt F. Visual causal feature learning. arXiv preprint arXiv:1412.2309, 2014.

[2] Lopez-Paz D, Nishihara R, Chintala S, et al. Discovering causal signals in images. CVPR, 2017.

[3] Bau D, Zhu J Y, Strobelt H, et al. GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. arXiv preprint arXiv:1811.10597, 2018.

[4] Bernhard Schölkopf: CAUSALITY FOR MACHINE LEARNING. arXiv preprint arXiv:1911.10500, 2019.

[5] Kim, Hyemi, et al. “Counterfactual Fairness with Disentangled Causal Effect Variational Autoencoder.” arXiv preprint arXiv:2011.11878 (2020).

[6] Shen, Xinwei, et al. “Disentangled Generative Causal Representation Learning.” arXiv preprint arXiv:2010.02637 (2020).

[7] Yue, Zhongqi, et al. “Counterfactual Zero-Shot and Open-Set Visual Recognition.” arXiv preprint arXiv:2103.00887 (2021).

[8] Schölkopf, Bernhard, et al. “Towards causal representation learning.” arXiv preprint arXiv:2102.11107 (2021).

Watermark Removal

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

watermark removal: ICA [4], inpainting [5]
watermarks consistent across a collection of images: multi-image matting and reconstruction [3]

Survey papers on watermarking: [1] [2]

Reference

[1] Podilchuk, Christine I., and Edward J. Delp. “Digital watermarking: algorithms and applications.” IEEE Signal Processing Magazine 18.4 (2001): 33-46.
[2] Potdar, Vidyasagar M., Song Han, and Elizabeth Chang. “A survey of digital image watermarking techniques.” INDIN’05. 2005 3rd IEEE International Conference on Industrial Informatics, 2005. IEEE, 2005.
[3] Dekel, Tali, et al. “On the effectiveness of visible watermarks.” CVPR, 2017.

Video Harmonization

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

First deep learning approach for video harmonization [1]

[1] Haozhi Huang, Senzhe Xu, Junxiong Cai, Wei Liu, Shimin Hu, “Temporally Coherent Video Harmonization Using Adversarial Networks”, arXiv, 2018.

VAE

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Advanced VAE

VQ-VAE [1], VQ-VAE-2 [2]. Accelerating the autoregressive sampling stage: [4] [5]
NVAE [3]
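
For orientation, the core step of VQ-VAE [1] is a nearest-neighbour lookup: each continuous encoder output is replaced by the closest vector in a learned codebook, and the resulting discrete indices are what the prior models. The sketch below shows only that lookup in pure Python. The function name and the toy codebook are assumptions for illustration, and the training details (straight-through gradient, codebook and commitment losses) are left out.

```python
def quantize(z, codebook):
    """Return (index, code) for the codebook vector nearest to z.

    z:        encoder output as a list of floats.
    codebook: list of code vectors, each the same length as z.
    Distance is squared Euclidean.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    index = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return index, codebook[index]


# Usage with a toy 3-entry codebook (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
index, code = quantize([0.9, 0.8], codebook)
```

An image then becomes a grid of such indices, and generating those indices one by one with an autoregressive prior is the slow step that [4] and [5] aim to speed up.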

References

[1] Oord, Aaron van den, Oriol Vinyals, and Koray Kavukcuoglu. “Neural discrete representation learning.” arXiv preprint arXiv:1711.00937 (2017).
[2] Razavi, Ali, Aaron van den Oord, and Oriol Vinyals. “Generating diverse high-fidelity images with vq-vae-2.” Advances in neural information processing systems. 2019.
[3] Vahdat, Arash, and Jan Kautz. “Nvae: A deep hierarchical variational autoencoder.” arXiv preprint arXiv:2007.03898 (2020).
[4] Bond-Taylor, Sam, et al. “Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes.” arXiv preprint arXiv:2111.12701 (2021).
[5] Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman, “MaskGIT: Masked Generative Image Transformer”, arXiv preprint arXiv:2202.04200.

StyleGAN

Posted on 2026-03-17 Edited on 2023-05-31 In paper note

StyleGAN of all trades [1]
StyleGANv1 [5]
StyleGANv2 [6]: removes blob-shaped artifacts that resemble water droplets.
StyleGANv3 [2]: solves the aliasing (texture sticking) issue, in which details appear glued to image coordinates instead of to the surfaces of the depicted objects.
StyleGAN-XL [3]: extends StyleGAN to large, diverse datasets.
3D StyleGAN [4]

Image editing using StyleGAN

InsetGAN [7]

Reference

[1] Chong, Min Jin, Hsin-Ying Lee, and David Forsyth. “StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN.” arXiv preprint arXiv:2111.01619 (2021).

[2] Karras, Tero, et al. “Alias-free generative adversarial networks.” Thirty-Fifth Conference on Neural Information Processing Systems. 2021.

[3] Sauer, Axel, Katja Schwarz, and Andreas Geiger. “Stylegan-xl: Scaling stylegan to large diverse datasets.” arXiv preprint arXiv:2202.00273 (2022).

[4] Xiaoming Zhao, Fangchang Ma, David Güera, Zhile Ren, Alexander G. Schwing, Alex Colburn. “Generative Multiplane Images: Making a 2D GAN 3D-Aware”.

[5] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.

[6] Karras, Tero, et al. “Analyzing and improving the image quality of stylegan.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

[7] Frühstück, Anna, et al. “Insetgan for full-body image generation.” CVPR, 2022.

Segment Anything

Posted on 2026-03-17 Edited on 2023-07-14 In paper note

SAM [1]
FastSAM [2]: first generates mask proposals for all instances in the image, then selects the proposals that match the prompt
High-quality SAM [3]
Semantic-SAM [4]: assigns semantic labels to the segmented regions

Reference

[1] Kirillov, Alexander, et al. “Segment anything.” arXiv preprint arXiv:2304.02643 (2023).

[2] Zhao, Xu, et al. “Fast Segment Anything.” arXiv preprint arXiv:2306.12156 (2023).

[3] Ke, Lei, et al. “Segment Anything in High Quality.” arXiv preprint arXiv:2306.01567 (2023).

[4] Li, Feng, et al. “Semantic-SAM: Segment and Recognize Anything at Any Granularity.” arXiv preprint arXiv:2307.04767 (2023).