Newly Blog

Gaze Estimation

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Approaches

Corneal reflection-based methods
- NIR or LED illumination; learn the mapping (e.g., by regression) between the glint vector and the gaze direction.
Appearance-based methods
- Limbus model [pdf]: fit a limbus model (a fixed-diameter disc) to detected iris edges.
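
The glint-vector regression mentioned under corneal reflection-based methods can be sketched as a least-squares fit on calibration samples. This is only a minimal sketch: the second-order polynomial form, the synthetic calibration data, and the names `poly_features`, `fit_gaze_mapping`, and `predict_gaze` are all assumptions made for illustration, not taken from a specific paper.

```python
import numpy as np

def poly_features(v):
    """Second-order polynomial terms of 2-D glint vectors, shape (n, 2) -> (n, 6)."""
    x, y = v[:, 0], v[:, 1]
    return np.stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2], axis=1)

def fit_gaze_mapping(glint_vectors, gaze):
    """Least-squares fit of the coefficients that map glint vectors to gaze."""
    A = poly_features(glint_vectors)
    coeffs, *_ = np.linalg.lstsq(A, gaze, rcond=None)
    return coeffs  # shape (6, 2)

def predict_gaze(coeffs, glint_vectors):
    """Apply the fitted polynomial mapping to new glint vectors."""
    return poly_features(glint_vectors) @ coeffs

# Synthetic calibration session (hypothetical data): 9 targets, 2-D gaze (e.g., yaw and pitch).
rng = np.random.default_rng(0)
calib_v = rng.uniform(-1, 1, size=(9, 2))
true_coeffs = rng.normal(size=(6, 2))
calib_gaze = poly_features(calib_v) @ true_coeffs

coeffs = fit_gaze_mapping(calib_v, calib_gaze)
pred = predict_gaze(coeffs, calib_v)
```

In a real system the calibration pairs would come from asking the user to fixate known targets, which is also where the per-person calibration listed under Auxiliary Tools enters.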

Auxiliary Tools

Calibration: obtain the visual axis and kappa angle for each person.
Facial landmarks detection
- One Millisecond Face Alignment with an Ensemble of Regression Trees [pdf] [code]
- Continuous Conditional Neural Fields for Structured Regression [pdf]
Head Pose Estimation
- EPnP algorithm [pdf]

Dataset

[MPIIGaze]: fine-grained annotation
[Eyediap]: RGB-D

From Weak to Strong Supervision

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Object Detection:

image label: [WSDDN]
points that indicate the location of the object
bounding boxes

Segmentation:

image label: [SEC]
points that indicate the location of the object
scribbles that imply the extent of the object
bounding boxes
segmentation masks

Frequency Domain

Posted on 2026-03-17 Edited on 2022-08-05 In paper note

Distinguish generated fake images from real images in the frequency domain. [2]
Use frequency map as network input or output [1] [5] [6]
Use intermediate frequency features [7] [9]
An image can be composed from, or decomposed into, a low-frequency part and a high-frequency part [3] [8] [4] [10]
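
The low/high-frequency decomposition in the last item can be illustrated with a plain FFT mask. This is a minimal sketch, assuming a grayscale image and a hard circular cutoff; the function name `split_frequencies` and the `radius` parameter are hypothetical, and the cited papers use their own (often learned or wavelet-based) decompositions.

```python
import numpy as np

def split_frequencies(image, radius):
    """Split a 2-D image into low- and high-frequency parts with a circular FFT mask."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # move the DC term to the center
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius                        # keep frequencies near the center
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

rng = np.random.default_rng(0)
image = rng.random((32, 32))          # stand-in for a real grayscale image
low, high = split_frequencies(image, radius=4)
recomposed = low + high               # the two parts add back up to the input
```

Because the two masks are complementary, the sum of the two parts reproduces the input, which is the "composed of" direction of the same statement.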

Reference

[1] Kai Xu, Minghai Qin, Fei Sun, Yuhao Wang, Yen-Kuang Chen, Fengbo Ren, “Learning in the Frequency Domain”, CVPR, 2020.
[2] Wang, Sheng-Yu, et al. “CNN-generated images are surprisingly easy to spot… for now.” arXiv preprint arXiv:1912.11035 (2019).
[3] Aayush Bansal, Yaser Sheikh, Deva Ramanan, “PixelNN: Example-based Image Synthesis”, ICLR 2018.
[4] Yanchao Yang, Stefano Soatto, “FDA: Fourier Domain Adaptation for Semantic Segmentation”, CVPR 2020.
[5] Roy, Hiya, et al. “Image inpainting using frequency domain priors.” arXiv preprint arXiv:2012.01832 (2020).
[6] Shen, Xing, et al. “DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation.” arXiv preprint arXiv:2011.09876 (2020).
[7] Suvorov, Roman, et al. “Resolution-robust Large Mask Inpainting with Fourier Convolutions.” WACV (2021).
[8] Yu, Yingchen, et al. “WaveFill: A Wavelet-based Generation Network for Image Inpainting.” ICCV, 2021.
[9] Mardani, Morteza, et al. “Neural ffts for universal texture image synthesis.” NeurIPS (2020).
[10] Cai, Mu, et al. “Frequency domain image translation: More photo-realistic, better identity-preserving.” ICCV, 2021.

Forecast Future based on One Still Image

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Predict visual feature of one future frame [1]
Predict optical flow of one future frame [2]
Predict one future frame [4] (a special case of video prediction)
Predict future trajectories [5]
Predict optical flows of future frames, and then obtain future frames [3]

Reference

[1] Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. “Anticipating visual representations from unlabeled video.” CVPR, 2016.
[2] Gao, Ruohan, Bo Xiong, and Kristen Grauman. “Im2flow: Motion hallucination from static images for action recognition.” CVPR, 2018.
[3] Li, Yijun, et al. “Flow-grounded spatial-temporal video prediction from still images.” ECCV, 2018.
[4] Xue, Tianfan, et al. “Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks.” NIPS, 2016.
[5] Walker, Jacob, et al. “An uncertain future: Forecasting from static images using variational autoencoders.” ECCV, 2016.

Few-Shot Feature Generation

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Meta-learning method: [1]
Delta-based: delta between each pair of samples [2]; delta between each sample and class center [3] [4]
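
The "delta between each sample and class center" idea can be sketched without any learning. This is a deliberately simplified illustration, assuming deltas from base classes are transferred to a novel class by plain addition; the cited methods learn how to encode and transfer the deltas, and the function name `generate_features` and the toy data are hypothetical.

```python
import numpy as np

def generate_features(base_feats, base_labels, novel_feats, num_new, rng):
    """Synthesize novel-class features by adding intra-class deltas from base classes."""
    # Delta between each base sample and the center of its own class.
    deltas = np.empty_like(base_feats)
    for c in np.unique(base_labels):
        idx = base_labels == c
        deltas[idx] = base_feats[idx] - base_feats[idx].mean(axis=0)
    # Transfer randomly chosen deltas onto the center of the few novel samples.
    novel_center = novel_feats.mean(axis=0)
    chosen = rng.integers(0, len(deltas), size=num_new)
    return novel_center + deltas[chosen]

rng = np.random.default_rng(0)
base_feats = rng.normal(size=(20, 8))           # 4 base classes, 5 samples each
base_labels = np.repeat(np.arange(4), 5)
novel_feats = rng.normal(size=(2, 8)) + 5.0     # a 2-shot novel class
new_feats = generate_features(base_feats, base_labels, novel_feats, num_new=10, rng=rng)
```

The pairwise variant would replace the class center with a second sample from the same class when computing each delta.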

Reference

[1] Zhang, Ruixiang, et al. “Metagan: An adversarial approach to few-shot learning.” NIPS, 2018.

[2] Schwartz, Eli, et al. “Delta-encoder: an effective sample synthesis method for few-shot object recognition.” Advances in Neural Information Processing Systems. 2018.

[3] Liu, Jialun, et al. “Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective.” arXiv preprint arXiv:2002.10826 (2020).

[4] Yin, Xi, et al. “Feature transfer learning for face recognition with under-represented data.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.

Energy Efficient Deep Learning

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Lightweight network structure
- Xception: strictly speaking, not a lightweight CNN
- SqueezeNet
- MobileNet
- ShuffleNet
- MicroNet
SqueezeNet, MobileNet, and ShuffleNet share the same idea: decouple the cross-channel convolution from the spatial convolution to reduce the number of parameters, in a similar spirit to Pseudo-3D Residual Networks, which decouple spatial and temporal convolution. SqueezeNet is serial while MobileNet and ShuffleNet are parallel. MobileNet is a special case of ShuffleNet when using only one group.

Low-rank approximation ($k\times k\times c\times d \rightarrow k\times k\times c\times d' + 1\times 1\times d'\times d$) also falls into the above scope. The difference between MobileNet and low-rank approximation is whether the spatial convolution is applied depthwise (one filter per channel) or not.
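
A quick parameter count makes the saving concrete. This is a small worked example, assuming bias terms are ignored; the layer sizes and the function names are made up for illustration.

```python
def standard_conv_params(k, c, d):
    """k x k convolution from c input channels to d output channels."""
    return k * k * c * d

def low_rank_params(k, c, d, d_prime):
    """k x k convolution to d' channels, followed by a 1 x 1 convolution to d channels."""
    return k * k * c * d_prime + 1 * 1 * d_prime * d

def depthwise_separable_params(k, c, d):
    """MobileNet-style: one k x k filter per input channel, then a 1 x 1 convolution."""
    return k * k * c + 1 * 1 * c * d

k, c, d, d_prime = 3, 64, 128, 16   # hypothetical layer sizes
full = standard_conv_params(k, c, d)
low_rank = low_rank_params(k, c, d, d_prime)
separable = depthwise_separable_params(k, c, d)
print(full, low_rank, separable)    # 73728 11264 8768
```

With these sizes the low-rank factorization keeps about 15% of the parameters and the depthwise-separable layer about 12%.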
Tweak network structure
- prune nodes based on certain criteria (e.g., response value, Fisher information): this requires a special implementation and takes up more space than expected because of the irregular network structure.
Compress weights
- Quantization (fixed bit number): learn a codebook and encode the weights with it. Fine-tune the codebook after quantizing the weights, averaging the gradients of the weights that belong to the same cluster. Extreme cases are binary and ternary nets, whose weights are quantized to $\{-1, 1\}$ and $\{-1, 0, 1\}$ respectively, with a different scaling weight $\alpha$ for each layer.
- Huffman Coding (flexible bit number): applied after quantization for further compression.
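
Codebook quantization and the binary extreme case can be sketched in a few lines. This is a minimal sketch under stated assumptions: the codebook is learned with plain 1-D k-means using a linear initialization, and the binary scale $\alpha$ is taken to be the mean absolute weight, which is one common choice rather than something fixed by the notes above. The function names are hypothetical, and the fine-tuning and Huffman coding steps are omitted.

```python
import numpy as np

def kmeans_codebook(weights, num_clusters, num_iters=20):
    """Learn a 1-D codebook with plain k-means; return (codebook, index array)."""
    flat = weights.ravel()
    codebook = np.linspace(flat.min(), flat.max(), num_clusters)  # linear initialization
    for _ in range(num_iters):
        idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        for j in range(num_clusters):
            if np.any(idx == j):                 # leave empty clusters where they are
                codebook[j] = flat[idx == j].mean()
    idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, idx.reshape(weights.shape)

def binarize(weights):
    """Binary weights in {-1, +1} with one scale alpha for the whole layer."""
    alpha = np.abs(weights).mean()               # assumed choice of scale
    return alpha, np.where(weights >= 0, 1.0, -1.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))                    # stand-in for one layer's weights
codebook, idx = kmeans_codebook(W, num_clusters=16)   # 16 entries = 4 bits per weight
W_quant = codebook[idx]                          # decoded weights
alpha, B = binarize(W)
W_binary = alpha * B
```

Only the small codebook and the 4-bit indices need to be stored, and the index stream is what Huffman coding would compress further.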
Computation
- spatial domain to frequency domain: convert convolution to pointwise multiplication by using FFT
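
The equivalence behind this item is the convolution theorem, which can be checked numerically. This is a minimal sketch, assuming circular (wrap-around) boundary handling and a true convolution with a flipped kernel; CNN layers usually compute cross-correlation with padding, so a practical implementation needs extra bookkeeping. The function names are hypothetical.

```python
import numpy as np

def circular_conv2d_direct(image, kernel):
    """Circular 2-D convolution computed directly in the spatial domain."""
    out = np.zeros_like(image)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            # np.roll shifts the image so that out[n, m] accumulates kernel[i, j] * image[n - i, m - j].
            out += kernel[i, j] * np.roll(image, shift=(i, j), axis=(0, 1))
    return out

def circular_conv2d_fft(image, kernel):
    """Same result via pointwise multiplication in the frequency domain."""
    K = np.fft.fft2(kernel, s=image.shape)   # zero-pad the kernel to the image size
    return np.fft.ifft2(np.fft.fft2(image) * K).real

rng = np.random.default_rng(0)
image = rng.random((16, 16))
kernel = rng.random((3, 3))
direct = circular_conv2d_direct(image, kernel)
via_fft = circular_conv2d_fft(image, kernel)
```

The FFT route pays off mainly for large kernels, since its cost does not grow with the kernel size.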
Sparsity regularization
- L0 norm
Efficient Inference
- cascade of networks, early exit network (predict whether to exit or not after each layer) [1] [2]
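
The early-exit idea can be sketched as a loop over stages with an exit test after each one. This is a minimal sketch, assuming the exit decision is a softmax confidence threshold; the notes above describe predicting whether to exit, which may be a learned decision in the cited papers. The names `early_exit_predict`, `stages`, and `classifiers` and the toy stages are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_predict(x, stages, classifiers, threshold=0.9):
    """Run stages in order and stop at the first exit that is confident enough.

    Returns (predicted class, number of stages actually executed).
    """
    h = x
    for depth, (stage, clf) in enumerate(zip(stages, classifiers), start=1):
        h = stage(h)
        probs = softmax(clf(h))
        # Exit when confident, or unconditionally at the last stage.
        if probs.max() >= threshold or depth == len(stages):
            return int(probs.argmax()), depth

# Toy two-stage "network": the second stage sharpens the logits.
stages = [lambda h: h * 1.0, lambda h: h * 10.0]
classifiers = [lambda h: h, lambda h: h]

easy = early_exit_predict(np.array([5.0, 0.0]), stages, classifiers)   # confident after stage 1
hard = early_exit_predict(np.array([0.1, 0.0]), stages, classifiers)   # needs both stages
```

Easy inputs leave after the first stage, so the average inference cost drops while hard inputs still use the full network.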

Good introduction slides: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture15.pdf

Edge Detection

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Multi-scale fusion: HED [1], RCF [2]

Reference

[1] Xie, Saining, and Zhuowen Tu. “Holistically-nested edge detection.” ICCV, 2015.
[2] Liu, Yun, et al. “Richer convolutional features for edge detection.” CVPR, 2017.

Dynamic Kernel

Posted on 2026-03-17 Edited on 2022-09-19 In paper note

Dynamic kernels: [1] [2]

Survey: [Dynamic neural networks: A survey]

References

[1] Jia, Xu, et al. “Dynamic filter networks.” Advances in neural information processing systems 29 (2016).
[2] Tian, Zhi, Chunhua Shen, and Hao Chen. “Conditional convolutions for instance segmentation.” European conference on computer vision. Springer, Cham, 2020.

Domain Generalization

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

a) When the domain labels are known:

reduce the distance between different domains: MMD [1][2], mutual information
domain-invariant and domain-specific components: [1][2]

b) When the domain labels are unknown:

first discover multiple latent domains: cluster [1][2], max margin separation [1]

Domain Adaptation

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Methods

learn projection matrix: F(PXs, QXt)
- project to common subspace
  - TCA [pdf]
  - SA [pdf]
  - LSSA: extension of SA [pdf]
  - DIP [pdf]
  - CORAL [pdf]
  - deep CORAL [pdf]
  - other deep feature-based methods [1] [2]
- interpolation on the manifold
  - SGF [pdf]
  - GFK [pdf]
sample selection: learn sample weights
- KMM [pdf]
- STM [pdf]
- DASVM [pdf]
- weighted adversarial network [1][2]
domain-invariant and domain-specific components
- SDDL [pdf]
- Domain Separation Network [pdf]
- low-rank DL [pdf]
low-rank reconstruction
- LTSL [pdf]
- RDALR [pdf]
pixel-level image to image translation
- paired input: conditional GAN [pdf]
- unpaired input: CycleGAN [pdf], GAN with content-similarity loss [pdf], UNIT [pdf]
- combine with feature-based method: GraspGAN [pdf]
- A unified framework [pdf]
adversarial network [1]: classification and domain confusion. Domain separation versus domain confusion is a min-max problem, which can be solved as in GAN training or with the reverse gradient (RevGrad) algorithm.
meta-learning
- gradients on two domains should be consistent [pdf]
domain alignment layer (batch normalization): [1] [2]
guided learning: a tutor guides the students and gets feedback from them. ACM-MM18 paper
ensemble transfer learning: aggregate multiple transfer learning approaches [1]
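
As one concrete instance of the projection-based methods above, CORAL aligns the second-order statistics of the two domains by whitening the source features and re-coloring them with the target covariance. This is a minimal sketch, assuming features are rows of a matrix and an identity-scaled regularizer `eps` is added to both covariances; the helper names `matrix_power_sym` and `coral` and the toy data are hypothetical.

```python
import numpy as np

def matrix_power_sym(mat, power):
    """Fractional power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals ** power) @ vecs.T

def coral(xs, xt, eps=1.0):
    """Transform source features so that their covariance matches the target covariance."""
    d = xs.shape[1]
    cs = np.cov(xs, rowvar=False) + eps * np.eye(d)   # regularized source covariance
    ct = np.cov(xt, rowvar=False) + eps * np.eye(d)   # regularized target covariance
    whiten = matrix_power_sym(cs, -0.5)
    recolor = matrix_power_sym(ct, 0.5)
    return xs @ whiten @ recolor

rng = np.random.default_rng(0)
xs = rng.normal(size=(200, 3)) * np.array([1.0, 2.0, 3.0])     # toy source features
mix = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
xt = rng.normal(size=(200, 3)) @ mix + 1.0                      # toy target features
xs_aligned = coral(xs, xt)
```

A classifier is then trained on `xs_aligned` with the source labels and applied to the target features directly. Note that only covariances are matched here; the means are left untouched.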

Settings

open-set domain adaptation or partial transfer learning: [1][2][3]
distant domain adaptation (two domains are too distant, so the transfer between them relies on transition domains): Transitive transfer learning, distant domain transfer learning
open compound domain adaptation [1]

Domain adaptation for diverse applications

pose estimation [1]
person re-identification [1]
object detection [1]
segmentation [1]
VQA [1]

Domain difference metric: To measure data distribution mismatch, the most commonly used metrics are MMD and its extensions such as fast MMD, conditional MMD [1][2], and joint MMD. Other metrics include KL divergence, the HSIC criterion, Bregman divergence, manifold criteria, and second-order statistics.
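
For reference, a basic MMD estimate between two sets of features takes only a few lines. This is a minimal sketch, assuming an RBF kernel with a fixed bandwidth `gamma` and the biased estimator of the squared MMD; the function names and the toy data are hypothetical, and practical implementations often use several bandwidths.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel values exp(-gamma * ||a_i - b_j||^2)."""
    sq_dist = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * sq_dist)

def mmd2(xs, xt, gamma=0.5):
    """Biased estimate of the squared MMD between source and target samples."""
    k_ss = rbf_kernel(xs, xs, gamma).mean()
    k_tt = rbf_kernel(xt, xt, gamma).mean()
    k_st = rbf_kernel(xs, xt, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

rng = np.random.default_rng(0)
xs = rng.normal(size=(100, 2))              # source samples
xt_near = rng.normal(size=(100, 2))         # same distribution as the source
xt_far = rng.normal(size=(100, 2)) + 2.0    # shifted distribution

mmd_near = mmd2(xs, xt_near)
mmd_far = mmd2(xs, xt_far)
```

The shifted domain yields a clearly larger value than the matched one, which is the behavior a domain adaptation loss built on MMD relies on.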

Theories: A summary of related theories

Survey:

An old survey of transfer learning [pdf]
Recent advance on domain adaptation [pdf]
My survey of old deep learning domain adaptation methods [pdf]
A Chinese version of transfer learning tutorial [pdf]
Datasets and code: [1]
A Comprehensive Survey on Transfer Learning [pdf]