Different Ways of Injecting Latent Code
Inject a latent vector into the network:
- Concatenate or add it to an encoder layer [2]
- Concatenate or add it to the bottleneck [3]
- Add it to decoder layers via AdaIN: [4] (without skip connections), [5] (with skip connections)
Inject a latent map into the network:
- For concatenation or addition, spatially stack latent codes [9]
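A minimal NumPy sketch of the two injection mechanisms above, assuming a single feature map of shape (C, H, W) and a latent code already mapped to per-channel parameters; the function names and shapes are illustrative, not from any cited paper:

```python
import numpy as np

def adain_inject(feat, gamma, beta, eps=1e-5):
    """AdaIN-style injection: normalize each channel of `feat`,
    then scale/shift with latent-derived gamma/beta (each shape (C,))."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    return gamma[:, None, None] * (feat - mu) / (sigma + eps) + beta[:, None, None]

def concat_inject(feat, z):
    """Concatenation-style injection: tile the latent vector z (shape (D,))
    over the spatial grid and stack it onto the channel axis."""
    _, h, w = feat.shape
    z_map = np.broadcast_to(z[:, None, None], (z.shape[0], h, w))
    return np.concatenate([feat, z_map], axis=0)

feat = np.random.randn(64, 8, 8)
z = np.random.randn(16)
out = concat_inject(feat, z)   # shape (64 + 16, 8, 8)
styled = adain_inject(feat, np.ones(64), np.zeros(64))
```

Addition instead of concatenation works the same way, provided z is first projected to C channels so the shapes match.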
Reference
[1] Choi, Yunjey, et al. “StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation.” CVPR, 2018.
[2] Zhu, Jun-Yan, et al. “Toward multimodal image-to-image translation.” NeurIPS, 2017.
[3] Zheng, Chuanxia, Tat-Jen Cham, and Jianfei Cai. “Pluralistic image completion.” CVPR, 2019.
[4] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.
[5] Anokhin, Ivan, et al. “High-Resolution Daytime Translation Without Domain Labels.” CVPR, 2020.
[6] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. “Data augmentation generative adversarial networks.” arXiv preprint arXiv:1711.04340 (2017).
[7] Shaham, Tamar Rott, Tali Dekel, and Tomer Michaeli. “SinGAN: Learning a Generative Model from a Single Natural Image.” ICCV, 2019.
[8] Lee, Hsin-Ying, et al. “Diverse image-to-image translation via disentangled representations.” ECCV, 2018.
[9] Alharbi, Yazeed, and Peter Wonka. “Disentangled Image Generation Through Structured Noise Injection.” CVPR, 2020.
[10] Park, Taesung, et al. “Semantic image synthesis with spatially-adaptive normalization.” CVPR, 2019.
[11] Sushko, Vadim, et al. “You only need adversarial supervision for semantic image synthesis.” ICLR, 2021.
Deep Learning Platform
Ready-made DevBox:
- Dell Alienware: at most 2 GPUs
- Newegg: 4 GPUs
- Lambda Labs: 4 GPUs
Assemble it yourself: cheap, but no warranty
- part lists: most are out of date; Tom’s Hardware is a good site for component comparisons.
Nvidia
microarchitectures: Maxwell -> Pascal -> Volta
GPU cloud
Deep Feature Invariance
Some related papers: [1][2][3][4]
Reference
[1] Pun, Chi Seng, Kelin Xia, and Si Xian Lee. “Persistent-Homology-based Machine Learning and its Applications—A Survey.” arXiv preprint arXiv:1811.00252 (2018).
[2] Carlsson, Gunnar, and Rickard Brüel Gabrielsson. “Topological approaches to deep learning.” arXiv preprint arXiv:1811.01122 (2018).
[3] Gabrielsson, Rickard Brüel, and Gunnar Carlsson. “Exposition and interpretation of the topology of neural networks.” ICMLA, 2019.
[4] Bergomi, Mattia G., et al. “Towards a topological–geometrical theory of group equivariant non-expansive operators for data analysis and machine learning.” Nature Machine Intelligence 1.9 (2019): 423-433.
Deep EM
- Learning from Massive Noisy Labeled Data for Image Classification: the hidden variable is the label noise type
- Expectation-Maximization Attention Networks for Semantic Segmentation: the hidden variable is the dictionary basis
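Both papers instantiate the same EM loop: infer the hidden variable given the current parameters (E-step), then re-estimate the parameters given the inferred variable (M-step). A minimal NumPy sketch of that loop for a 1-D two-component Gaussian mixture, purely illustrative and not either paper’s model:

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """EM for a 1-D mixture of two unit-variance Gaussians.
    Hidden variable: which component generated each sample."""
    mu = np.array([x.min(), x.max()])   # crude initialization
    pi = np.array([0.5, 0.5])           # mixing weights
    for _ in range(iters):
        # E-step: posterior responsibility of each component per sample
        dens = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) * pi[None, :]
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate means and mixing weights from responsibilities
        mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
        pi = resp.mean(axis=0)
    return mu, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
mu, pi = em_gmm_1d(x)   # mu close to (-3, 3), pi close to (0.5, 0.5)
```

In the papers above, the responsibilities are replaced by the noise-type posterior or the attention weights over dictionary bases, but the alternating structure is the same.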
Dataset Pruning
a) data selection, coreset selection, dataset pruning: select a subset of training data
- survey on coreset selection: https://arxiv.org/pdf/2505.17799
some papers on dataset pruning for generative models:
* Li, Yize, et al. “Pruning then reweighting: Towards data-efficient training of diffusion models.” ICASSP, 2025.
* Moser, Brian B., Federico Raue, and Andreas Dengel. “A study in dataset pruning for image super-resolution.” ICANN, 2024.
- dataset quantization: divide the training set into different bins and select representative samples in each bin. It can be used as a data selection strategy.
- data attribution: survey on data attribution: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5451054. It can be used as a measure for data selection. Some papers on data attribution for generative models:
* Georgiev, Kristian, et al. “The journey, not the destination: How data guides diffusion models.” arXiv preprint arXiv:2312.06205 (2023).
* Zheng, Xiaosen, et al. “Intriguing properties of data attribution on diffusion models.” arXiv preprint arXiv:2311.00500 (2023).
* Lin, Jinxu, et al. “Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models.” arXiv preprint arXiv:2410.18639 (2024).
b) dataset distillation: optimize the training set itself; the optimized training images are not realistic images. There is no work using distilled images to train generative models.
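The dataset-quantization idea mentioned above (bin the training set, keep representatives per bin) can be sketched in NumPy, assuming each sample is summarized by a scalar score such as its training loss; the quantile binning and nearest-to-center selection are illustrative choices:

```python
import numpy as np

def quantize_dataset(scores, n_bins=10, per_bin=5):
    """Split samples into quantile bins of `scores` and keep the
    `per_bin` samples whose scores are closest to each bin's mean."""
    edges = np.quantile(scores, np.linspace(0, 1, n_bins + 1))
    keep = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = np.where((scores >= lo) & (scores <= hi))[0]
        if idx.size == 0:
            continue
        center = scores[idx].mean()
        # representatives: samples whose score is nearest the bin center
        order = np.argsort(np.abs(scores[idx] - center))
        keep.extend(idx[order[:per_bin]])
    return np.unique(keep)

rng = np.random.default_rng(1)
scores = rng.normal(size=1000)        # e.g. per-sample training loss
selected = quantize_dataset(scores)   # indices of the pruned subset
```

Binning before selection is what keeps the subset spread over the whole score range instead of collapsing onto the easiest (or hardest) samples.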
Dataset Selection
a) clustering: select representative samples and remove outliers; cluster based on loss, gradient, etc.
b) data contribution: measure the contribution of each sample
- the performance difference when training with vs. without this sample
c) learn the weights of training samples: train with a weighted loss and evaluate on the validation set
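Option (c) can be illustrated with weighted least squares: fit under per-sample weights and compare candidate weightings by validation error. Everything below (the linear model, the corruption pattern, the weight values) is an assumed toy setup, not from any cited paper:

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Linear regression under per-sample weights w:
    minimize sum_i w_i * (x_i @ theta - y_i)**2 via weighted normal equations."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=200)]
y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
y[:20] += 10.0                       # corrupt the first 20 training labels

X_val = np.c_[np.ones(50), rng.normal(size=50)]
y_val = 2.0 + 3.0 * X_val[:, 1]      # clean validation set

w_uniform = np.ones(200)
w_down = w_uniform.copy()
w_down[:20] = 0.01                   # candidate weights suppressing bad samples

err_uniform = np.mean((X_val @ weighted_least_squares(X, y, w_uniform) - y_val) ** 2)
err_down = np.mean((X_val @ weighted_least_squares(X, y, w_down) - y_val) ** 2)
```

Downweighting the corrupted samples lowers the validation error; in the full version of (c), the weights themselves would be optimized against that validation objective rather than set by hand.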
CLIP
Reference
[1] Radford, Alec, et al. “Learning transferable visual models from natural language supervision.” arXiv preprint arXiv:2103.00020 (2021).
[2] Zhou, Kaiyang, et al. “Learning to Prompt for Vision-Language Models.” arXiv preprint arXiv:2109.01134 (2021).
[3] Wang, Mengmeng, Jiazheng Xing, and Yong Liu. “ActionCLIP: A New Paradigm for Video Action Recognition.” arXiv preprint arXiv:2109.08472 (2021).
[4] Gu, Xiuye, et al. “Zero-Shot Detection via Vision and Language Knowledge Distillation.” arXiv preprint arXiv:2104.13921 (2021).
[5] Yao, Yuan, et al. “CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models.” arXiv preprint arXiv:2109.11797 (2021).
[6] Xie, Johnathan, and Shuai Zheng. “ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation.” arXiv preprint arXiv:2109.12066 (2021).
[7] Patashnik, Or, et al. “StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery.” ICCV, 2021.
[8] Xu, Mengde, et al. “A simple baseline for zero-shot semantic segmentation with pre-trained vision-language model.” arXiv preprint arXiv:2112.14757 (2021).
[9] Lüddecke, Timo, and Alexander Ecker. “Image Segmentation Using Text and Image Prompts.” CVPR, 2022.
Capsule Network
Reference
[1] Sabour, Sara, Nicholas Frosst, and Geoffrey E. Hinton. “Dynamic routing between capsules.” NeurIPS, 2017.
[2] Zhang, Liheng, Marzieh Edraki, and Guo-Jun Qi. “CapProNet: Deep feature learning via orthogonal projections onto capsule subspaces.” arXiv preprint arXiv:1805.07621 (2018).
[3] Gu, Jindong, Volker Tresp, and Han Hu. “Capsule Network is Not More Robust than Convolutional Network.” CVPR, 2021.