One-shot/few-shot learning

The first one-shot learning paper dates back to 2006, but the topic has become popular again only recently.

Concepts

training/validation/test categories: the training categories and the test categories are disjoint (no overlap)

support (sample)/query (batch) set: at test time, for each test category, we reserve some instances to form the support set and sample from the remaining instances to form the query set

C-way K-shot: the test set has C categories; for each test category, K instances are reserved as the support set

episode: an episode-based strategy is used during training to match the inference procedure at test time: first sample some categories, then sample a support/query set for each sampled category
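The episode sampling described above can be sketched in a few lines. This is a minimal sketch, assuming the dataset is a dict mapping each class name to its list of instances; the function name and signature are hypothetical, not from any specific paper.

```python
import random

def sample_episode(dataset, n_way, k_shot, q_query):
    """Sample one C-way K-shot episode from a {class: [instances]} dict.

    Returns (support, query), each a list of (instance, label) pairs,
    where labels are episode-local indices 0..n_way-1.
    """
    # First sample the categories for this episode...
    classes = random.sample(list(dataset), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # ...then sample k_shot support and q_query query instances per class.
        instances = random.sample(dataset[cls], k_shot + q_query)
        support += [(x, label) for x in instances[:k_shot]]
        query += [(x, label) for x in instances[k_shot:]]
    return support, query
```

Training on episodes shaped exactly like the test-time C-way K-shot task is what makes the learned metric transfer to novel categories.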

Methods

  • Metric based:

Siamese network: the earliest and simplest metric-learning-based approach to few-shot learning; it casts the task as a standard verification problem.

Matching network: maps a support set to a classification function p(y|x,S) (KNN- or LSTM-based). For the LSTM version, there is a similar work that uses a memory module.

Relation network: computes a relation score for 1-shot; for K-shot, averages the relation scores.

Prototypical network: compares the query with prototype representations of each class; each class may have more than one prototype. There are other prototype-based methods [1] [2].

  • Optimization (gradient) based:

  • Model based:

    • [learnet] [2] [3] [4] [5]: predict the parameters of classifiers for novel categories.

[1]: predicts the parameters of a CNN feature extractor by means of a memory module.

  • Generation based: generate more features for novel categories [1], [2]

Pretrain and fine-tune: use the whole meta-training set to learn a feature extractor [1] [2]; pretrain+MatchingNet [3]
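Of the metric-based methods above, the prototypical network is the easiest to sketch: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. A minimal NumPy sketch, assuming embeddings are already computed (the function name is my own, not from the paper):

```python
import numpy as np

def prototypical_predict(support_x, support_y, query_x):
    """Classify queries by nearest class prototype.

    support_x: (N, D) support embeddings; support_y: (N,) labels;
    query_x: (Q, D) query embeddings. Returns (Q,) predicted labels.
    """
    classes = np.unique(support_y)
    # Prototype = mean embedding of each class's support instances.
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every query to every prototype.
    d = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]
```

With K=1 this reduces to nearest-neighbor matching on the single support embedding per class, which is why 1-shot and K-shot are handled uniformly.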

Survey

  1. Generalizing from a Few Examples: A Survey on Few-Shot Learning

  2. Learning from Few Samples: A Survey

Datasets

  1. Meta-Dataset

Framework:

The similarity between two face images Ia and Ib can be unified in the following formulation:

M[W(F(S(Ia))), W(F(S(Ib)))]

in which S is a synthesis operation (e.g., face alignment, frontalization), F is robust feature extraction, W is transformation-subspace learning, and M is a face-matching algorithm (e.g., NN, SVM, metric learning).
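The formulation above is just a composition of four pluggable stages. A minimal sketch with toy stand-ins for each stage (real systems would use face alignment for S, a deep network for F, a learned subspace for W, and a trained matcher for M; all the lambdas below are illustrative assumptions):

```python
import numpy as np

def similarity(Ia, Ib, S, F, W, M):
    """Unified face similarity: M[W(F(S(Ia))), W(F(S(Ib)))]."""
    return M(W(F(S(Ia))), W(F(S(Ib))))

# Toy stand-ins for each stage of the pipeline:
S = lambda img: img                  # synthesis: no-op "alignment"
F = lambda img: img.ravel()          # feature extraction: flatten pixels
W = lambda f: f / np.linalg.norm(f)  # subspace: L2 normalization
M = lambda a, b: float(a @ b)        # matching: cosine similarity
```

Framing the pipeline this way makes it clear that most face-recognition methods differ only in which of the four stages they improve.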

Paper:

Survey:

Dataset:

LFW: http://vis-www.cs.umass.edu/lfw/

IJB-A: (free upon request) https://www.nist.gov/itl/iad/image-group/ijba-dataset-request-form

FERET: (free upon request) https://www.nist.gov/itl/iad/image-group/color-feret-database

CMU Multi-Pie: (not free) http://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html

CASIA WebFace Database: (free upon request) http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html

MS-Celeb-1M: https://www.microsoft.com/en-us/research/project/ms-celeb-1m-challenge-recognizing-one-million-celebrities-real-world/

MegaFace: (free upon request) http://megaface.cs.washington.edu/dataset/download_training.html

Cross-Age Celebrity Dataset: http://bcsiriuschen.github.io/CARC/

VGG face: http://www.robots.ox.ac.uk/~vgg/data/vgg_face/

Domain adaptation:

Align source and target feature maps: reduce the H-divergence of regional feature maps [4][8]; cycle consistency [11]

Translate source-domain images to target-domain images: [5][6][10] (combined with target-domain pseudo labels).

Training with pseudo labels on the target domain: curriculum learning [1] (global image-label distribution and landmark-superpixel label distribution); self-training [7]; using multiple models to vote for pseudo labels [9]
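The core of the self-training strategy above is selecting confident target-domain predictions as pseudo labels. A minimal sketch, assuming the model outputs per-pixel class probabilities; the fixed global threshold is a simplification (class-balanced self-training [7] uses per-class thresholds instead), and the function name is hypothetical:

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Confidence-thresholded pseudo labels for target-domain pixels.

    probs: (..., C) softmax probabilities. Returns integer labels with
    -1 marking low-confidence positions to ignore in the loss.
    """
    labels = probs.argmax(axis=-1)
    # Keep only positions where the model is confident enough.
    confident = probs.max(axis=-1) >= threshold
    labels[~confident] = -1  # ignore index for the segmentation loss
    return labels
```

The labeled subset then serves as supervision for the next round of training, and the threshold (or the retained fraction) is typically relaxed over rounds.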

Domain adaptation with privileged information:

  • Domain adaptation with privileged information like depth: SPIGAN [2] (enforce synthetic image and generated image to predict the same depth), [3] (adversarial learning on depth)

Reference

[1] Zhang, Yang, Philip David, and Boqing Gong. “Curriculum domain adaptation for semantic segmentation of urban scenes.” ICCV, 2017.

[2] Lee, Kuan-Hui, et al. “SPIGAN: Privileged Adversarial Learning from Simulation.” ICLR, 2019.

[3] Vu, Tuan-Hung, et al. “DADA: Depth-aware Domain Adaptation in Semantic Segmentation.” arXiv preprint arXiv:1904.01886 (2019).

[4] Chen, Yuhua, Wen Li, and Luc Van Gool. “Road: Reality oriented adaptation for semantic segmentation of urban scenes.” CVPR, 2018.

[5] Hoffman, Judy, et al. “Cycada: Cycle-consistent adversarial domain adaptation.” arXiv preprint arXiv:1711.03213 (2017).

[6] Sankaranarayanan, Swami, et al. “Learning from synthetic data: Addressing domain shift for semantic segmentation.” CVPR, 2018.

[7] Zou, Yang, et al. “Unsupervised domain adaptation for semantic segmentation via class-balanced self-training.”, ECCV, 2018.

[8] Hong, Weixiang, et al. “Conditional generative adversarial network for structured domain adaptation.” CVPR, 2018.

[9] Zhang, Junting, Chen Liang, and C-C. Jay Kuo. “A fully convolutional tri-branch network (FCTN) for domain adaptation.” ICASSP, 2018.

[10] Li, Yunsheng, Lu Yuan, and Nuno Vasconcelos. “Bidirectional Learning for Domain Adaptation of Semantic Segmentation.” arXiv preprint arXiv:1904.10620 (2019).

[11] Kang, Guoliang, et al. “Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation.” Advances in Neural Information Processing Systems 33 (2020).

  • Camouflaged Object Detection: [1]

  • Camouflaged Object Segmentation: [2]

Reference

  1. Fan, Deng-Ping, et al. “Camouflaged object detection.” CVPR, 2020.

  2. Yan, Jinnan, et al. “MirrorNet: Bio-Inspired Adversarial Attack for Camouflaged Object Segmentation.” arXiv preprint arXiv:2007.12881 (2020).

  1. propagate information within each non-boundary region [1]

  2. focus on unconfident boundary regions [2]

  3. fuse boundary feature and image feature [3]
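The third idea, fusing boundary features with image features, can be sketched as a boundary-derived gate over the image feature map. This is a simplified illustration of gated fusion, not the actual Gated-SCNN [3] module, and both function names are my own:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fuse(image_feat, boundary_feat):
    """Fuse an image feature map with a boundary feature map.

    The boundary map is squashed into a per-pixel gate in (0, 1),
    which amplifies image features near predicted boundaries.
    """
    gate = sigmoid(boundary_feat)
    return image_feat * (1.0 + gate)
```

In the full architecture the gate would be produced by learned convolutions over both streams; the point here is only the multiplicative, boundary-conditioned reweighting.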

Reference

[1] Ding, Henghui, et al. “Boundary-aware feature propagation for scene segmentation.” ICCV, 2019.

[2] Marin, Dmitrii, et al. “Efficient segmentation: Learning downsampling near semantic boundaries.” ICCV, 2019.

[3] Takikawa, Towaki, et al. “Gated-scnn: Gated shape cnns for semantic segmentation.” ICCV, 2019.
