One-shot/few-shot learning

The first one-shot learning paper dates back to 2006, but the topic has become popular again only recently.

Concepts

training/validation/test categories: the training categories and the test categories are disjoint (no overlap)

support (sample)/query (batch) set: at test time, for each test category, we reserve some instances to form the support set and sample from the remaining instances to form the query set

C-way K-shot: the test set has C categories; for each test category, K instances are reserved as the support set

episode: an episode-based strategy is used during training to match the inference procedure at test time: first sample some categories, then sample a support/query set for each sampled category
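The episode sampling described above can be sketched in a few lines. This is a minimal sketch, assuming the dataset is a dict mapping each class name to its list of instances; the function name and signature are hypothetical, not from any specific paper.

```python
import random

def sample_episode(dataset, n_way, k_shot, q_query):
    """Sample one C-way K-shot episode from a {class: [instances]} dict.

    Returns (support, query), each a list of (instance, label) pairs,
    where labels are episode-local indices 0..n_way-1.
    """
    # First sample the categories for this episode...
    classes = random.sample(list(dataset), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # ...then sample k_shot support and q_query query instances per class.
        instances = random.sample(dataset[cls], k_shot + q_query)
        support += [(x, label) for x in instances[:k_shot]]
        query += [(x, label) for x in instances[k_shot:]]
    return support, query
```

Training on episodes shaped exactly like the test-time C-way K-shot task is what makes the learned metric transfer to novel categories.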

Methods

  • Metric based:

Siamese network: the earliest and simplest metric-learning-based approach to few-shot learning; it casts the task as a standard verification problem.

Matching network: maps a support set to a classification function p(y|x,S) (KNN- or LSTM-based). For the LSTM version, there is a similar work that uses a memory module.

Relation network: computes a relation score for 1-shot; for K-shot, averages the relation scores.

Prototypical network: compares the query with prototype representations of each class; each class may have more than one prototype. There are other prototype-based methods [1] [2].

  • Optimization (gradient) based:

  • Model based:

    • [learnet] [2] [3] [4] [5]: predict the parameters of classifiers for novel categories.

[1]: predicts the parameters of a CNN feature extractor by means of a memory module.

  • Generation based: generate more features for novel categories [1], [2]

Pretrain and fine-tune: use the whole meta-training set to learn a feature extractor [1] [2]; pretrain+MatchingNet [3]
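Of the metric-based methods above, the prototypical network is the easiest to sketch: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. A minimal NumPy sketch, assuming embeddings are already computed (the function name is my own, not from the paper):

```python
import numpy as np

def prototypical_predict(support_x, support_y, query_x):
    """Classify queries by nearest class prototype.

    support_x: (N, D) support embeddings; support_y: (N,) labels;
    query_x: (Q, D) query embeddings. Returns (Q,) predicted labels.
    """
    classes = np.unique(support_y)
    # Prototype = mean embedding of each class's support instances.
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every query to every prototype.
    d = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]
```

With K=1 this reduces to nearest-neighbor matching on the single support embedding per class, which is why 1-shot and K-shot are handled uniformly.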

Survey

  1. Generalizing from a Few Examples: A Survey on Few-Shot Learning

  2. Learning from Few Samples: A Survey

Datasets

  1. Meta-Dataset

Framework:

The similarity between two face images Ia and Ib can be unified in the following formulation:

M[W(F(S(Ia))), W(F(S(Ib)))]

in which S is a synthesis operation (e.g., face alignment, frontalization), F is robust feature extraction, W is transformation-subspace learning, and M is a face-matching algorithm (e.g., NN, SVM, metric learning).
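The formulation above is just a composition of four pluggable stages. A minimal sketch with toy stand-ins for each stage (real systems would use face alignment for S, a deep network for F, a learned subspace for W, and a trained matcher for M; all the lambdas below are illustrative assumptions):

```python
import numpy as np

def similarity(Ia, Ib, S, F, W, M):
    """Unified face similarity: M[W(F(S(Ia))), W(F(S(Ib)))]."""
    return M(W(F(S(Ia))), W(F(S(Ib))))

# Toy stand-ins for each stage of the pipeline:
S = lambda img: img                  # synthesis: no-op "alignment"
F = lambda img: img.ravel()          # feature extraction: flatten pixels
W = lambda f: f / np.linalg.norm(f)  # subspace: L2 normalization
M = lambda a, b: float(a @ b)        # matching: cosine similarity
```

Framing the pipeline this way makes it clear that most face-recognition methods differ only in which of the four stages they improve.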

Paper:

Survey:

Dataset:

LFW: http://vis-www.cs.umass.edu/lfw/

IJB-A: (free upon request) https://www.nist.gov/itl/iad/image-group/ijba-dataset-request-form

FERET: (free upon request) https://www.nist.gov/itl/iad/image-group/color-feret-database

CMU Multi-Pie: (not free) http://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html

CASIA WebFace Database: (free upon request) http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html

MS-Celeb-1M: https://www.microsoft.com/en-us/research/project/ms-celeb-1m-challenge-recognizing-one-million-celebrities-real-world/

MegaFace: (free upon request) http://megaface.cs.washington.edu/dataset/download_training.html

Cross-Age Celebrity Dataset: http://bcsiriuschen.github.io/CARC/

VGG face: http://www.robots.ox.ac.uk/~vgg/data/vgg_face/

Domain adaptation:

Align source and target feature maps: reduce the H-divergence of regional feature maps [4][8]; cycle consistency [11]

Translate source-domain images to target-domain images: [5][6][10] (combined with target-domain pseudo labels).

Training with pseudo labels on the target domain: curriculum learning [1] (global image-label distribution and landmark-superpixel label distribution); self-training [7]; using multiple models to vote for pseudo labels [9]
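The core of the self-training strategy above is selecting confident target-domain predictions as pseudo labels. A minimal sketch, assuming the model outputs per-pixel class probabilities; the fixed global threshold is a simplification (class-balanced self-training [7] uses per-class thresholds instead), and the function name is hypothetical:

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Confidence-thresholded pseudo labels for target-domain pixels.

    probs: (..., C) softmax probabilities. Returns integer labels with
    -1 marking low-confidence positions to ignore in the loss.
    """
    labels = probs.argmax(axis=-1)
    # Keep only positions where the model is confident enough.
    confident = probs.max(axis=-1) >= threshold
    labels[~confident] = -1  # ignore index for the segmentation loss
    return labels
```

The labeled subset then serves as supervision for the next round of training, and the threshold (or the retained fraction) is typically relaxed over rounds.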

Domain adaptation with privileged information:

  • Domain adaptation with privileged information like depth: SPIGAN [2] (enforce synthetic image and generated image to predict the same depth), [3] (adversarial learning on depth)

Reference

[1] Zhang, Yang, Philip David, and Boqing Gong. “Curriculum domain adaptation for semantic segmentation of urban scenes.” ICCV, 2017.

[2] Lee, Kuan-Hui, et al. “SPIGAN: Privileged Adversarial Learning from Simulation.” ICLR, 2019.

[3] Vu, Tuan-Hung, et al. “DADA: Depth-aware Domain Adaptation in Semantic Segmentation.” arXiv preprint arXiv:1904.01886 (2019).

[4] Chen, Yuhua, Wen Li, and Luc Van Gool. “Road: Reality oriented adaptation for semantic segmentation of urban scenes.” CVPR, 2018.

[5] Hoffman, Judy, et al. “Cycada: Cycle-consistent adversarial domain adaptation.” arXiv preprint arXiv:1711.03213 (2017).

[6] Sankaranarayanan, Swami, et al. “Learning from synthetic data: Addressing domain shift for semantic segmentation.” CVPR, 2018.

[7] Zou, Yang, et al. “Unsupervised domain adaptation for semantic segmentation via class-balanced self-training.”, ECCV, 2018.

[8] Hong, Weixiang, et al. “Conditional generative adversarial network for structured domain adaptation.” CVPR, 2018.

[9] Zhang, Junting, Chen Liang, and C-C. Jay Kuo. “A fully convolutional tri-branch network (FCTN) for domain adaptation.” ICASSP, 2018.

[10] Li, Yunsheng, Lu Yuan, and Nuno Vasconcelos. “Bidirectional Learning for Domain Adaptation of Semantic Segmentation.” arXiv preprint arXiv:1904.10620 (2019).

[11] Kang, Guoliang, et al. “Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation.” Advances in Neural Information Processing Systems 33 (2020).

  • Camouflaged Object Detection: [1]

  • Camouflaged Object Segmentation: [2]

Reference

  1. Fan, Deng-Ping, et al. “Camouflaged object detection.” CVPR, 2020.

  2. Yan, Jinnan, et al. “MirrorNet: Bio-Inspired Adversarial Attack for Camouflaged Object Segmentation.” arXiv preprint arXiv:2007.12881 (2020).

  1. propagate information within each non-boundary region [1]

  2. focus on unconfident boundary regions [2]

  3. fuse boundary feature and image feature [3]
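The third idea, fusing boundary features with image features, can be sketched as a boundary-derived gate over the image feature map. This is a simplified illustration of gated fusion, not the actual Gated-SCNN [3] module, and both function names are my own:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fuse(image_feat, boundary_feat):
    """Fuse an image feature map with a boundary feature map.

    The boundary map is squashed into a per-pixel gate in (0, 1),
    which amplifies image features near predicted boundaries.
    """
    gate = sigmoid(boundary_feat)
    return image_feat * (1.0 + gate)
```

In the full architecture the gate would be produced by learned convolutions over both streams; the point here is only the multiplicative, boundary-conditioned reweighting.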

Reference

[1] Ding, Henghui, et al. “Boundary-aware feature propagation for scene segmentation.” ICCV, 2019.

[2] Marin, Dmitrii, et al. “Efficient segmentation: Learning downsampling near semantic boundaries.” ICCV, 2019.

[3] Takikawa, Towaki, et al. “Gated-scnn: Gated shape cnns for semantic segmentation.” ICCV, 2019.
