
From Anchor to ROI

Posted on 2022-06-16 | In paper note

layer area

From layer $i$ to layer $i+1$, assume the parameters on layer $i$ are $s_i$ (stride), $p_i$ (padding), $k_i$ (kernel filter size), and the width (or height) of layer $i$ is $r_i$. Then, by the standard convolution arithmetic,

$$r_{i+1} = \left\lfloor \frac{r_i + 2p_i - k_i}{s_i} \right\rfloor + 1.$$

In the reverse direction, $r_i = s_i r_{i+1} - s_i - 2p_i + k_i$, or $r_i = s_i r_{i+1} - s_i + k_i$ if the padded area is counted as part of layer $i$.
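
A minimal sketch of these two size formulas (the function and variable names are mine, for illustration only):

```python
import math

def forward_size(r_i, k, s, p):
    """Width/height of layer i+1 given layer i (standard conv/pool arithmetic)."""
    return math.floor((r_i + 2 * p - k) / s) + 1

def reverse_size(r_next, k, s, p, include_padding=False):
    """Width/height of layer i that produces a given layer-(i+1) size."""
    if include_padding:
        return s * r_next - s + k           # padded area counted as part of layer i
    return s * r_next - s + k - 2 * p       # padding excluded

# Example: 7x7 conv, stride 2, padding 3 on a 224-wide input (ResNet-style stem).
print(forward_size(224, k=7, s=2, p=3))     # 112
print(reverse_size(112, k=7, s=2, p=3))     # 223 (the floor loses one pixel)
```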

coordinate map

Now consider mapping the point $x_i$ on the ROI to the point $x_{i+1}$ on the feature map, which can be transformed into the layer area problem above. In particular, the region formed by the top-left corner and $x_i$ on the ROI can be mapped to the region formed by the top-left corner and $x_{i+1}$ on the feature map. Based on a formula similar to the layer area problem above (the only differences are that we only include the left and top padding, and subtract the radius of the kernel filter $(k_i-1)/2$),

$$x_i = s_i (x_{i+1} - 1) + \frac{k_i+1}{2} - p_i.$$

The above coordinate system starts from 1. When the coordinate system starts from 0, substituting $x_i+1$ and $x_{i+1}+1$ into the formula gives

$$x_i + 1 = s_i x_{i+1} + \frac{k_i+1}{2} - p_i,$$

which can be simplified as

$$x_i = s_i x_{i+1} + \left(\frac{k_i-1}{2} - p_i\right).$$

When $p_i = \lfloor k_i/2 \rfloor$, we have $x_i \approx s_i x_{i+1}$, which is the simplest case.

By applying $x_i = s_i x_{i+1}+(\frac{k_i-1}{2}-p_i)$ recursively across layers $1, \dots, L$ (layer $1$ is the image and layer $L$ is the feature map), we obtain a general solution

$$x_1 = \alpha_L x_L + \beta_L,$$

in which $\alpha_L = \prod_{l=1}^{L-1} s_l$ and $\beta_L=\sum_{l=1}^{L-1} (\prod_{n=1}^{l-1} s_n)(\frac{k_l-1}{2}-p_l) $
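
A minimal sketch of accumulating $\alpha_L$ and $\beta_L$ to map a 0-indexed feature-map coordinate back to the image (the function name and layer specs are my own, for illustration):

```python
def image_coord(x_L, layers):
    """Map a 0-indexed feature-map coordinate x_L back to the image.

    layers: list of (stride, kernel, padding) tuples, ordered from the layer
    closest to the image to the layer producing the feature map.
    Implements x_1 = alpha_L * x_L + beta_L with alpha_L = prod(s_l) and
    beta_L = sum over l of prod(s_n, n < l) * ((k_l - 1)/2 - p_l).
    """
    alpha, beta = 1, 0.0
    for s, k, p in layers:
        beta += alpha * ((k - 1) / 2 - p)
        alpha *= s
    return alpha * x_L + beta

# Example: conv7x7/s2/p3 -> maxpool3x3/s2/p1 -> conv3x3/s1/p1 (ResNet-like stem).
layers = [(2, 7, 3), (2, 3, 1), (1, 3, 1)]
print(image_coord(10, layers))  # 40.0, i.e. roughly (product of strides) * x_L
```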

anchor box to ROI

Given two corner points of an anchor box on the feature map, we can find their corresponding points on the original image, which determine the ROI.
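
As a usage sketch, reusing the hypothetical `image_coord` and `layers` above, the two anchor corners map back to image coordinates that define the ROI:

```python
# Map both corners of a feature-map anchor box back to the image to obtain the ROI.
x1, y1, x2, y2 = 3, 4, 10, 12                        # anchor corners on the feature map
roi = (image_coord(x1, layers), image_coord(y1, layers),
       image_coord(x2, layers), image_coord(y2, layers))
print(roi)                                           # (12.0, 16.0, 40.0, 48.0)
```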

Frequency Domain

Posted on 2022-06-16 | In paper note
  1. Distinguish generated fake images from real images in the frequency domain [2]

  2. Use frequency map as network input or output [1] [5] [6]

  3. Use intermediate frequency features [7] [9]

  4. An image can be composed of or decomposed into a low-frequency part and a high-frequency part [3] [8] [4] [10]

Reference

  1. Kai Xu, Minghai Qin, Fei Sun, Yuhao Wang, Yen-Kuang Chen, Fengbo Ren, “Learning in the Frequency Domain”, CVPR, 2020.

  2. Wang, Sheng-Yu, et al. “CNN-generated images are surprisingly easy to spot… for now.” arXiv preprint arXiv:1912.11035 (2019).

  3. Aayush Bansal, Yaser Sheikh, Deva Ramanan, “PixelNN: Example-based Image Synthesis”, ICLR 2018.

  4. Yanchao Yang, Stefano Soatto, “FDA: Fourier Domain Adaptation for Semantic Segmentation”, CVPR 2020.

  5. Roy, Hiya, et al. “Image inpainting using frequency domain priors.” arXiv preprint arXiv:2012.01832 (2020).

  6. Shen, Xing, et al. “DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation.” arXiv preprint arXiv:2011.09876 (2020).

  7. Suvorov, Roman, et al. “Resolution-robust Large Mask Inpainting with Fourier Convolutions.” WACV (2021).

  8. Yu, Yingchen, et al. “WaveFill: A Wavelet-based Generation Network for Image Inpainting.” ICCV, 2021.

  9. Mardani, Morteza, et al. “Neural ffts for universal texture image synthesis.” NeurIPS (2020).

  10. Cai, Mu, et al. “Frequency domain image translation: More photo-realistic, better identity-preserving.” ICCV, 2021.

Forecast Future based on One Still Image

Posted on 2022-06-16 | In paper note
  1. Predict visual feature of one future frame [1]

  2. Predict optical flow of one future frame [2]

  3. Predict one future frame [4] (a special case of video prediction)

  4. Predict future trajectories [5]

  5. Predict optical flows of future frames, and then obtain future frames [3]

Reference

  1. Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. “Anticipating visual representations from unlabeled video.” CVPR, 2016.

  2. Gao, Ruohan, Bo Xiong, and Kristen Grauman. “Im2flow: Motion hallucination from static images for action recognition.” CVPR, 2018.

  3. Li, Yijun, et al. “Flow-grounded spatial-temporal video prediction from still images.” ECCV, 2018.

  4. Xue, Tianfan, et al. “Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks.” NIPS, 2016.

  5. Walker, Jacob, et al. “An uncertain future: Forecasting from static images using variational autoencoders.” ECCV, 2016.

Fine-grained Dataset

Posted on 2022-06-16 | In paper note

Surveys:

  1. Deep learning for fine-grained image analysis: A survey [1]: covers fine-grained classification, fine-grained retrieval, and fine-grained generation

  2. A survey on deep learning-based fine-grained object classification and semantic segmentation [2]

[1] Xiu-Shen Wei, Jianxin Wu, Quan Cui. “Deep learning for fine-grained image analysis: A survey.” arXiv preprint arXiv:1907.03069 (2019).

[2] Zhao, Bo, et al. “A survey on deep learning-based fine-grained object classification and semantic segmentation.” International Journal of Automation and Computing 14.2 (2017): 119-135.

Datasets:

  1. clothing dataset
  2. car dataset
  3. CUB, Birdsnap
  4. scene dataset
  5. dog dataset
  6. flower dataset
  7. aircraft dataset
  8. Food-101 dataset

Few-Shot Object Detection

Posted on 2022-06-16 | In paper note
  • Feature generation for novel categories: [1] [3] [4]

  • Model ensembling: [2]

Reference

[1] Zhang, Weilin, and Yu-Xiong Wang. “Hallucination Improves Few-Shot Object Detection.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

[2] Zhang, Weilin, Yu-Xiong Wang, and David A. Forsyth. “Cooperating RPN’s Improve Few-Shot Object Detection.” arXiv preprint arXiv:2011.10142 (2020).

[3] Xu, Honghui, et al. “Few-Shot Object Detection via Sample Processing.” IEEE Access 9 (2021): 29207-29221.

[4] Wu, Aming, et al. “Universal-prototype enhancing for few-shot object detection.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.

Few-Shot Image Generation From Large Dataset to Small Dataset

Posted on 2022-06-16 | In paper note
  1. Transfer a large proportion of parameters and only update a few parameters: [1] updates scaling and shifting parameters; [2] updates the miner before the generator; [3] uses Fisher information to select the parameters to be updated. Empirically, the last layers tend to be frozen. Similarly, in [6], the last layers are frozen and scaling/shifting parameters are predicted.

  2. Transfer structure similarity from large dataset to small dataset: [4]

  3. Transfer parameter basis: [5] adapts the singular values of the pre-trained weights while freezing the corresponding singular vectors.
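
A minimal sketch of the singular-value adaptation idea in [5] (PyTorch-style; my own class and variable names, not the authors' implementation):

```python
import torch
import torch.nn as nn

class SVDAdaptedLinear(nn.Module):
    """Freeze the singular vectors of a pretrained weight; train only the singular values."""
    def __init__(self, pretrained_weight):             # shape (out_features, in_features)
        super().__init__()
        U, S, Vh = torch.linalg.svd(pretrained_weight, full_matrices=False)
        self.register_buffer("U", U)                    # frozen singular vectors
        self.register_buffer("Vh", Vh)                  # frozen singular vectors
        self.S = nn.Parameter(S.clone())                # the only trainable parameters

    def forward(self, x):
        W = self.U @ torch.diag(self.S) @ self.Vh       # reassemble the adapted weight
        return x @ W.T

# Usage: wrap a pretrained layer's weight and fine-tune only min(out, in) parameters.
layer = SVDAdaptedLinear(torch.randn(128, 64))
print(layer(torch.randn(4, 64)).shape)                  # torch.Size([4, 128])
```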

Reference

  1. Atsuhiro Noguchi, Tatsuya Harada: “Image generation from small datasets via batch statistics adaptation.” ICCV (2019)

  2. Yaxing Wang, Abel Gonzalez-Garcia, David Berga, Luis Herranz, Fahad Shahbaz Khan, Joost van de Weijer: “MineGAN: effective knowledge transfer from GANs to target domains with few images.” CVPR (2020).

  3. Yijun Li, Richard Zhang, Jingwan Lu, Eli Shechtman: “Few-shot Image Generation with Elastic Weight Consolidation.” NeurIPS (2020).

  4. Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, Richard Zhang: “Few-shot Image Generation via Cross-domain Correspondence.” CVPR (2021).

  5. Esther Robb, Wen-Sheng Chu, Abhishek Kumar, Jia-Bin Huang: “Few-Shot Adaptation of Generative Adversarial Networks.” arXiv (2020).

  6. Miaoyun Zhao, Yulai Cong, Lawrence Carin: “On Leveraging Pretrained GANs for Generation with Limited Data.” ICML (2020).

Few-Shot Image Generation From Base Categories to Novel Categories

Posted on 2022-06-16 | In paper note
  1. Fusion-based method: Generative Matching Network (GMN) [1] (VAE with matching network for generator and recognizer). MatchingGAN [3] learns reasonable interpolation coefficients. F2GAN [5] first fuses high-level features and then fills in low-level details.

  2. Optimization-based method: FIGR [2] is based on Reptile. DAWSON [4] is based on MAML.

  3. Transformation-based method: DAGAN [6] samples random vectors to generate new images. DeltaGAN [7] learns sample-specific delta.

Reference

[1] Bartunov, Sergey, and Dmitry Vetrov. “Few-shot generative modelling with generative matching networks.” AISTATS, 2018.

[2] Clouâtre, Louis, and Marc Demers. “FIGR: Few-shot Image Generation with Reptile.” arXiv preprint arXiv:1901.02199 (2019).

[3] Yan Hong, Li Niu, Jianfu Zhang, Liqing Zhang, “MatchingGAN: Matching-based Few-shot Image Generation”, ICME, 2020

[4] Weixin Liang, Zixuan Liu, Can Liu: “DAWSON: A Domain Adaptive Few Shot Generation Framework.” CoRR abs/2001.00576 (2020)

[5] Yan Hong, Li Niu, Jianfu Zhang, Weijie Zhao, Chen Fu, Liqing Zhang: “F2GAN: Fusing-and-Filling GAN for Few-shot Image Generation.” ACM MM (2020)

[6] Antreas Antoniou, Amos J. Storkey, Harrison Edwards: “Data Augmentation Generative Adversarial Networks.” arXiv (2018)

[7] Yan Hong, Li Niu, Jianfu Zhang, Jing Liang, Liqing Zhang: “DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta.” CoRR abs/2009.08753 (2020)

Few-Shot Feature Generation

Posted on 2022-06-16 | In paper note


  1. Meta-learning method: [1]

  2. Delta-based: delta between each pair of samples [2]; delta between each sample and class center [3] [4]

Reference

[1] Zhang, Ruixiang, et al. “Metagan: An adversarial approach to few-shot learning.” NIPS, 2018.

[2] Schwartz, Eli, et al. “Delta-encoder: an effective sample synthesis method for few-shot object recognition.” Advances in Neural Information Processing Systems. 2018.

[3] Liu, Jialun, et al. “Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective.” arXiv preprint arXiv:2002.10826 (2020).

[4] Yin, Xi, et al. “Feature transfer learning for face recognition with under-represented data.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.

Few-Shot Classification

Posted on 2022-06-16 | In paper note

One-shot/few-shot learning

The first one-shot learning paper dates back to 2006, but the topic has become much more popular in recent years.

Concepts

training/validation/test categories: Training categories and test categories have no overlap

support(sample)/query(batch) set: In the testing stage, for each test category, we preserve some instances to form the support set and sample from the remaining instances to form the query set

C-way K-shot: The test set has C categories. For each test category, we preserve K instances as the support set

episode: Episode-based strategy used in the training stage to mimic the inference procedure in the testing stage. First sample some categories and then sample the support/query set for each category
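
A minimal sketch of sampling one C-way K-shot episode (the dataset format and names are my own assumptions):

```python
import random

def sample_episode(data_by_class, C=5, K=1, Q=15):
    """Sample one C-way K-shot episode.

    data_by_class: dict mapping class label -> list of examples.
    Returns (support, query) as lists of (example, episode_label) pairs.
    """
    classes = random.sample(list(data_by_class), C)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(data_by_class[cls], K + Q)
        support += [(x, label) for x in examples[:K]]
        query += [(x, label) for x in examples[K:]]
    return support, query

# Toy usage: 20 fake classes with 30 "examples" each.
data = {c: [f"img_{c}_{i}" for i in range(30)] for c in range(20)}
support, query = sample_episode(data, C=5, K=1, Q=15)
print(len(support), len(query))  # 5 75
```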

Methods

  • Metric based:

    • Siamese network: the earliest and simplest metric-learning-based few-shot method; it treats few-shot classification as a standard verification problem.

    • Matching network: maps a support set S to a classification function p(y|x,S) (KNN or LSTM). For the LSTM version, there is another similar work using a memory module.

    • Relation network: calculate the relation score for 1-shot, calculate the average of relation scores for k-shot

    • Prototypical network: compare the query with the prototype representation of each class. Each class can have more than one prototype representation. There are some other prototype-based methods [1] [2] (see the sketch after this list).

  • Optimization (gradient) based:

    • [MAML](https://arxiv.org/pdf/1703.03400.pdf)

    • Reptile (an approximation of MAML)

    • Meta-Learner LSTM

  • Model based:

    • Learnet [2] [3] [4] [5]: predict the parameters of classifiers for novel categories.

    • [1]: predict the parameters of CNN feature extractor by virtue of memory module.

  • Generation based: generate more features for novel categories [1], [2]

  • Pretrain and fine-tune: use the whole meta-training set to learn the feature extractor [1] [2]; pretrain + MatchingNet [3]
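
As referenced in the prototypical network bullet above, a minimal sketch of nearest-prototype classification (NumPy; one mean prototype per class; toy data and names are my own):

```python
import numpy as np

def prototype_classify(support_feats, support_labels, query_feats):
    """Classify queries by smallest squared Euclidean distance to class prototypes.

    support_feats: (N, D) embedded support samples; support_labels: (N,) class ids;
    query_feats: (M, D) embedded queries. Returns (M,) predicted class ids.
    """
    classes = np.unique(support_labels)
    # One prototype per class: the mean of its support embeddings.
    prototypes = np.stack([support_feats[support_labels == c].mean(axis=0) for c in classes])
    dists = ((query_feats[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return classes[dists.argmin(axis=1)]

# Toy 3-way 5-shot usage: three well-separated clusters in 2-D feature space.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
labels = np.repeat(np.arange(3), 5)
support = centers[labels] + 0.1 * np.random.default_rng(0).normal(size=(15, 2))
queries = centers + 0.1                                  # one query near each center
print(prototype_classify(support, labels, queries))      # [0 1 2]
```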

Survey

  1. Generalizing from a Few Examples: A Survey on Few-Shot Learning

  2. Learning from Few Samples: A Survey

Datasets

  1. Meta-Dataset

Face Verification and Recognition

Posted on 2022-06-16 | In paper note

Framework:

The similarity between two faces $I_a$ and $I_b$ can be unified in the following formulation:

$$M[W(F(S(I_a))),\; W(F(S(I_b)))],$$

in which $S$ is the synthesis operation (e.g., face alignment, frontalization), $F$ is robust feature extraction, $W$ is transformation subspace learning, and $M$ is the face matching algorithm (e.g., NN, SVM, metric learning).
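
A minimal sketch of this formulation as function composition (all components are placeholders, not a specific system):

```python
import numpy as np

def face_similarity(img_a, img_b, S, F, W, M):
    """Generic verification pipeline: M[W(F(S(Ia))), W(F(S(Ib)))].

    S: synthesis (alignment/frontalization), F: feature extraction,
    W: subspace transformation, M: matching score (higher = more similar).
    """
    return M(W(F(S(img_a))), W(F(S(img_b))))

# Placeholder usage: identity S/F/W and cosine similarity as M.
cosine = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
score = face_similarity(np.ones(8), np.ones(8),
                        S=lambda x: x, F=lambda x: x, W=lambda x: x, M=cosine)
print(score)  # 1.0 for identical inputs
```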

Paper:

  • DeepID 1,2,3: Deep learning face representation from predicting 10,000 classes

  • FaceNet: A Unified Embedding for Face Recognition and Clustering
    code: https://cmusatyalab.github.io/openface/ (triplet loss)

  • DeepFace: Closing the Gap to Human-Level Performance in Face Verification (3D face alignment)

  • A Discriminative Feature Learning Approach for Deep Face Recognition
    code: https://github.com/ydwen/caffe-face

  • Unconstrained Face Verification using Deep CNN Features (Joint Bayesian Metric Learning)
    code: https://github.com/happynear/FaceVerification

  • A Light CNN for Deep Face Representation with Noisy Label
    code: https://github.com/AlfredXiangWu/face_verification_experiment

Survey:

  • Face Recognition: From Traditional to Deep Learning Methods

Dataset:

LFW: http://vis-www.cs.umass.edu/lfw/

IJB-A: (free upon request) https://www.nist.gov/itl/iad/image-group/ijba-dataset-request-form

FERET: (free upon request) https://www.nist.gov/itl/iad/image-group/color-feret-database

CMU Multi-Pie: (not free) http://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html

CASIA WebFace Database: (free upon request) http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html

MS-Celeb-1M: https://www.microsoft.com/en-us/research/project/ms-celeb-1m-challenge-recognizing-one-million-celebrities-real-world/

MegaFace: (free upon request) http://megaface.cs.washington.edu/dataset/download_training.html

Cross-Age Celebrity Dataset: http://bcsiriuschen.github.io/CARC/

VGG face: http://www.robots.ox.ac.uk/~vgg/data/vgg_face/
