Zoom in

(1) Zoom in on a bounding box [1] [2]

(2) Zoom in on a salient region [3] [4]

  • relation to (1): if the salient region is a rectangle and its salience value tends to infinity (relative to the rest of the image), salience-based sampling should be equivalent to zooming in on a bounding box.
  • relation to pooling: salience-based sampling acts like weighted pooling, with the salience map as the weight map.
  • relation to deformable CNN: the salience map is used to calculate a sampling offset for each position.
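
The three relations above can be checked on a toy 1-D signal. The sketch below is only illustrative: it assumes sampling through the inverse CDF of the salience map, which is a simplification and not necessarily the exact sampler used in [3] or [4]. The function names (`bbox_positions`, `salience_positions`, `weighted_pool`) and the stand-in value 1e9 for "infinity" are hypothetical choices made for this example.

```python
import numpy as np

def bbox_positions(start, length, n_out):
    """Sample centers for zooming in on the 1-D box [start, start + length)."""
    return start + (np.arange(n_out) + 0.5) * length / n_out

def salience_positions(salience, n_out):
    """Sample positions via the inverse CDF of a strictly positive 1-D salience map."""
    s = np.asarray(salience, dtype=float)
    c = np.cumsum(s)
    # CDF evaluated at pixel edges 0..N, so cdf[0] = 0 and cdf[N] = 1.
    cdf = np.concatenate([[0.0], c]) / c[-1]
    edges = np.arange(len(s) + 1, dtype=float)
    quantiles = (np.arange(n_out) + 0.5) / n_out
    # Piecewise-linear inverse CDF: regions with more salience receive more samples.
    return np.interp(quantiles, cdf, edges)

def weighted_pool(features, salience):
    """Average of the features, weighted by the salience map."""
    s = np.asarray(salience, dtype=float)
    f = np.asarray(features, dtype=float)
    return float(np.sum(f * s) / s.sum())

n, n_out = 10, 8
salience = np.ones(n)
salience[3:7] = 1e9  # rectangular salient region with a very large (assumed) value

# Relation to (1): the samples coincide with a bounding-box zoom on [3, 7).
positions = salience_positions(salience, n_out)
box = bbox_positions(3, 4, n_out)

# Relation to deformable CNN: offsets of the samples from a regular grid.
uniform = bbox_positions(0, n, n_out)
offsets = positions - uniform

# Relation to pooling: the weighted average is dominated by the salient region.
pooled = weighted_pool(np.arange(n), salience)
```

With a uniform salience map the same function returns a regular grid, so no zoom takes place; the zoom appears only once the salience mass concentrates in one region.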

References

[1] Fu, Jianlong, Heliang Zheng, and Tao Mei. “Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition.” CVPR, 2017.

[2] Zheng, Heliang, et al. “Learning multi-attention convolutional neural network for fine-grained image recognition.” ICCV, 2017.

[3] Recasens, Adria, et al. “Learning to zoom: a saliency-based sampling layer for neural networks.” ECCV, 2018.

[4] Zheng, Heliang, et al. “Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-grained Image Recognition.” CVPR, 2019. arXiv:1903.06150.