Application

  • Shadow detection: [1]

  • Object-shadow pair detection/matting: [2] [10] [11]

  • Shadow removal: [3] [4] [5] [6]

  • Shadow generation: [7] [8]

  • Remove occluder and its associated shadow [9]

Dataset

Shadow Generation

  1. Shadow-AR (rendered) paper
  2. RGB-AO-depth (rendered) paper
  3. Composition datasets: WILDTRACK, Penn-Fudan, UA-DETRAC, Cityscapes, ShapeNet paper
  4. Soft shadow dataset (rendered) paper
  5. ShadowGAN (rendered, 12,400 rendered images, 9,265 objects, 110 textures for rendering the plane, up to four objects in each scene) paper
  6. SID (single object, 25,000 images, 12,500 3D objects, 50 homogeneous colors and a variable set of 200 textured patterns) paper
  7. SID2 (45,000 images, similar to SID, more than one object in each scene) paper
  8. SHAD3S paper
  9. DESOBA paper

Shadow Removal/Detection

  1. ISTD/ISTD+ (1,870 triplets of shadow, shadow mask, and shadow-free images) paper
  2. USR (unpaired, 2,445 shadow images, 1,770 shadow-free images) paper
  3. SRD/SRD+ (3,088 pairs, paired shadow and shadow-free images, without the ground-truth shadow mask) paper
  4. LRSS (37 image pairs, soft shadow) paper
  5. UIUC (76 pairs, paired shadow/shadow-free) paper
  6. GTAV (5,723 pairs, 5,110 daylight scenes, occluders inside the camera view) paper
  7. SynShadow (based on USR, occluders outside the camera view, shadow/shadow-free/matte image triplets synthesized from 10,000 rendered matte images and about 1,800 background images) paper
  8. UCF (245 pairs, shadow/shadow mask, only for detection)
  9. SBU (4727 pairs, shadow/shadow mask, only for detection)
  10. CUHK-Shadow (10,500 pairs, shadow/shadow mask, only for detection) paper
  11. SOBA (1013 images) paper
  12. AISD (514 pairs, shadow/shadow mask, only for detection, aerial images) paper
  13. video shadow removal dataset (8 videos, shadow/shadow mask/shadow free) paper
  14. CMU dataset (135 pairs, shadow/shadow boundaries) paper
  15. ViSha (120 videos with 11685 frames) paper
  16. VISAD (82 videos, half-annotated) paper

References

  1. Zhu, Lei, et al. “Bidirectional feature pyramid network with recurrent attention residual modules for shadow detection.” Proceedings of the European Conference on Computer Vision (ECCV). 2018.

  2. Wang, Tianyu, et al. “Instance shadow detection.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

  3. Hu, Xiaowei, et al. “Mask-ShadowGAN: Learning to remove shadows from unpaired data.” Proceedings of the IEEE International Conference on Computer Vision. 2019.

  4. Le, Hieu, and Dimitris Samaras. “Shadow removal via shadow image decomposition.” Proceedings of the IEEE International Conference on Computer Vision. 2019.

  5. Cun, Xiaodong, Chi-Man Pun, and Cheng Shi. “Towards Ghost-free Shadow Removal via Dual Hierarchical Aggregation Network and Shadow Matting GAN.” arXiv preprint arXiv:1911.08718 (2019).

  6. Le, Hieu, and Dimitris Samaras. “From Shadow Segmentation to Shadow Removal.” European Conference on Computer Vision. Springer, Cham, 2020.

  7. Liu, Daquan, et al. “ARShadowGAN: Shadow Generative Adversarial Network for Augmented Reality in Single Light Scenes.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

  8. Zhan, Fangneng, et al. “Adversarial Image Composition with Auxiliary Illumination.” Proceedings of the Asian Conference on Computer Vision. 2020.

  9. Zhang, Edward, et al. “No Shadow Left Behind: Removing Objects and their Shadows using Approximate Lighting and Geometry.” CVPR, 2021.

  10. Wang, Tianyu, et al. “Single-stage instance shadow detection with bidirectional relation learning.” CVPR, 2021.

  11. Lu, Erika, et al. “Omnimatte: Associating objects and their effects in video.” CVPR, 2021.

Statistical methods

  • use a model (e.g., Gaussian) to fit the distribution of all data
  • use two models to fit the distributions of non-outliers and outliers separately
  • Grubbs’ test
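
The first bullet (fit a single Gaussian to all data) can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `gaussian_outliers`, the threshold of 3 standard deviations, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def gaussian_outliers(x, z_thresh=3.0):
    """Fit one Gaussian to all points and flag those far from the mean.

    z_thresh is an illustrative choice; it is not prescribed by the notes.
    """
    x = np.asarray(x, dtype=float)
    mu = x.mean()              # fitted mean
    sigma = x.std(ddof=1)      # fitted (sample) standard deviation
    z = np.abs(x - mu) / sigma # distance from the mean in units of sigma
    return z > z_thresh

# Synthetic data: 200 draws from N(0, 1) plus one obvious outlier at 15.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, size=200), [15.0]])
mask = gaussian_outliers(data)
```

Note that the outlier itself inflates the fitted standard deviation, which is one motivation for the two-model variant in the second bullet.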

Distance based methods

  • the density within a neighborhood
  • the distance from a nearest neighbor
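
Both distance-based scores above can be computed from a pairwise distance matrix. The sketch below is an assumption-laden toy: the function names, the radius, and the choice of the k-th nearest neighbor (rather than the first) are illustrative only.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows; self-distance set to inf."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return dist

def neighborhood_density(X, radius):
    """Number of other points within `radius`; low density suggests an outlier."""
    return (pairwise_distances(X) <= radius).sum(axis=1)

def knn_distance(X, k=1):
    """Distance to the k-th nearest neighbor; large distance suggests an outlier."""
    return np.sort(pairwise_distances(X), axis=1)[:, k - 1]

points = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]], dtype=float)
density = neighborhood_density(points, radius=1.5)
scores = knn_distance(points, k=2)
```

In this toy set the isolated point `[10, 10]` has zero neighbors within the radius and the largest k-nearest-neighbor distance.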

Learning based methods

  • clustering, the smallest cluster is likely to contain outliers
  • one-class classifier (e.g., one-class SVM)
  • binary classifier (e.g., naive bayes for spam filtering, weighted binary SVM)

  1. Estimate optical flow based on video: FlowNet [1], FlowNet2 [2]

  2. Estimate optical flow based on image: [3] [4] [5]

[1] Dosovitskiy, Alexey, et al. “Flownet: Learning optical flow with convolutional networks.” ICCV, 2015.

[2] Ilg, Eddy, et al. “Flownet 2.0: Evolution of optical flow estimation with deep networks.” CVPR, 2017.

[3] Gao, Ruohan, Bo Xiong, and Kristen Grauman. “Im2flow: Motion hallucination from static images for action recognition.” CVPR, 2018.

[4] Silvia L. Pintea, Jan C. van Gemert, and Arnold W. M. Smeulders, “Deja Vu: Motion Prediction in Static Images”, arxiv, 2018.

[5] Walker, Jacob, Abhinav Gupta, and Martial Hebert. “Dense optical flow prediction from a static image.” ICCV, 2015.

  1. Deep VIB [1]: uses a KL divergence as an upper bound on mutual information (MI), so minimizing the bound minimizes MI. r(z) can be set to a unit Gaussian for simplicity.
  2. MINE [2]: a lower bound on MI based on the KL divergence. Because the estimator is strongly consistent, MINE can be used as a tight estimate of MI.
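
The two bounds can be written out as follows. The notation is an assumption chosen for this sketch: $p(z \mid x)$ denotes the encoder, $p(z)$ its marginal, $r(z)$ the fixed reference distribution mentioned above, and $T_\theta$ a learned scalar-valued network (the Donsker-Varadhan form that MINE builds on).

```latex
\begin{align}
I(X;Z) &= \mathbb{E}_{p(x)}\big[\mathrm{KL}\big(p(z \mid x)\,\|\,p(z)\big)\big] \\
       &= \mathbb{E}_{p(x)}\big[\mathrm{KL}\big(p(z \mid x)\,\|\,r(z)\big)\big]
          - \mathrm{KL}\big(p(z)\,\|\,r(z)\big) \\
       &\le \mathbb{E}_{p(x)}\big[\mathrm{KL}\big(p(z \mid x)\,\|\,r(z)\big)\big], \\[4pt]
I(X;Z) &\ge \sup_{\theta}\; \mathbb{E}_{p(x,z)}\big[T_\theta(x,z)\big]
          - \log \mathbb{E}_{p(x)\,p(z)}\big[e^{T_\theta(x,z)}\big].
\end{align}
```

The upper bound is tight when $r(z)$ matches the true marginal $p(z)$, since the dropped term $\mathrm{KL}(p(z)\,\|\,r(z))$ is non-negative and vanishes only in that case.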

References

  1. Alemi, Alexander A., et al. “Deep variational information bottleneck.” arXiv preprint arXiv:1612.00410 (2016).

  2. Belghazi, M. I., Baratin, A., Rajeshwar, S., Ozair, S., Bengio, Y., Courville, A., & Hjelm, D. Mutual information neural estimation, ICML, 2018.

Normalize weights:

  1. weight normalization [1]: $\mathbf{w}=\frac{g}{\|\mathbf{v}\|} \mathbf{v}$. Weight normalization can be viewed as a cheaper and less noisy approximation to batch normalization.
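
As a small numerical check of the reparameterization, the sketch below computes $\mathbf{w}$ from a direction vector $\mathbf{v}$ and a scalar gain $g$. The function name `weight_norm` is hypothetical, and a real layer would treat both `v` and `g` as trainable parameters.

```python
import numpy as np

def weight_norm(v, g):
    """Return w = g * v / ||v||, so that ||w|| equals g regardless of ||v||."""
    v = np.asarray(v, dtype=float)
    return g * v / np.linalg.norm(v)

v = np.array([3.0, 4.0])   # ||v|| = 5
w = weight_norm(v, g=2.0)  # direction of v, length 2
```

The point of the decoupling is that the length of the weight vector is controlled by `g` alone, while `v` only determines its direction.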

Normalize outputs:

  1. batch normalization [2]: make the input and output have the same variance

  2. layer normalization [3]

  3. instance normalization [4]

  4. group normalization [5]

Notation: N is the batch axis, C is the channel axis, and (H, W) are the spatial axes.
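
With this axis convention, the four output normalizations differ only in which axes the mean and variance are computed over. The sketch below assumes an input of shape (N, C, H, W) and omits the learnable scale and shift as well as the running statistics used at test time; the helper names are illustrative.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Subtract the mean and divide by the standard deviation over `axes`."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x):
    return normalize(x, (0, 2, 3))   # per channel, over N, H, W

def layer_norm(x):
    return normalize(x, (1, 2, 3))   # per sample, over C, H, W

def instance_norm(x):
    return normalize(x, (2, 3))      # per sample and channel, over H, W

def group_norm(x, num_groups):
    n, c, h, w = x.shape
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)
    grouped = normalize(grouped, (2, 3, 4))  # per sample and group
    return grouped.reshape(n, c, h, w)

rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=(4, 6, 5, 5))
```

Group normalization interpolates between the other two per-sample variants: one group gives layer normalization, and one group per channel gives instance normalization.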

[1] Salimans T, Kingma D P. Weight normalization: A simple reparameterization to accelerate training of deep neural networks[C]//Advances in Neural Information Processing Systems. 2016: 901-909.

[2] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.

[3] Ba J L, Kiros J R, Hinton G E. Layer normalization[J]. arXiv preprint arXiv:1607.06450, 2016.

[4] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv:1607.08022, 2016.

[5] Wu Y, He K. Group normalization[J]. arXiv preprint arXiv:1803.08494, 2018.

Taxonomy

1) metric-based: learn a good metric

  • matching network [1]
  • relation network [2]
  • prototypical network [3] [4]
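
To make the metric-based idea concrete, here is a minimal sketch of the classification rule used by prototypical networks: each class is represented by the mean of its support embeddings, and a query takes the label of the nearest prototype. The embeddings below are hand-written stand-ins; in the actual method they come from a learned embedding network, which this sketch does not include.

```python
import numpy as np

def prototypes(support_x, support_y, num_classes):
    """Mean embedding of the support examples of each class."""
    return np.stack([support_x[support_y == c].mean(axis=0)
                     for c in range(num_classes)])

def classify(query_x, protos):
    """Assign each query to the class with the nearest prototype (squared Euclidean)."""
    d = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# Toy 2-way 2-shot episode with 2-D "embeddings" (assumed, not learned).
support_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
support_y = np.array([0, 0, 1, 1])
query_x = np.array([[0.2, 0.4], [5.5, 4.8]])

protos = prototypes(support_x, support_y, num_classes=2)
pred = classify(query_x, protos)
```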

2) optimization-based: gradient

  • Meta-Learner LSTM [5]
  • MAML [6] [7] [8]
  • REPTILE (an approximation of MAML) [9]

    Optimization-based methods aim to obtain a good parameter initialization. If we simply train on multiple tasks jointly, the resulting model parameters may be suboptimal for each individual task.
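
    The idea of learning an initialization can be sketched with the Reptile update, which moves the shared parameters toward the parameters obtained after a few gradient steps on one task. Everything task-specific here is an assumption for illustration: each toy task has the quadratic loss 0.5 * ||theta - center||^2, tasks are visited in a fixed cycle, and the step sizes are arbitrary.

```python
import numpy as np

def inner_sgd(theta, center, lr=0.1, steps=5):
    """Adapt to one toy task by gradient descent on 0.5 * ||theta - center||^2."""
    for _ in range(steps):
        grad = theta - center          # gradient of the toy task loss
        theta = theta - lr * grad
    return theta

def reptile(task_centers, meta_lr=0.5, meta_steps=200):
    """Reptile outer loop: phi <- phi + meta_lr * (adapted - phi)."""
    phi = np.zeros(task_centers.shape[1])
    for step in range(meta_steps):
        center = task_centers[step % len(task_centers)]  # cycle through tasks
        adapted = inner_sgd(phi.copy(), center)
        phi = phi + meta_lr * (adapted - phi)
    return phi

# Three toy tasks whose optima sit at different points in the plane.
task_centers = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0]])
phi = reptile(task_centers)
```

    In this toy setting the learned initialization stays inside the region spanned by the task optima, so each task can be reached from it in a few inner steps.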

3) model-based: predict model parameters

Reference:

  1. Vinyals, Oriol, et al. “Matching networks for one shot learning.” NIPS, 2016.
  2. Sung, Flood, et al. “Learning to compare: Relation network for few-shot learning.” CVPR, 2018.
  3. Snell, Jake, Kevin Swersky, and Richard Zemel. “Prototypical networks for few-shot learning.” NIPS, 2017.
  4. Ren, Mengye, et al. “Meta-learning for semi-supervised few-shot classification.” arXiv preprint arXiv:1803.00676 (2018).
  5. Sachin Ravi and Hugo Larochelle. “Optimization as a Model for Few-Shot Learning.” ICLR, 2017.
  6. Chelsea Finn, Pieter Abbeel, and Sergey Levine. “Model-agnostic meta-learning for fast adaptation of deep networks.” ICML, 2017.
  7. Finn, Chelsea, and Sergey Levine. “Meta-learning and universality: Deep representations and gradient descent can approximate any learning algorithm.” arXiv preprint arXiv:1710.11622 (2017).
  8. Grant, Erin, et al. “Recasting gradient-based meta-learning as hierarchical bayes.” arXiv preprint arXiv:1801.08930 (2018).
  9. A. Nichol, J. Achiam, and J. Schulman. On first-order meta-learning algorithms. arXiv, 1803.02999v2, 2018.
  10. Adam Santoro, et al. “Meta-learning with memory-augmented neural networks.” ICML. 2016.
  11. Munkhdalai, Tsendsuren, and Hong Yu. “Meta networks.” ICML, 2017.

Tutorials:

  1. binary map

  2. frequency: DCT [1]

  3. PolarMask [2]

  4. Hyperbolic [3]
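
As an illustration of the frequency representation in item 2, a binary mask can be encoded by its low-frequency 2-D DCT coefficients and decoded by the inverse transform followed by thresholding. This sketch is a simplification under stated assumptions: it handles square masks only, keeps a top-left `keep` x `keep` block of coefficients rather than whatever coefficient ordering the DCT-Mask paper uses, and builds the DCT matrix by hand to stay NumPy-only.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are frequencies)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)
    return D

def encode_mask(mask, keep):
    """2-D DCT of a square binary mask, truncated to the lowest frequencies."""
    D = dct_matrix(mask.shape[0])
    coeffs = D @ mask.astype(float) @ D.T
    return coeffs[:keep, :keep]

def decode_mask(coeffs, size):
    """Zero-pad the coefficients, invert the DCT, and threshold at 0.5."""
    D = dct_matrix(size)
    full = np.zeros((size, size))
    k = coeffs.shape[0]
    full[:k, :k] = coeffs
    return (D.T @ full @ D) > 0.5

mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 3:10] = True

lossless = decode_mask(encode_mask(mask, keep=16), size=16)  # all coefficients kept
compact = encode_mask(mask, keep=6)                          # 36 numbers instead of 256
```

Keeping every coefficient reproduces the mask exactly; keeping fewer trades boundary detail for a shorter code.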

Reference

[1] Shen, Xing, et al. “Dct-mask: Discrete cosine transform mask representation for instance segmentation.” CVPR, 2021.

[2] Xie, Enze, et al. “Polarmask: Single shot instance segmentation with polar representation.” CVPR, 2020.

[3] GhadimiAtigh, Mina, et al. “Hyperbolic Image Segmentation.” arXiv preprint arXiv:2203.05898 (2022).

Regularizing latent variables or latent features can improve the generalizability of a classifier and lower its error bound.

Regularizing latent variables essentially amounts to decreasing their entropy. Some common tricks for decreasing the entropy of latent variables are:

  1. dropout
  2. weight decay
  3. add random noise to the latent variables in VAE and GAN.
  4. add random perturbation to model parameters

For theoretical proof, please refer to here.

  1. label smoothing: [1] interpolating ground-truth label and uniform label

  2. bootstrapping: [2] interpolate noisy label and label from previous iteration

  3. noisy data+clean data: [3] interpolate noisy label and distilled label
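
The first two items are both convex combinations of label distributions, which the sketch below makes explicit. The mixing weights `eps` and `beta`, the function names, and the example probabilities are assumptions for illustration; `prev_probs` stands in for the label or prediction carried over from the previous iteration.

```python
import numpy as np

def one_hot(labels, num_classes):
    return np.eye(num_classes)[labels]

def label_smoothing(labels, num_classes, eps=0.1):
    """Interpolate the ground-truth one-hot label with the uniform distribution."""
    return (1.0 - eps) * one_hot(labels, num_classes) + eps / num_classes

def soft_bootstrap(noisy_labels, prev_probs, beta=0.8):
    """Interpolate the noisy one-hot label with the previous iteration's label."""
    num_classes = prev_probs.shape[1]
    return beta * one_hot(noisy_labels, num_classes) + (1.0 - beta) * prev_probs

labels = np.array([0, 2])
smoothed = label_smoothing(labels, num_classes=4, eps=0.1)

prev_probs = np.array([[0.70, 0.10, 0.10, 0.10],
                       [0.25, 0.25, 0.25, 0.25]])
boot = soft_bootstrap(labels, prev_probs, beta=0.8)
```

Each target row remains a valid probability distribution, so it can be plugged directly into a cross-entropy loss.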

[1] Szegedy, Christian, et al. “Rethinking the inception architecture for computer vision.” CVPR, 2016.

[2] Reed, Scott, et al. “Training deep neural networks on noisy labels with bootstrapping.” arXiv preprint arXiv:1412.6596 (2014).

[3] Li, Yuncheng, et al. “Learning from noisy labels with distillation.” ICCV, 2017.
