Newly Blog



Optical Flow

Posted on 2022-06-16 | In paper note
  1. Estimate optical flow from video (pairs of frames): FlowNet [1], FlowNet2 [2]

  2. Predict optical flow from a single static image: [3] [4] [5]

[1] Dosovitskiy, Alexey, et al. “Flownet: Learning optical flow with convolutional networks.” ICCV, 2015.

[2] Ilg, Eddy, et al. “Flownet 2.0: Evolution of optical flow estimation with deep networks.” CVPR, 2017.

[3] Gao, Ruohan, Bo Xiong, and Kristen Grauman. “Im2flow: Motion hallucination from static images for action recognition.” CVPR, 2018.

[4] Pintea, Silvia L., Jan C. van Gemert, and Arnold W. M. Smeulders. “Deja Vu: Motion Prediction in Static Images.” arXiv, 2018.

[5] Walker, Jacob, Abhinav Gupta, and Martial Hebert. “Dense optical flow prediction from a static image.” ICCV, 2015.

Normalization

Posted on 2022-06-16 | In paper note

Normalize weights:

  1. weight normalization [1]: $\mathbf{w}=\frac{g}{\|\mathbf{v}\|} \mathbf{v}$, which decouples the direction $\mathbf{v}$ from the magnitude $g$; weight normalization can be viewed as a cheaper and less noisy approximation to batch normalization

Normalize outputs:

  1. batch normalization [2]: normalize each channel to zero mean and unit variance over the mini-batch, followed by a learned scale and shift

  2. layer normalization [3]

  3. instance normalization [4]

  4. group normalization [5]

With N as the batch axis, C as the channel axis, and (H, W) as the spatial axes, these methods differ in which axes they normalize over: batch norm over (N, H, W), layer norm over (C, H, W), instance norm over (H, W), and group norm over (H, W) within each group of channels, as sketched below.
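
A minimal NumPy sketch of the axes each method reduces over (the learnable scale and shift parameters are omitted, and the group count G = 2 is an assumed value):

    import numpy as np

    x = np.random.randn(8, 4, 16, 16)   # a feature map of shape (N, C, H, W)
    eps = 1e-5

    def normalize(x, axes):
        # zero mean / unit variance over the given axes
        mu = x.mean(axis=axes, keepdims=True)
        var = x.var(axis=axes, keepdims=True)
        return (x - mu) / np.sqrt(var + eps)

    bn = normalize(x, (0, 2, 3))    # batch norm: per channel, over (N, H, W)
    ln = normalize(x, (1, 2, 3))    # layer norm: per sample, over (C, H, W)
    inorm = normalize(x, (2, 3))    # instance norm: per sample and channel, over (H, W)

    G = 2                           # group norm: split C into G groups of channels
    gn = normalize(x.reshape(8, G, 4 // G, 16, 16), (2, 3, 4)).reshape(x.shape)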

[1] Salimans T, Kingma D P. Weight normalization: A simple reparameterization to accelerate training of deep neural networks[C]//Advances in Neural Information Processing Systems. 2016: 901-909.

[2] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.

[3] Ba J L, Kiros J R, Hinton G E. Layer normalization[J]. arXiv preprint arXiv:1607.06450, 2016.

[4] Ulyanov D, Vedaldi A, Lempitsky V. Instance normalization: The missing ingredient for fast stylization[J]. arXiv preprint arXiv:1607.08022, 2016.

[5] Wu Y, He K. Group normalization[J]. arXiv preprint arXiv:1803.08494, 2018.

Mutual Information

Posted on 2022-06-16 | In paper note
  1. Deep variational information bottleneck [1]: uses a variational KL term as an upper bound of mutual information (MI), which can then be used to minimize MI; $r(z)$ can be set to a unit Gaussian for simplicity.
  2. MINE [2]: a lower bound of MI derived from the Donsker-Varadhan representation of the KL divergence. Due to its strong consistency, MINE can serve as a tight estimator of MI.
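
A minimal PyTorch sketch of the Donsker-Varadhan lower bound that MINE maximizes; the statistics network, toy data, and optimizer settings are illustrative assumptions, and the bias-corrected gradient estimator from the paper is omitted:

    import math
    import torch
    import torch.nn as nn

    # statistics network T(x, z); this architecture is an arbitrary choice
    T = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(T.parameters(), lr=1e-3)

    for step in range(2000):
        # correlated toy data (z = x + noise), so the true MI is positive
        x = torch.randn(256, 1)
        z = x + 0.5 * torch.randn(256, 1)
        z_marg = z[torch.randperm(z.size(0))]   # break the pairing -> samples of p(x)p(z)

        t_joint = T(torch.cat([x, z], dim=1)).squeeze(1)
        t_marg = T(torch.cat([x, z_marg], dim=1)).squeeze(1)

        # Donsker-Varadhan bound: I(X;Z) >= E_{p(x,z)}[T] - log E_{p(x)p(z)}[exp(T)]
        mi_lb = t_joint.mean() - (torch.logsumexp(t_marg, dim=0) - math.log(t_marg.numel()))

        loss = -mi_lb                           # maximize the lower bound
        opt.zero_grad()
        loss.backward()
        opt.step()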

References

  1. Alemi, Alexander A., et al. “Deep variational information bottleneck.” arXiv preprint arXiv:1612.00410 (2016).

  2. Belghazi, M. I., et al. “Mutual information neural estimation.” ICML, 2018.

Meta Learning

Posted on 2022-06-16 | In paper note

Taxonomy

1) metric-based: learn a good metric

  • matching network [1]
  • relation network [2]
  • prototypical network [3] [4]
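
A minimal PyTorch sketch of the prototypical-network idea [3] (the embedding network and the 5-way / 2-shot toy episode are placeholder assumptions): class prototypes are the mean embeddings of the support examples, and queries are classified by distance to the prototypes.

    import torch
    import torch.nn as nn

    embed = nn.Linear(64, 16)                  # placeholder embedding network

    def prototypical_logits(support_x, support_y, query_x, n_classes):
        s = embed(support_x)
        # prototype = mean embedding of each class's support examples
        prototypes = torch.stack([s[support_y == c].mean(0) for c in range(n_classes)])
        # classify queries by negative squared Euclidean distance to the prototypes
        return -torch.cdist(embed(query_x), prototypes) ** 2

    support_x = torch.randn(10, 64)            # toy 5-way, 2-shot episode
    support_y = torch.arange(5).repeat(2)
    query_x = torch.randn(3, 64)
    logits = prototypical_logits(support_x, support_y, query_x, n_classes=5)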

2) optimization-based: learn a good initialization or update rule via gradients

  • Meta-Learner LSTM [5]
  • MAML [6] [7] [8]
  • REPTILE (an approximation of MAML) [9]

    Optimization-based methods aim to obtain a good parameter initialization. If we simply train on multiple tasks jointly, the obtained parameters may be suboptimal for each individual task.
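
A minimal PyTorch sketch of the Reptile update [9] on toy sine-regression tasks (the task family, network, and step sizes are illustrative assumptions): adapt a copy of the model on one task with a few SGD steps, then move the meta-parameters toward the adapted parameters.

    import copy
    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(1, 40), nn.Tanh(), nn.Linear(40, 1))
    outer_lr, inner_lr, inner_steps = 0.1, 0.01, 5

    def sample_task():
        # a toy family of sine tasks with random amplitude and phase
        amp = torch.rand(1) * 4 + 0.1
        phase = torch.rand(1) * 3.14
        x = torch.rand(20, 1) * 10 - 5
        return x, amp * torch.sin(x + phase)

    for it in range(1000):
        x, y = sample_task()
        fast = copy.deepcopy(net)                      # task-specific copy
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                   # a few SGD steps on this task
            loss = ((fast(x) - y) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Reptile outer update: move meta-parameters toward the adapted parameters
        with torch.no_grad():
            for p, q in zip(net.parameters(), fast.parameters()):
                p += outer_lr * (q - p)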

3) model-based: predict model parameters

  • MANN [10]
  • MetaNet [11]

Reference:

  1. Vinyals, Oriol, et al. “Matching networks for one shot learning.” NIPS, 2016.
  2. Sung, Flood, et al. “Learning to compare: Relation network for few-shot learning.” CVPR, 2018.
  3. Snell, Jake, Kevin Swersky, and Richard Zemel. “Prototypical networks for few-shot learning.” NIPS, 2017.
  4. Ren, Mengye, et al. “Meta-learning for semi-supervised few-shot classification.” arXiv preprint arXiv:1803.00676 (2018).
  5. Sachin Ravi and Hugo Larochelle. “Optimization as a Model for Few-Shot Learning.” ICLR, 2017.
  6. Chelsea Finn, Pieter Abbeel, and Sergey Levine. “Model-agnostic meta-learning for fast adaptation of deep networks.” ICML, 2017.
  7. Finn, Chelsea, and Sergey Levine. “Meta-learning and universality: Deep representations and gradient descent can approximate any learning algorithm.” arXiv preprint arXiv:1710.11622 (2017).
  8. Grant, Erin, et al. “Recasting gradient-based meta-learning as hierarchical bayes.” arXiv preprint arXiv:1801.08930 (2018).
  9. Nichol, Alex, Joshua Achiam, and John Schulman. “On first-order meta-learning algorithms.” arXiv preprint arXiv:1803.02999 (2018).
  10. Adam Santoro, et al. “Meta-learning with memory-augmented neural networks.” ICML. 2016.
  11. Munkhdalai, Tsendsuren, and Hong Yu. “Meta networks.” ICML, 2017.

Tutorials:

  • https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html

  • http://metalearning-symposium.ml/files/vinyals.pdf

  • https://github.com/floodsung/Meta-Learning-Papers

  • A more comprehensive survey: https://arxiv.org/pdf/1810.03548.pdf

Lifelong Learning

Posted on 2022-06-16 | In paper note

Related concepts: online learning, incremental learning, continual learning

Survey

  • Continual lifelong learning with neural networks: A review
  • A continual learning survey: Defying forgetting in classification tasks
  • Class-incremental learning: survey and performance evaluation

Resources

  • Stanford course

Latent Variable Regularization

Posted on 2022-06-16 | In paper note

Regularizing latent variables or latent features can improve the generalizability of a classifier and lower its error bound.

Regularizing latent variables essentially decreases their entropy. There are some common tricks to decrease the entropy of latent variables, for example:

  1. dropout
  2. weight decay
  3. add random noise to the latent variables in VAE and GAN.
  4. add random perturbation to model parameters
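
A minimal PyTorch sketch of tricks 3 and 4 (the tensor shapes and noise scale are assumed values):

    import torch

    sigma = 0.1                                 # assumed noise scale

    z = torch.randn(16, 128)                    # a batch of latent codes from some encoder
    z_noisy = z + sigma * torch.randn_like(z)   # trick 3: decode z_noisy instead of z

    layer = torch.nn.Linear(128, 10)
    with torch.no_grad():                       # trick 4: perturb the model parameters
        for p in layer.parameters():
            p.add_(sigma * torch.randn_like(p))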

For a theoretical proof, see here.

Vote Aggregation

Posted on 2022-06-16 | In paper note
  1. label smoothing [1]: interpolate the ground-truth label with a uniform label (see the sketch after this list)

  2. bootstrapping [2]: interpolate the noisy label with the label predicted in the previous iteration

  3. noisy data + clean data [3]: interpolate the noisy label with the distilled label
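
A minimal PyTorch sketch of label smoothing [1] (the smoothing factor eps = 0.1 is an assumed value):

    import torch
    import torch.nn.functional as F

    def smooth_labels(targets, num_classes, eps=0.1):
        # interpolate the one-hot ground truth with a uniform label
        one_hot = F.one_hot(targets, num_classes).float()
        uniform = torch.full_like(one_hot, 1.0 / num_classes)
        return (1 - eps) * one_hot + eps * uniform

    targets = torch.tensor([0, 2, 1])
    print(smooth_labels(targets, num_classes=3))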

[1] Szegedy, Christian, et al. “Rethinking the inception architecture for computer vision.” CVPR, 2016.

[2] Reed, Scott, et al. “Training deep neural networks on noisy labels with bootstrapping.” arXiv preprint arXiv:1412.6596 (2014).

[3] Li, Yuncheng, et al. “Learning from noisy labels with distillation.” ICCV, 2017.

Knowledge Graph

Posted on 2022-06-16 | In paper note

Definition: entities, attributes, and relationships

Two ways to construct a knowledge graph:

  1. probabilistic models (graphical model/random walk)

  2. embedding-based models
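
As one well-known instance of an embedding-based model (used purely for illustration; the post does not name a specific method), a TransE-style score treats a triple (head, relation, tail) as plausible when head + relation is close to tail in embedding space:

    import torch
    import torch.nn as nn

    dim, n_entities, n_relations = 50, 1000, 20      # assumed sizes
    entity_emb = nn.Embedding(n_entities, dim)
    relation_emb = nn.Embedding(n_relations, dim)

    def score(h, r, t):
        # lower L1 distance = more plausible triple (head + relation ~ tail)
        return torch.norm(entity_emb(h) + relation_emb(r) - entity_emb(t), p=1, dim=-1)

    print(score(torch.tensor([0]), torch.tensor([3]), torch.tensor([7])))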

Incremental SVM

Posted on 2022-06-16 | In paper note

1) Approximate incremental SVM: pass through the dataset many times

  • Pegasos: select a training batch in each iteration

    • python: https://github.com/ejlb/pegasos and https://github.com/avaitla/Pegasos
    • C: https://www.cs.huji.ac.il/~shais/code/index.html
    • matlab: https://www.mathworks.com/matlabcentral/fileexchange/31401-pegasos-primal-estimated-sub-gradient-solver-for-svm?focused=5188208&tab=function
  • sklearn.linear_model: SGDClassifier with partial_fit

    from sklearn.linear_model import SGDClassifier

    clf = SGDClassifier(learning_rate='constant', eta0=0.1, shuffle=False, max_iter=1)
    # get x1, y1 as a new instance; the first call to partial_fit must list all
    # classes (binary labels 0/1 are assumed here)
    clf.partial_fit(x1, y1, classes=[0, 1])
    # get x2, y2
    # update accuracy if needed
    clf.partial_fit(x2, y2)

2) Exact incremental or decremental SVM: only pass through the dataset once

  • Incremental and Decremental Support Vector Machine Learning
    http://www.isn.ucsd.edu/svm/incremental

  • SVM Incremental Learning, Adaptation and Optimization: extends the work above
    matlab: https://github.com/diehl/Incremental-SVM-Learning-in-MATLAB

  • Incremental and decremental training for linear classification: an extension of LIBLINEAR focusing on the linear case
    http://www.csie.ntu.edu.tw/~cjlin/papers/ws/index.html

Implicit Modelling

Posted on 2022-06-16 | In paper note
  1. Simulate an infinite-depth network via the fixed-point iteration $h=f_{\theta}(h;x)$, where $x$ is the input and $\theta$ parameterizes a single weight-tied transformation. Repeatedly applying the transformation drives the hidden state toward the fixed point $h^{*}$, which serves as the output representation. DEQ [1], MDEQ [2], iFPN [3]
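
A minimal PyTorch sketch of the fixed-point view (plain forward iteration only; DEQ additionally uses a root solver and implicit differentiation, which are omitted here, and the transformation below is an illustrative assumption that needs to be roughly contractive to converge):

    import torch
    import torch.nn as nn

    d = 32                                     # assumed feature dimension
    W_h = nn.Linear(d, d, bias=False)          # one weight-tied transformation f_theta
    W_x = nn.Linear(d, d)

    def f(h, x):
        return torch.tanh(W_h(h) + W_x(x))

    x = torch.randn(8, d)                      # the fixed input
    h = torch.zeros(8, d)                      # initial hidden state
    with torch.no_grad():
        for _ in range(100):                   # iterate h <- f(h, x)
            h_next = f(h, x)
            if torch.norm(h_next - h) < 1e-5:  # stop near the fixed point h*
                break
            h = h_next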

Reference:

  1. Bai, Shaojie, J. Zico Kolter, and Vladlen Koltun. “Deep equilibrium models.” Advances in Neural Information Processing Systems. 2019.
  2. Bai, Shaojie, Vladlen Koltun, and J. Zico Kolter. “Multiscale deep equilibrium models.” arXiv preprint arXiv:2006.08656 (2020).
  3. Wang, Tiancai, Xiangyu Zhang, and Jian Sun. “Implicit Feature Pyramid Network for Object Detection.” arXiv preprint arXiv:2012.13563 (2020).