  1. re-sampling
  2. synthetic samples: generate more samples for the minority classes
  3. re-weighting
  4. few-shot learning
  5. decoupling representation and classifier learning: use normal sampling in the feature-learning stage and re-sampling in the classifier-learning stage
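
As a concrete illustration of re-weighting (strategy 3), the sketch below computes per-class loss weights with the "effective number of samples" heuristic from the class-balanced loss line of work; the function name and toy label distribution are mine, not taken from any reference listed here.

```python
import numpy as np

def class_weights(labels, beta=0.999):
    """Per-class weights via the 'effective number of samples'
    heuristic: w_c is proportional to (1 - beta) / (1 - beta**n_c),
    where n_c is the sample count of class c."""
    counts = np.bincount(labels)
    eff_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / eff_num
    return w * len(w) / w.sum()  # normalize so weights average to 1

# imbalanced toy labels: class 0 dominates
labels = np.array([0] * 90 + [1] * 9 + [2] * 1)
w = class_weights(labels)
```

As beta approaches 1 this recovers inverse-frequency weighting, and as beta approaches 0 it becomes uniform, so beta interpolates between the two extremes.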

Big Names: Judea Pearl [Tutorial] [slides] [textbook], James Robins [Textbook] [slides]

Tutorials and Workshops: NeurIPS 2018 Workshop on Causal Learning, KDD 2020 Tutorial on Causal Inference Meets Machine Learning

Material: MILA Course

Causality and disentanglement: [5] [6]

Counterfactual and disentanglement: [7]

Reference

[1] Chalupka K, Perona P, Eberhardt F. Visual causal feature learning. arXiv preprint arXiv:1412.2309, 2014.

[2] Lopez-Paz D, Nishihara R, Chintala S, et al. Discovering causal signals in images. CVPR, 2017.

[3] Bau D, Zhu J Y, Strobelt H, et al. GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. arXiv preprint arXiv:1811.10597, 2018.

[4] Bernhard Schölkopf: CAUSALITY FOR MACHINE LEARNING. arXiv preprint arXiv:1911.10500, 2019.

[5] Kim, Hyemi, et al. “Counterfactual Fairness with Disentangled Causal Effect Variational Autoencoder.” arXiv preprint arXiv:2011.11878 (2020).

[6] Shen, Xinwei, et al. “Disentangled Generative Causal Representation Learning.” arXiv preprint arXiv:2010.02637 (2020).

[7] Yue, Zhongqi, et al. “Counterfactual Zero-Shot and Open-Set Visual Recognition.” arXiv preprint arXiv:2103.00887 (2021).

[8] Schölkopf, Bernhard, et al. “Towards causal representation learning.” arXiv preprint arXiv:2102.11107 (2021).

  • watermark removal: ICA [4], inpainting [5]

  • watermarks consistent across a collection of images: multi-image matting and reconstruction [3]

  • Survey papers on watermarking: [1] [2]

Reference

  1. Podilchuk, Christine I., and Edward J. Delp. “Digital watermarking: algorithms and applications.” IEEE signal processing Magazine 18.4 (2001): 33-46.
  2. Potdar, Vidyasagar M., Song Han, and Elizabeth Chang. “A survey of digital image watermarking techniques.” INDIN '05: 3rd IEEE International Conference on Industrial Informatics. IEEE, 2005.
  3. Dekel, Tali, et al. “On the effectiveness of visible watermarks.” CVPR, 2017.

First deep learning approach for video harmonization [1]

[1] Haozhi Huang, Senzhe Xu, Junxiong Cai, Wei Liu, Shimin Hu. “Temporally Coherent Video Harmonization Using Adversarial Networks.” arXiv, 2018.

Advanced VAE

  1. VQ-VAE [1], VQ-VAE-2 [2]. Accelerating the auto-regressive prior: [4] [5]

  2. NVAE [3]
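
The defining operation of VQ-VAE [1] is quantizing each encoder output to its nearest codebook entry. Below is a minimal NumPy sketch of that lookup; the straight-through gradient estimator and the codebook/commitment losses used in training are omitted, and the tiny codebook is made up for illustration.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each encoder output vector to its nearest codebook entry.
    z: (N, D) encoder outputs; codebook: (K, D) learned embeddings.
    Returns discrete indices (N,) and the quantized vectors (N, D)."""
    # squared Euclidean distance between every z and every codebook entry
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])   # toy 2-entry codebook
z = np.array([[0.1, -0.1], [0.9, 1.2]])          # toy encoder outputs
idx, zq = vector_quantize(z, codebook)
```

The discrete indices are what the auto-regressive prior (or the parallel decoders of [4] [5]) models.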

References

[1] Oord, Aaron van den, Oriol Vinyals, and Koray Kavukcuoglu. “Neural discrete representation learning.” arXiv preprint arXiv:1711.00937 (2017).
[2] Razavi, Ali, Aaron van den Oord, and Oriol Vinyals. “Generating diverse high-fidelity images with vq-vae-2.” Advances in neural information processing systems. 2019.
[3] Vahdat, Arash, and Jan Kautz. “Nvae: A deep hierarchical variational autoencoder.” arXiv preprint arXiv:2007.03898 (2020).
[4] Bond-Taylor, Sam, et al. “Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes.” arXiv preprint arXiv:2111.12701 (2021).
[5] Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman, “MaskGIT: Masked Generative Image Transformer”, arXiv preprint arXiv:2202.04200.

  • StyleGAN of All Trades [1]
  • StyleGAN v1 [5]
  • StyleGAN v2 [6]: removes the blob-shaped artifacts that resemble water droplets
  • StyleGAN v3 [2]: solves the aliasing (texture sticking) issue, in which fine details appear glued to image coordinates instead of to the surfaces of the depicted objects
  • StyleGAN-XL [3]: extends StyleGAN to large, diverse datasets
  • 3D StyleGAN [4]

Image editing using StyleGAN

InsetGAN [7]

Reference

[1] Chong, Min Jin, Hsin-Ying Lee, and David Forsyth. “StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN.” arXiv preprint arXiv:2111.01619 (2021).

[2] Karras, Tero, et al. “Alias-free generative adversarial networks.” Thirty-Fifth Conference on Neural Information Processing Systems. 2021.

[3] Sauer, Axel, Katja Schwarz, and Andreas Geiger. “Stylegan-xl: Scaling stylegan to large diverse datasets.” arXiv preprint arXiv:2202.00273 (2022).

[4] Xiaoming Zhao, Fangchang Ma, David Güera, Zhile Ren, Alexander G. Schwing, Alex Colburn. “Generative Multiplane Images: Making a 2D GAN 3D-Aware”.

[5] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.

[6] Karras, Tero, et al. “Analyzing and improving the image quality of stylegan.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

[7] Frühstück, Anna, et al. “Insetgan for full-body image generation.” CVPR, 2022.

  • SAM [1]

  • FastSAM [2]: first generate proposals and then select target proposals

  • High-quality SAM [3]

  • Semantic-SAM [4]: assign semantic labels

Reference

[1] Kirillov, Alexander, et al. “Segment anything.” arXiv preprint arXiv:2304.02643 (2023).

[2] Zhao, Xu, et al. “Fast Segment Anything.” arXiv preprint arXiv:2306.12156 (2023).

[3] Ke, Lei, et al. “Segment Anything in High Quality.” arXiv preprint arXiv:2306.01567 (2023).

[4] Li, Feng, et al. “Semantic-SAM: Segment and Recognize Anything at Any Granularity.” arXiv preprint arXiv:2307.04767 (2023).

Translate one or multiple instances in an image: [1]

Reference

[1] Mo, Sangwoo, Minsu Cho, and Jinwoo Shin. “Instagan: Instance-aware image-to-image translation.” arXiv preprint arXiv:1812.10889 (2018).

Optimization-based:

  • texture synthesis

    • Texture synthesis using convolutional neural networks. [pdf]
  • feature inversion

    • Understanding Deep Image Representations by Inverting Them.
  • style transfer = feature inversion + texture synthesis

    • Image style transfer using convolutional neural networks. [pdf] [code] (no training; inference is slow)

    • Perceptual Losses for Real-Time Style Transfer and Super-Resolution. [pdf] (trains a network per style, taking the style image and content image as inputs; real-time inference; belongs to one-to-one image mapping)

    • Texture Networks: Feed-forward Synthesis of Textures and Stylized Images. [pdf]

    • A learned representation for artistic style. [pdf] (trains a unified network for multiple styles)
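
The texture-synthesis half of the "style transfer = feature inversion + texture synthesis" equation matches Gram matrices of convolutional features. A minimal sketch, assuming the features have already been extracted from some network (the shapes and function names are mine):

```python
import numpy as np

def gram_matrix(features):
    """Texture statistics used in neural style transfer.
    features: (C, H, W) feature maps from one conv layer.
    Returns the (C, C) matrix of channel correlations,
    normalized by the number of spatial positions."""
    C, H, W = features.shape
    F = features.reshape(C, H * W)
    return F @ F.T / (H * W)

def style_loss(f_generated, f_style):
    """Mean squared difference of Gram matrices at one layer,
    as in optimization-based style transfer (Gatys et al.)."""
    G, A = gram_matrix(f_generated), gram_matrix(f_style)
    return ((G - A) ** 2).mean()

f = np.arange(48.0).reshape(3, 4, 4)  # toy stand-in for conv features
G = gram_matrix(f)
```

In the optimization-based methods this loss is minimized over the pixels of the generated image; the feed-forward variants instead train a network to minimize it in one pass.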

Feedforward-based:

  • super-resolution

    • Learning a deep convolutional network for image super-resolution. [pdf]

    • Accurate Image Super-Resolution Using Very Deep Convolutional Networks [pdf] [code] (VGG-like network learns the residual)

    • Accelerating the Super-Resolution Convolutional Neural Network. [pdf] (hourglass structure, deconv)

    • Deeply-recursive convolutional network for image super-resolution. [pdf]

    • Photo-realistic single image super-resolution using a generative adversarial network. [pdf] (content loss + adversarial loss)
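
A recurring idea in the super-resolution papers above (explicit in the VDSR entry) is residual learning: the network predicts only the high-frequency residual, which is added back onto the interpolated low-resolution input. A toy sketch, with a made-up "network" standing in for a trained model:

```python
import numpy as np

def residual_sr(lr_upsampled, predict_residual):
    """Residual learning for super-resolution: the model predicts
    only the missing detail on top of a (e.g. bicubic) upsampled input."""
    return lr_upsampled + predict_residual(lr_upsampled)

def toy_residual(x):
    # hypothetical stand-in for a trained residual network:
    # a mild contrast-boosting correction
    return 0.1 * (x - x.mean())

img = np.linspace(0.0, 1.0, 16).reshape(4, 4)  # toy upsampled input
sr = residual_sr(img, toy_residual)
```

Learning the (mostly zero) residual is easier than regressing the full image, which is why very deep SR networks train stably with this formulation.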

  • inpainting or hole-filling

    • Deep Image Inpainting. [pdf]
    • Context Encoders: Feature Learning by Inpainting [pdf] [code]
  • colorization

    • Colorful image colorization. [pdf] [code]

    • Learning Representations for Automatic Colorization. [pdf] [code]

  • denoising

    • Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections [pdf] [code] (conv and deconv)
  • decompression

    • Compression Artifacts Reduction by a Deep Convolutional Network [pdf]
  • dehaze/deraining

    • Dehazenet: An end-to-end system for single image haze removal [pdf]
  • demosaicking

    • Deep joint demosaicking and denoising [pdf]
  • image harmonization

  • domain adaptation
    • Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks. [pdf] [code]
  • general image-to-image translation

    • paired training data

      • Image-to-Image Translation with Conditional Adversarial Nets. [pdf] [code] (pix2pix)

      • High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. [pdf]: extends pix2pix with a coarse-to-fine strategy (pix2pixHD)

    • unpaired training data

      • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. [pdf] [code] (CycleGAN)

      • DualGAN: Unsupervised Dual Learning for Image-to-Image Translation [pdf]

      • Learning to Discover Cross-Domain Relations with Generative Adversarial Networks. [pdf] (DiscoGAN)

Surveys

Partial and Gated Convolution

  • partial convolution [1]: hard gating with a single-channel, non-learnable mask

  • gated convolution [2]: soft gating with a multi-channel, learnable mask
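
The contrast between the two can be sketched at the level of a single output pixel; the scalar-weight formulation below is my simplification (real layers apply this per channel across a sliding window):

```python
import numpy as np

def partial_conv_step(x, mask, w, b=0.0):
    """One output pixel of a partial convolution (hard gating).
    x, mask, w: same-shaped patches; mask is binary (1 = valid pixel).
    The response is renormalized by the fraction of valid pixels,
    and the updated mask is 1 if any input pixel was valid."""
    valid = mask.sum()
    if valid == 0:
        return 0.0, 0.0              # patch is all hole: output 0, mask stays 0
    y = (w * x * mask).sum() * (mask.size / valid) + b
    return y, 1.0                    # hard mask update

def gated_conv_step(x, w_feat, w_gate):
    """One output pixel of a gated convolution (soft gating):
    a learned sigmoid gate scales the feature response, so the
    'mask' is soft and trained end-to-end."""
    feat = np.tanh((w_feat * x).sum())
    gate = 1.0 / (1.0 + np.exp(-(w_gate * x).sum()))
    return feat * gate

x = np.ones((3, 3)); w = np.ones((3, 3))
mask = np.zeros((3, 3)); mask[0, :] = 1.0   # top row valid, rest is hole
y, new_mask = partial_conv_step(x, mask, w)
```

Note how the partial convolution's renormalization (9/3 here) compensates for the missing pixels, while the gated convolution learns its own gating from data.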

Filling Priority

filling priority [3]: priority is the product of a confidence term (a measure of the amount of reliable information surrounding the pixel) and a data term (a function of the strength of isophotes hitting the fill front). The patch to fill next is selected by priority, similar to patch-based texture synthesis.

<img src="http://bcmi.sjtu.edu.cn/~niuli/github_images/bO5YXEQ.jpg" width="40%"> 

Diverse image inpainting

  • random vector: use a random vector to generate diverse and plausible outputs [6]

  • attribute vector: use target attribute values to guide image inpainting [7]

  • autoregressive models: [11] [12]

Auxiliary Information

  • Semantics

    • enforce the inpainted result to have the expected semantics [8]
    • first inpaint the semantic map, then use the completed semantic map as guidance [9]
    • guide feature learning in the decoder [10]
    • semantic-aware attention [13]
  • Edges

    • inpaint the edge map and use the completed edge map to help image inpainting [4] [5]

Frequency Domain

  • using a frequency map as network input [14]
  • Fourier convolution: LaMa [15]
  • wavelets [16]
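
The core of LaMa's fast Fourier convolution is a spectral transform: FFT, a learned pointwise mix in the frequency domain, inverse FFT, which gives every output pixel an image-wide receptive field. The sketch below replaces LaMa's learned 1x1 convolution over stacked spectral channels with scalar weights, purely for illustration:

```python
import numpy as np

def fourier_unit(x, w_real, w_imag):
    """Simplified spectral transform of a fast Fourier convolution.
    x: (H, W) feature map; w_real, w_imag: scalar weights standing in
    for the learned pointwise transform over spectral channels."""
    X = np.fft.rfft2(x)                            # to frequency domain
    X = w_real * X.real + 1j * (w_imag * X.imag)   # pointwise mix
    return np.fft.irfft2(X, s=x.shape)             # back to spatial domain

x = np.arange(16.0).reshape(4, 4)
y = fourier_unit(x, 1.0, 1.0)   # unit weights act as the identity
```

Because each frequency coefficient depends on all pixels, even a shallow stack of such units can reason about structures far larger than a conventional kernel, which is what makes LaMa robust to large masks.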

Bridging Inpainting and Generation

[17]

Transformer

[12] [18] [19]

Diffusion Model

[20] [21] [22] [23]

References

  1. Liu, Guilin, et al. “Image inpainting for irregular holes using partial convolutions.” ECCV, 2018.
  2. Yu, Jiahui, et al. “Free-form image inpainting with gated convolution.” ICCV, 2019.
  3. Criminisi, Antonio, Patrick Pérez, and Kentaro Toyama. “Region filling and object removal by exemplar-based image inpainting.” TIP, 2004.
  4. Nazeri, Kamyar, et al. “Edgeconnect: Generative image inpainting with adversarial edge learning.” arXiv preprint arXiv:1901.00212 (2019).
  5. Xiong, Wei, et al. “Foreground-aware image inpainting.” CVPR, 2019.
  6. Zheng, Chuanxia, Tat-Jen Cham, and Jianfei Cai. “Pluralistic image completion.” CVPR, 2019.
  7. Chen, Zeyuan, et al. “High resolution face completion with multiple controllable attributes via fully end-to-end progressive generative adversarial networks.” arXiv preprint arXiv:1801.07632 (2018).
  8. Li, Yijun, et al. “Generative face completion.” CVPR, 2017.
  9. Song, Yuhang, et al. “Spg-net: Segmentation prediction and guidance network for image inpainting.” arXiv preprint arXiv:1805.03356 (2018).
  10. Liao, Liang, et al. “Guidance and evaluation: Semantic-aware image inpainting for mixed scenes.” arXiv preprint arXiv:2003.06877 (2020).
  11. Peng, Jialun, et al. “Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE.” CVPR, 2021.
  12. Wan, Ziyu, et al. “High-Fidelity Pluralistic Image Completion with Transformers.” arXiv preprint arXiv:2103.14031 (2021).
  13. Liao, Liang, et al. “Image inpainting guided by coherence priors of semantics and textures.” CVPR, 2021.
  14. Roy, Hiya, et al. “Image inpainting using frequency domain priors.” arXiv preprint arXiv:2012.01832 (2020).
  15. Suvorov, Roman, et al. “Resolution-robust Large Mask Inpainting with Fourier Convolutions.” WACV, 2022.
  16. Yu, Yingchen, et al. “WaveFill: A Wavelet-based Generation Network for Image Inpainting.” ICCV, 2021.
  17. Zhao, Shengyu, et al. “Large scale image completion via co-modulated generative adversarial networks.” ICLR (2021).
  18. Zheng, Chuanxia, et al. “Bridging global context interactions for high-fidelity image completion.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
  19. Li, Wenbo, et al. “Mat: Mask-aware transformer for large hole image inpainting.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
  20. Lugmayr, Andreas, et al. “Repaint: Inpainting using denoising diffusion probabilistic models.” CVPR, 2022.
  21. Rombach, Robin, et al. “High-resolution image synthesis with latent diffusion models.” CVPR, 2022.
  22. Li, Wenbo, et al. “SDM: Spatial Diffusion Model for Large Hole Image Inpainting.” arXiv preprint arXiv:2212.02963 (2022).
  23. Wang, Su, et al. “Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting.” arXiv preprint arXiv:2212.06909 (2022).