Newly Blog

Knowledge Graph

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Definition: entities, attributes, and relationships

Two ways to construct a knowledge graph:

probabilistic models (graphical model/random walk)
embedding based models

Incremental SVM

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

1) Approximate incremental SVM: pass through the dataset many times

Pegasos: select a training batch in each iteration
- python: https://github.com/ejlb/pegasos, https://github.com/avaitla/Pegasos
- C: https://www.cs.huji.ac.il/~shais/code/index.html
- matlab: https://www.mathworks.com/matlabcentral/fileexchange/31401-pegasos-primal-estimated-sub-gradient-solver-for-svm?focused=5188208&tab=function
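As a rough illustration of the mini-batch update behind Pegasos, here is a minimal pure-Python sketch. The function name `pegasos`, the toy data, and the hyperparameter values are assumptions made for this example and are not taken from the implementations linked above; the bias term and the optional projection step are omitted.

```python
import random

def pegasos(X, y, lam=0.01, n_iters=2000, batch_size=4, seed=0):
    """Mini-batch Pegasos sketch for a linear SVM without bias (labels in {-1, +1})."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, n_iters + 1):
        eta = 1.0 / (lam * t)  # step size 1 / (lambda * t)
        batch = [rng.randrange(n) for _ in range(batch_size)]
        # batch examples that violate the margin under the current w
        violators = [i for i in batch
                     if y[i] * sum(wj * xj for wj, xj in zip(w, X[i])) < 1.0]
        # shrink step coming from the L2 regulariser
        w = [(1.0 - eta * lam) * wj for wj in w]
        # sub-gradient step on the hinge loss of the violators
        for i in violators:
            for j in range(d):
                w[j] += (eta / batch_size) * y[i] * X[i][j]
    return w

# Toy, linearly separable data (hypothetical)
X = [[2, 1], [1, 2], [2, 2], [3, 1], [-2, -1], [-1, -2], [-2, -2], [-3, -1]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = pegasos(X, y)
```

Each iteration touches only `batch_size` examples, which is why the method needs many passes over the data to approach the batch SVM solution.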

sklearn.linear_model: SGD

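import numpy as np
from sklearn.linear_model import SGDClassifier

# hinge loss gives a linear SVM; n_iter is gone from recent scikit-learn, and partial_fit does a single pass per call
clf = SGDClassifier(loss='hinge', learning_rate='constant', eta0=0.1, shuffle=False)
# get x1 (shape (1, n_features)) and y1 (shape (1,)) as a new instance
# the first partial_fit call must list all class labels (binary labels 0/1 are assumed here)
clf.partial_fit(x1, y1, classes=np.array([0, 1]))
# get x2, y2
# update accuracy if needed
clf.partial_fit(x2, y2)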

2) Exact incremental or decremental SVM: only pass through the dataset once

Incremental and Decremental Support Vector Machine Learning
http://www.isn.ucsd.edu/svm/incremental
SVM Incremental Learning, Adaptation and Optimization: extends the work above
matlab: https://github.com/diehl/Incremental-SVM-Learning-in-MATLAB
Incremental and decremental training for linear classification: an extension of LIBLINEAR that focuses on linear problems
http://www.csie.ntu.edu.tw/~cjlin/papers/ws/index.html

Implicit Modelling

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Implicit models simulate an infinite-depth network by the fixed-point iteration $h=f_{\theta}(h;x)$, in which $x$ is the input and $\theta$ is the parameter of a single transformation. After infinitely many transformations, the hidden state $h$ approaches the fixed point $h^*$ satisfying $h^*=f_{\theta}(h^*;x)$. Examples: DEQ [1], MDEQ [2], iFPN [3].
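The fixed-point idea can be illustrated with a scalar toy example. This is only a sketch under stated assumptions: the transformation `f_theta`, its weights `w` and `u`, and the stopping rule are hypothetical choices, and naive repeated application is used here purely for clarity (DEQ itself relies on a root-finding solver and implicit differentiation rather than unrolling).

```python
import math

def f_theta(h, x, w=0.5, u=1.0):
    """One transformation h -> tanh(w*h + u*x); w and u stand in for theta."""
    return math.tanh(w * h + u * x)

def solve_fixed_point(x, h0=0.0, tol=1e-10, max_iter=1000):
    """Apply f_theta repeatedly until the hidden state stops changing."""
    h = h0
    for step in range(1, max_iter + 1):
        h_next = f_theta(h, x)
        if abs(h_next - h) < tol:
            return h_next, step
        h = h_next
    return h, max_iter

h_star, steps = solve_fixed_point(x=0.8)
# At a fixed point, one more transformation leaves h unchanged
residual = abs(h_star - f_theta(h_star, 0.8))
```

Because $|w|<1$ makes this toy map a contraction, the iteration converges from any starting value; general networks do not come with that guarantee.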

Reference:

[1] Bai, Shaojie, J. Zico Kolter, and Vladlen Koltun. “Deep equilibrium models.” Advances in Neural Information Processing Systems. 2019.
[2] Bai, Shaojie, Vladlen Koltun, and J. Zico Kolter. “Multiscale deep equilibrium models.” arXiv preprint arXiv:2006.08656 (2020).
[3] Wang, Tiancai, Xiangyu Zhang, and Jian Sun. “Implicit Feature Pyramid Network for Object Detection.” arXiv preprint arXiv:2012.13563 (2020).

Image Matching

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Survey

Image Matching from Handcrafted to Deep Features: A Survey

Deep learning methods

Correlation tensor: [1] [2] [3]

Reference

[1] Rocco, Ignacio, et al. “Neighbourhood consensus networks.” arXiv preprint arXiv:1810.10510 (2018).
[2] Rocco, Ignacio, Relja Arandjelovic, and Josef Sivic. “Convolutional neural network architecture for geometric matching.” CVPR, 2017.
[3] Chen, Jianchun, et al. “Arbicon-net: Arbitrary continuous geometric transformation networks for image registration.” NeurIPS, 2019.

Image and Video Proposals

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Image proposals:

Selective search [1]: hierarchical grouping based on different similarity metrics [code]
Salient object detection [2]: identify the segment that is easy to compose from its own parts but hard to compose from the remaining parts of the image.
EdgeBox [3]: boxes that tightly enclose a set of edges are likely to contain an object.
ACF detector [4]: compute gradient histograms on image pyramids
Region Proposal Network (RPN) from faster-RCNN [5]

Video proposals:

Video edgebox [1]: an extension of EdgeBox
R-C3D [2]: an extension of RPN

Reference (video proposals)

[1] Zhu, Wangjiang, et al. “A key volume mining deep framework for action recognition.” CVPR. 2016.

[2] Xu, Huijuan, Abir Das, and Kate Saenko. “R-c3d: Region convolutional 3d network for temporal activity detection.” ICCV, 2017.

Graph Neural Network

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Representative methods

Tutorial slides on AAAI2019

Graph Convolutional Network (GCN) [6]
Graph Attention Network [7]
GraphSAGE(SAmple and aggreGatE) [8]
Transformer [9]: a Transformer can be viewed as a type of GNN.

Avoid oversmoothing and go deeper

Initial residual and Identity mapping [12]
GCN and PageRank [13]

Graph similarity/matching

A survey on graph similarity [4]

Graph transformation:

pooling/unpooling [5]

Dynamic Graph:

Pointer Graph Network [11]

Application

GNN for zero-shot learning [1][2]: treat each category as a graph node
GNN for multi-view learning [3]: treat each view as a graph node
GNN for clustering [10]

Reference:

[1] Wang, Xiaolong, Yufei Ye, and Abhinav Gupta. “Zero-shot recognition via semantic embeddings and knowledge graphs.” CVPR, 2018.
[2] Lee, Chung-Wei, et al. “Multi-label zero-shot learning with structured knowledge graphs.” CVPR, 2018.
[3] Wang, Dongang, et al. “Dividing and aggregating network for multi-view action recognition.” ECCV, 2018.
[4] Ma, Guixiang, et al. “Deep Graph Similarity Learning: A Survey.” arXiv preprint arXiv:1912.11615 (2019).
[5] Gao, Hongyang, and Shuiwang Ji. “Graph U-Nets.” arXiv preprint arXiv:1905.05178 (2019).
[6] Kipf, Thomas N., and Max Welling. “Semi-supervised classification with graph convolutional networks.” arXiv preprint arXiv:1609.02907 (2016).
[7] Veličković, Petar, et al. “Graph attention networks.” arXiv preprint arXiv:1710.10903 (2017).
[8] Hamilton, Will, Zhitao Ying, and Jure Leskovec. “Inductive representation learning on large graphs.” NeurIPS, 2017.
[9] Vaswani, Ashish, et al. “Attention is all you need.” NeurIPS, 2017.
[10] Bo, Deyu, et al. “Structural Deep Clustering Network.” Proceedings of The Web Conference 2020. 2020.
[11] Veličković, Petar, et al. “Pointer Graph Networks.” arXiv preprint arXiv:2006.06380 (2020).
[12] Chen, Ming, et al. “Simple and deep graph convolutional networks.” arXiv preprint arXiv:2007.02133 (2020).
[13] Klicpera, Johannes, Aleksandar Bojchevski, and Stephan Günnemann. “Predict then propagate: Graph neural networks meet personalized pagerank.” arXiv preprint arXiv:1810.05997 (2018).
[14] Dosovitskiy, Alexey, et al. “An image is worth 16x16 words: Transformers for image recognition at scale.” arXiv preprint arXiv:2010.11929 (2020).
[15] Carion, Nicolas, et al. “End-to-End Object Detection with Transformers.” arXiv preprint arXiv:2005.12872 (2020).
[16] Chen, Hanting, et al. “Pre-Trained Image Processing Transformer.” arXiv preprint arXiv:2012.00364 (2020).
[17] Chefer, Hila, Shir Gur, and Lior Wolf. “Transformer Interpretability Beyond Attention Visualization.” arXiv preprint arXiv:2012.09838 (2020).

GAN Evaluation Metric

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Objective Evaluation:

An empirical study on evaluation metrics of generative adversarial networks [1] with code.

Inception Score (IS): classification score using the InceptionNet pretrained on ImageNet
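$IS=\exp\{E_x[KL(p_M(y|x)\,\|\,p_M(y))]\}$
in which $p_M(y)$ is the marginal distribution of $p_M(y|x)$ over the generated samples. Expect $p_M(y|x)$ to be of low entropy (each sample is classified confidently) and $p_M(y)$ to be of high entropy (the samples cover many classes). The higher, the better.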
Mode score: extension of Inception score
Kernel MMD: MMD distance between two data distributions
Wasserstein distance: Wasserstein distance (Earth mover’s distance) between two data distributions.
Fréchet Inception Distance (FID): extract InceptionNet features and measure the data distribution distance. The lower, the better.
$FID=\|\mu_r-\mu_g\|_2^2+\mathrm{Tr}(\Sigma_r+\Sigma_g-2(\Sigma_r\Sigma_g)^{\frac{1}{2}})$
KNN score: treat true data as positive and generated data as negative. Calculate the leave-one-out (LOO) accuracy based on 1-NN classifier.
Learned Perceptual Image Patch Similarity (LPIPS): $d(x,x_0)=\sum_l \frac{1}{H_l W_l}\sum_{h,w}\|w_l\circ (\hat{y}_{hw}^l-\hat{y}^l_{0hw})\|^2$ [3] [code]
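To make the Inception Score formula concrete, here is a minimal sketch that computes it from a matrix of class probabilities. The function name `inception_score` and the toy probability tables are assumptions for illustration; in practice the rows would be the softmax outputs of the pretrained InceptionNet on generated images.

```python
import math

def inception_score(probs):
    """probs[i][k] = p_M(y=k | x_i); each row is assumed to sum to 1."""
    n, k = len(probs), len(probs[0])
    # marginal p_M(y): average of the conditionals over the samples
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    mean_kl = 0.0
    for row in probs:
        # KL(p_M(y|x) || p_M(y)); terms with p = 0 contribute nothing
        mean_kl += sum(p * math.log(p / q)
                       for p, q in zip(row, marginal) if p > 0) / n
    return math.exp(mean_kl)

# Confident and diverse predictions (hypothetical): one class per sample
sharp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Uninformative predictions (hypothetical): uniform for every sample
flat = [[1 / 3, 1 / 3, 1 / 3]] * 3

score_sharp = inception_score(sharp)
score_flat = inception_score(flat)
```

With three classes the score ranges from 1 (the `flat` case, where the KL term vanishes) up to the number of classes (the `sharp` case), which matches the "higher is better" reading.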

Subjective Evaluation:

Each user sees two randomly selected results at a time and is asked to choose the one that looks more realistic. After all the pairwise results are collected, the Bradley-Terry (B-T) model is used to calculate a global ranking score for each method. [2]
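One common way to fit a Bradley-Terry model from pairwise votes is an iterative minorization-maximization (MM) update, sketched below. The fitting procedure, the function name `bradley_terry`, and the vote counts are assumptions for illustration; the note does not say which solver [2] used. The sketch also assumes every pair of methods was compared at least once and every method won at least once.

```python
def bradley_terry(wins, n_iters=200):
    """wins[i][j] = number of times method i was preferred over method j."""
    m = len(wins)
    p = [1.0 / m] * m  # start from equal scores
    for _ in range(n_iters):
        new_p = []
        for i in range(m):
            total_wins = sum(wins[i][j] for j in range(m) if j != i)
            # comparisons between i and j, weighted by the current scores
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(m) if j != i)
            new_p.append(total_wins / denom)
        total = sum(new_p)
        p = [v / total for v in new_p]  # normalise so the scores sum to 1
    return p

# Hypothetical votes among three methods, 10 comparisons per pair
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
scores = bradley_terry(wins)
```

Under the model, the probability that method `i` is preferred over method `j` is `scores[i] / (scores[i] + scores[j])`, so sorting by score gives the global ranking.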

Reference

Xu, Qiantong, et al. “An empirical study on evaluation metrics of generative adversarial networks.” arXiv preprint arXiv:1806.07755 (2018).
Tsai, Yi-Hsuan, et al. “Deep image harmonization.” CVPR, 2017.
Zhang, Richard, et al. “The unreasonable effectiveness of deep features as a perceptual metric.” CVPR, 2018.

Finding Datasets

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Endovascular Surgery

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Key words:

catheter, cannulation, EM tracking (enhance visualization and provide objective metric)

Endovascular Technique:

Early endovascular technique: real-time fluoroscopy and 2D angiography, which involve ionizing radiation and repeated injection of a nephrotoxic contrast agent.

Image fusion techniques: project 3D CT and magnetic resonance images onto real-time 2D fluoroscopic images, but still require real-time fluoroscopy.

Electromagnetic (EM) tracking: an EM field is generated by the Aurora Window Field Generator, and sensors on the tips measure and transmit the roll orientation and forward motion. The accuracy of EM tracking is evaluated on the basis of target registration error (TRE).

System:

1. manual simulator
2. virtual simulator (virtual reality)

Task:

Different vessel locations correspond to tasks of different difficulty levels.

Metric:

A metric is required to evaluate or augment the skills of surgeons (novice, intermediate, expert).

1. expert observation and subjective score

2. number of cases

3. kinematic metrics: mapping from kinematic data to skill (classification)

3D path length
spectral arc length: arc length of the normalized Fourier magnitude spectrum of the speed profile (a frequency-domain measure of movement smoothness)
root mean dimensionless jerk: movement smoothness
submovement number and duration
catheter turn: measure the task difficulty
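The simplest of the kinematic metrics above, 3D path length, can be computed directly from tracked tip positions. This is a minimal sketch: the function name `path_length_3d` and the sample trajectory are hypothetical, and real EM-tracking data would usually be filtered first.

```python
import math

def path_length_3d(points):
    """Sum of Euclidean distances between consecutive 3D positions."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Hypothetical catheter-tip positions (same length unit on all axes)
trajectory = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
length = path_length_3d(trajectory)
```

For the same task, a shorter path is generally read as more economical movement, which is why the metric is used as an input for skill classification.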

View:

2D or 3D
real-time or stored (using stored image can reduce fluoroscopy time and radiation exposure)
anteroposterior/lateral/endoluminal view

Euler Angle (yaw,pitch,roll)

Posted on 2026-03-17 Edited on 2022-04-08 In paper note

Borrowing aviation terminology, these rotations will be referred to as yaw, pitch, and roll:

A yaw is a counterclockwise rotation of $\alpha$ about the $z$-axis. The rotation matrix is given by

$\displaystyle R_z(\alpha) = \begin{pmatrix}\cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} .$

A pitch is a counterclockwise rotation of $\beta$ about the $y$-axis. The rotation matrix is given by

$\displaystyle R_y(\beta) = \begin{pmatrix}\cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} .$

A roll is a counterclockwise rotation of $\gamma$ about the $x$-axis. The rotation matrix is given by

$\displaystyle R_x(\gamma) = \begin{pmatrix}1 & 0 & 0 \\ 0 & \cos\gamma & -\sin\gamma \\ 0 & \sin\gamma & \cos\gamma \end{pmatrix} .$

Note that $R(\alpha,\beta,\gamma)=R_z(\alpha)R_y(\beta)R_x(\gamma)$ performs the roll first, then the pitch, and finally the yaw. If the order of these operations is changed, a different rotation matrix results.
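The order dependence can be checked numerically. The sketch below builds the three matrices exactly as written above and assumes the composite is $R_z(\alpha)R_y(\beta)R_x(\gamma)$, which is the product that applies roll first, then pitch, then yaw to a column vector. The helper names and the test angles are arbitrary choices for this example.

```python
import math

def rot_z(a):  # yaw
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(b):  # pitch
    c, s = math.cos(b), math.sin(b)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_x(g):  # roll
    c, s = math.cos(g), math.sin(g)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(R, v):
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def rotation(alpha, beta, gamma):
    """Composite rotation R_z(alpha) R_y(beta) R_x(gamma)."""
    return matmul(rot_z(alpha), matmul(rot_y(beta), rot_x(gamma)))

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

alpha, beta, gamma = math.radians(30), math.radians(20), math.radians(10)
R = rotation(alpha, beta, gamma)
# Same three rotations multiplied in the opposite order
R_swapped = matmul(rot_x(gamma), matmul(rot_y(beta), rot_z(alpha)))
order_gap = max_abs_diff(R, R_swapped)  # clearly non-zero
```

A quick sanity check of the sign convention: `apply(rot_z(math.pi / 2), [1, 0, 0])` maps the $x$-axis onto the $y$-axis, up to floating-point error.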

Roll does not change the gaze direction, so only yaw and pitch affect it. Given a normalized 3D gaze vector $(x_0,y_0,z_0)$, how do we determine the yaw and pitch angles?
The answer depends on the order in which yaw and pitch are applied.

Consider a rigid eye model (attached to a rigid head model) that is initially aligned with the original coordinate system and faces the positive $x$ direction. Since roll has no effect on the eye direction, we only perform yaw and pitch. For the coordinate transformation, we consider the reverse process.

The eye direction is $c_1 = (1,0,0)$ in the new coordinate system and $c_2 = (x_0,y_0,z_0)$ in the original coordinate system.

If the true rotation order is yaw->pitch, then $c_2=R_z(-\alpha)R_y(-\beta)c_1=(\cos\alpha\cos\beta,\,-\sin\alpha\cos\beta,\,\sin\beta)$, so $\beta=\arcsin(z_0)$ and $\alpha=-\arctan(y_0/x_0)$.
If the true rotation order is pitch->yaw, then $c_2=R_y(-\beta)R_z(-\alpha)c_1=(\cos\alpha\cos\beta,\,-\sin\alpha,\,\cos\alpha\sin\beta)$, so $\alpha=-\arcsin(y_0)$ and $\beta=\arctan(z_0/x_0)$.

If we insert $R_x(\gamma)$ before $c_1$, the results won’t change, which demonstrates that roll will not influence eye direction. In other words, if the true rotation order is yaw->pitch->roll or pitch->yaw->roll, the above analysis still holds.
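The two cases can be verified with a round trip: build the gaze vector from known angles, then recover the angles from the vector. This is a sketch under the note's sign convention; the function names are hypothetical, and `math.atan2` is used in place of $\arctan(y_0/x_0)$, which is an assumption that agrees with the formulas whenever $x_0>0$ and avoids division by zero.

```python
import math

def gaze_from_angles(alpha, beta, order="yaw->pitch"):
    """Gaze vector c2 obtained from c1 = (1, 0, 0) for the given rotation order."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    if order == "yaw->pitch":   # c2 = R_z(-alpha) R_y(-beta) c1
        return (ca * cb, -sa * cb, sb)
    if order == "pitch->yaw":   # c2 = R_y(-beta) R_z(-alpha) c1
        return (ca * cb, -sa, ca * sb)
    raise ValueError("unknown order: " + order)

def angles_from_gaze(v, order="yaw->pitch"):
    """Recover (alpha, beta) from a normalized gaze vector with x0 > 0."""
    x0, y0, z0 = v
    if order == "yaw->pitch":
        return -math.atan2(y0, x0), math.asin(z0)
    if order == "pitch->yaw":
        return -math.asin(y0), math.atan2(z0, x0)
    raise ValueError("unknown order: " + order)

# Round trip with arbitrary test angles (radians)
alpha, beta = 0.4, -0.3
recovered = {}
for order in ("yaw->pitch", "pitch->yaw"):
    c2 = gaze_from_angles(alpha, beta, order)
    recovered[order] = angles_from_gaze(c2, order)
```

The same angles produce different gaze vectors under the two orders, so applying the formulas for the wrong order gives wrong yaw and pitch values.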

Notice:

- We use a right-handed coordinate system, that is, thumb along the $z$-axis and fingers curling from the $x$-axis to the $y$-axis.
- A rotation of $\theta$ around an axis means rotating counterclockwise by $\theta$ when looking along the positive direction of that axis.
- When doing rotations in sequence, each rotation is based on the up-to-date coordinate system ($x$-axis, $y$-axis, $z$-axis).