Graph Neural Network

Posted on 2022-06-16 | In paper note

Representative methods

Tutorial slides on AAAI2019

Graph Convolutional Network (GCN) [6]
Graph Attention Network [7]
GraphSAGE(SAmple and aggreGatE) [8]
Transformer [9]: transformer can be deemed as a type of GNN.

Avoid oversmoothing and go deeper

Initial residual and Identity mapping [12]
GCN and PageRank [13]

Graph similarity/matching

A survey on graph similarity [4]

Graph transformation:

pooling/unpooling [5]

Dynamic Graph:

Pointer Graph Network [11]

Application

GNN for zero-shot learning [1][2]: treat each category as a graph node
GNN for multi-view learning [3]: treat each view as a graph node
GNN for clustering [10]

Reference:

Wang, Xiaolong, Yufei Ye, and Abhinav Gupta. “Zero-shot recognition via semantic embeddings and knowledge graphs.” CVPR, 2018.
Lee, Chung-Wei, et al. “Multi-label zero-shot learning with structured knowledge graphs.” CVPR, 2018.
Wang, Dongang, et al. “Dividing and aggregating network for multi-view action recognition.” ECCV, 2018.
Ma, Guixiang, et al. “Deep Graph Similarity Learning: A Survey.” arXiv preprint arXiv:1912.11615 (2019).
Hongyang Gao, Shuiwang Ji: Graph U-Nets. CoRR abs/1905.05178 (2019)
Kipf, Thomas N., and Max Welling. “Semi-supervised classification with graph convolutional networks.” arXiv preprint arXiv:1609.02907 (2016).
Veličković, Petar, et al. “Graph attention networks.” arXiv preprint arXiv:1710.10903 (2017).
Hamilton, Will, Zhitao Ying, and Jure Leskovec. “Inductive representation learning on large graphs.” NeurIPS, 2017.
Vaswani, Ashish, et al. “Attention is all you need.” NeurIPS, 2017.
Bo, Deyu, et al. “Structural Deep Clustering Network.” Proceedings of The Web Conference 2020. 2020.
Veličković, Petar, et al. “Pointer Graph Networks.” arXiv preprint arXiv:2006.06380 (2020).
Chen, Ming, et al. “Simple and deep graph convolutional networks.” arXiv preprint arXiv:2007.02133 (2020).
Klicpera, Johannes, Aleksandar Bojchevski, and Stephan Günnemann. “Predict then propagate: Graph neural networks meet personalized pagerank.” arXiv preprint arXiv:1810.05997 (2018).
Dosovitskiy, Alexey, et al. “An image is worth 16x16 words: Transformers for image recognition at scale.” arXiv preprint arXiv:2010.11929 (2020).
Carion, Nicolas, et al. “End-to-End Object Detection with Transformers.” arXiv preprint arXiv:2005.12872 (2020).
Chen, Hanting, et al. “Pre-Trained Image Processing Transformer.” arXiv preprint arXiv:2012.00364 (2020).
Chefer, Hila, Shir Gur, and Lior Wolf. “Transformer Interpretability Beyond Attention Visualization.” arXiv preprint arXiv:2012.09838 (2020).

GAN Evaluation Metric

Posted on 2022-06-16 | In paper note

Objective Evaluation:

An empirical study on evaluation metrics of generative adversarial networks [1] with code.

Inception Score (IS): classification score using the InceptionNet pretrained on ImageNet
$IS=\exp\{E_x[KL(p_M(y|x)||p_M(y))]\}$
in which $p_M(y)$ is the marginal distribution of $p_M(y|x)$. Expect $p_M(y)$ to be of low entropy while $p_M(y|x)$ to be of high entropy. The higher, the better.
Mode score: extension of Inception score
Kernel MMD: MMD distance between two data distributions
Wasserstein distance: Wasserstein distance (Earth mover’s distance) between two data distributions.
Fréchet Inception Distance (FID): extract InceptionNet features and measure the data distribution distance. The lower, the better.
$FID=\|\mu_r-\mu_g\|+trace(\Sigma_r+\Sigma_g-2(\Sigma_r\Sigma_g)^{\frac{1}{2}})$
KNN score: treat true data as positive and generated data as negative. Calculate the leave-one-out (LOO) accuracy based on 1-NN classifier.
Learned Perceptual Image Patch Similarity (LPIPS): $d(x,x_0)=\sum_l \frac{1}{H_l W_l}\sum_{h,w}\|w_l\circ (\hat{y}_{hw}^l-\hat{y}^l_{0hw})\|^2$ [3] [code]

Subjective Evaluation:

Each user sees two randomly selected results at a time and is asked to choose the one that looks more realistic. After obtaining all the pairwise results, Bradley-Terry model (B-T model) is used to calculate the global ranking score for each method. [2]

Reference

Xu, Qiantong, et al. “An empirical study on evaluation metrics of generative adversarial networks.” arXiv preprint arXiv:1806.07755 (2018).
Tsai, Yi-Hsuan, et al. “Deep image harmonization.” CVPR, 2017.
Zhang, Richard, et al. “The unreasonable effectiveness of deep features as a perceptual metric.” CVPR, 2018.

Finding Datasets

Posted on 2022-06-16 | In paper note

Euler Angle (yaw,pitch,roll)

Posted on 2022-06-16 | In paper note

Borrowing aviation terminology, these rotations will be referred to as yaw, pitch, and roll:

A yaw is a counterclockwise rotation of $ \alpha$ about the $z$-axis. The rotation matrix is given by

$\displaystyle R_z(\alpha) = \begin{pmatrix}\cos\alpha & -\sin\alpha & 0 \\\\ \sin\alpha & \cos\alpha & 0 \\\\ 0 & 0 & 1 \end{pmatrix} .$

A pitch is a counterclockwise rotation of $ \beta$ about the $ y$-axis. The rotation matrix is given by

$\displaystyle R_y(\beta) = \begin{pmatrix}\cos\beta & 0 & \sin\beta \\\\ 0 & 1 & 0 \\\\ -\sin\beta & 0 & \cos\beta \end{pmatrix} .$

A roll is a counterclockwise rotation of $ \gamma$ about the $ x$-axis. The rotation matrix is given by

$\displaystyle R_x(\gamma) = \begin{pmatrix}1 & 0 & 0 \\\\ 0 & \cos\gamma & -\sin\gamma \\\\ 0 & \sin\gamma & \cos\gamma \end{pmatrix} .$

Note that $ R(\alpha,\beta,\gamma)$ performs the roll first, then the pitch, and finally the yaw. If the order of these operations is changed, a different rotation matrix would result.

For gaze direction, roll does not change gaze direction, so only yaw and pitch affect gaze direction. Given a normalized 3D vector (x,y,z), how to determine the yaw and pitch angles?
The problem should be discussed based on the order of doing yaw/pitch.

Consider an eye rigid model (bound with a head rigid model), aligned with original coordinate system, is facing x positive direction. Since roll has no effect on eye direction, we only perform yaw and pitch. For coordinate transformation, we consider the reverse process.

The eye direction in new coordinate system is $c_1 = (1,0,0)$ but $c_2 = (x_0,y_0,z_0)$ in the original coordinate system.

If true rotation order is yaw->pitch, then $c\_2=R\_z(-\alpha)R\_y (-\beta)c\_1.$ . Then, $\beta=arsin(z_0),\alpha=-artan(y_0/x_0)$.
If true rotation order is pitch->yaw, then $c\_2=R\_y(-\beta)R\_z (-\alpha)c\_1.$ . Then, $\alpha=-arsin(y_0),\beta=artan(z_0/x_0)$.

If we insert $R_x(\gamma)$ before $c_1$, the results won’t change, which demonstrates that roll will not influence eye direction. In other words, if the true rotation order is yaw->pitch->roll or pitch->yaw->roll, the above analysis still holds.

Notice:
-

we used right-hand coordinate system, that is, thumb along the z-axis and fingers from x-axis to y-axis.
rotation $\theta$ around some axis means rotating counter clockwise $\theta$ when looking along the positive direction of that axis
when doing rotations in sequence, each rotation is based on the up-to-date coordinate system (x-axis, y-axis, z-axis).

Endovascular Surgery

Posted on 2022-06-16 | In paper note

Key words:

catheter, cannulation, EM tracking (enhance visualization and provide objective metric)

Endovascular Technique:

Early endovascular technique: real-time fluoroscopy and 2D angiography: ionizing radiation and repeated injection of a nephrotoxic contrast agent.

Image fusion techniques: project 3D CT and magnetic resonance imaging to real-time 2D fluoroscopic images, still require real-time fluroscopy.

Electromagnetic (EM) tracking: an EM field is generated by the Aurora Window Field Generator, and sensors on the tips measure and transmit the roll orientation and forward motion. The veracity of EM tracking is evaluated on the basis of target registration error (TRE).

System:

manual simulator 2. virtual simulator (virtual reality)

Task:

Different locations of vessels correspond to the tasks with different difficult levels.

Metric:

Certain metric is required to evaluate or augment the skills of surgeon (novice, intermediate, expert).

1.expert observation and subjective score

2.number of cases

3.kinematic metrics: mapping from kinematic data to skill (classification)

3D path length
spectral arc length: change of acceleration in frequency domain
root mean dimensionless jerk: movement smoothness
submovement number and duration
catheter turn: measure the task difficulty

View:

2D or 3D
real-time or stored (using stored image can reduce fluoroscopy time and radiation exposure)
anteroposterior/lateral/endoluminal view

Distillation

Posted on 2022-06-16 | In paper note

knowledge/model distillation [1]
data distillation [2] [4] [5]

A survey of knowledge distillation [3]

Reference

[1] Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. “Distilling the knowledge in a neural network.” arXiv preprint arXiv:1503.02531 (2015).

[2] Radosavovic, Ilija, et al. “Data distillation: Towards omni-supervised learning.” CVPR, 2018.

[3] Wang, Lin, and Kuk-Jin Yoon. “Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks.” arXiv preprint arXiv:2004.05937 (2020).

[4] Nguyen, Timothy, Zhourong Chen, and Jaehoon Lee. “Dataset Meta-Learning from Kernel Ridge-Regression.” arXiv preprint arXiv:2011.00050 (2020).

[5] Nguyen, Timothy, et al. “Dataset distillation with infinitely wide convolutional networks.” Advances in Neural Information Processing Systems 34 (2021).

Data Augmentation

Posted on 2022-06-16 | In paper note

Traditional data augmentation
- color, hue, illumination
- flip, crop, shear, rotation, (piecewise) affine transformation, Cutout, RandErasing, HideAndSeek, GridMask
Mixtures: Mixup [1], CutMix [2] (Mixture in spatial domain), GridMask [6], FMix [3] (Mixture in frequency)
Learn optimal data augmentation strategy: [4] [5], AutoAugment, RandAugment, Fast AutoAugment, Faster AutoAugment, Greedy Augment.
Semantic augmentation: [7]

A summary of existing data augmentation methods [link]

Reference

[1] mixup: Beyond empirical risk minimization

[2] Cutmix: Regularization strategy to train strong classifiers with localizable features

[3] Understanding and Enhancing Mixed Sample Data Augmentation

[4] AutoAugment: Learning Augmentation Strategies from Data

[5] The Effectiveness of Data Augmentation in Image Classification using Deep Learning

[6] GridMask Data Augmentation

[7] Regularizing Deep Networks with Semantic Data Augmentation

Clothes Dataset

Posted on 2022-06-16 | In paper note

deepfashion: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html (attribute, bounding box, landmark)
Colorful-Fashion: https://sites.google.com/site/fashionparsing/home (pixel-level color-category label)
CCP (Clothing Co-Parsing): https://github.com/bearpaw/clothing-co-parsing (parsing label)
fashionistas: http://vision.is.tohoku.ac.jp/~kyamagu/research/clothing_parsing/(parsing label)
HPW (Human Parsing in the Wild): https://github.com/lemondan/HumanParsing-Dataset (parsing label)
modaNet: https://github.com/eBay/modanet (polygon annotations)

Class Imbalance

Posted on 2022-06-16 | In paper note

re-sampling
synthetic samples: generate more samples for minor classes
re-weighting
few-shot learning
decoupling representation and classifier learning: use normal sampling in the feature learning stage and use re-sampling in the classifier learning stage.

Causal Inference

Posted on 2022-06-16 | In paper note

Big Names: Judy Pearl [Tutorial] [slides] [textbook], James Robin [Textbook] [slides]

Tutorial:

Causality for machine learning [4]
Towards Causal Representation Learning [8]
A briefing on causal inference written by myself

Workshop: NIPS2018 workshop on causal learning, KDD2020 Tutorial on Causal Inference Meets Machine Learning

Material: MILA Course

Causality and disentanglement: [5] [6]

Counterfactual and disentanglement: [7]

Reference

[1] Chalupka K, Perona P, Eberhardt F. Visual causal feature learning. arXiv preprint arXiv:1412.2309, 2014.

[2] Lopez-Paz D, Nishihara R, Chintala S, et al. Discovering causal signals in images. CVPR, 2017.

[3] Bau D, Zhu J Y, Strobelt H, et al. GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. arXiv preprint arXiv:1811.10597, 2018.

[4] Bernhard Schölkopf: CAUSALITY FOR MACHINE LEARNING. arXiv preprint arXiv:1911.10500, 2019.

[5] Kim, Hyemi, et al. “Counterfactual Fairness with Disentangled Causal Effect Variational Autoencoder.” arXiv preprint arXiv:2011.11878 (2020).

[6] Shen, Xinwei, et al. “Disentangled Generative Causal Representation Learning.” arXiv preprint arXiv:2010.02637 (2020).

[7] Yue, Zhongqi, et al. “Counterfactual Zero-Shot and Open-Set Visual Recognition.” arXiv preprint arXiv:2103.00887 (2021).

[8] Schölkopf, Bernhard, et al. “Towards causal representation learning.” arXiv preprint arXiv:2102.11107 (2021).