A multi-modal problem is one where a given input admits multiple plausible outputs rather than a single deterministic output. The key difficulty is mode collapse: a naively trained model averages over the modes or keeps producing only one of them.

  1. The ground-truth output belongs to one of K generated possibilities [1]. K is set beforehand.

  2. Ensure a bijection between the random vector and the output: associate the random factor (e.g., a random vector z) with specific information [2]. Either the random factor is conditioned on specific information, or the random factor can be recovered from the generated output. If the mapping from random vector to output is invertible (e.g., Glow), the bijection between random vector and output is automatic [6].

  3. Enforce that different random vectors produce different outputs: push apart the outputs generated from different random vectors z with a diversity loss or mode-seeking loss [3] [4] [5].
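The first and third strategies can be sketched with toy losses. The function names, shapes, and the tanh/MSE choices below are illustrative assumptions, not the exact formulations of the cited papers; `best_of_k_loss` follows the winner-takes-all idea of [1], and `mode_seeking_term` follows the distance-ratio idea of [3].

```python
import numpy as np

def best_of_k_loss(y_true, y_hats):
    """Winner-takes-all loss over K hypotheses: only the closest
    hypothesis receives gradient, so each head can specialize on
    one mode instead of averaging over all of them."""
    errs = [np.mean((y_true - y) ** 2) for y in y_hats]
    return min(errs)

def mode_seeking_term(z1, z2, out1, out2, eps=1e-8):
    """Mode-seeking regularizer: encourage the output distance to
    grow with the latent distance. Minimizing this ratio pushes
    outputs from different z apart, discouraging mode collapse."""
    d_out = np.linalg.norm(out1 - out2)
    d_z = np.linalg.norm(z1 - z2)
    return d_z / (d_out + eps)  # small when outputs are far apart
```

In training, the mode-seeking term is typically added to the main task loss with a small weight, so diversity does not override fidelity.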

Reference

[1] Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. “Anticipating visual representations from unlabeled video.” CVPR, 2016.

[2] Zhu, Jun-Yan, et al. “Toward multimodal image-to-image translation.” Advances in Neural Information Processing Systems. 2017.

[3] Mao, Qi, et al. “Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis.” CVPR, 2019.

[4] Yang, Dingdong, et al. “Diversity-Sensitive Conditional Generative Adversarial Networks.” arXiv preprint arXiv:1901.09024 (2019).

[5] Liu, Shaohui, et al. “Normalized Diversification.” CVPR, 2019.

[6] Lugmayr, Andreas, et al. “SRFlow: Learning the Super-Resolution Space with Normalizing Flow.” arXiv preprint arXiv:2006.14200 (2020).

  • classification [1] [2]
  • detection, segmentation [3]
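The shared idea behind these all-MLP architectures [1] is alternating token mixing (across patches) with channel mixing (across features). A minimal dependency-free sketch, with LayerNorm simplified and GELU replaced by tanh for brevity (both substitutions are my own, not from the papers):

```python
import numpy as np

def mixer_block(x, w_token, w_channel):
    """One simplified Mixer-style block.
    x: (num_patches, channels). The token-mixing step transposes x
    so its MLP acts across patches; the channel-mixing step is an
    ordinary per-patch MLP over features. Skip connections kept."""
    def norm(h):  # crude LayerNorm over the last axis
        return (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + 1e-6)
    # token mixing: mix information between patches
    x = x + np.tanh(norm(x).T @ w_token).T
    # channel mixing: mix information between channels
    x = x + np.tanh(norm(x) @ w_channel)
    return x
```

For detection/segmentation, [3] replaces the global token-mixing matrix with axial shifts so the architecture stays local and resolution-flexible.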

Reference

[1] Tolstikhin, Ilya O., et al. “Mlp-mixer: An all-mlp architecture for vision.” Advances in Neural Information Processing Systems 34 (2021).

[2] Melas-Kyriazi, Luke. “Do you even need attention? a stack of feed-forward layers does surprisingly well on imagenet.” arXiv preprint arXiv:2105.02723 (2021).

[3] Lian, Dongze, et al. “As-mlp: An axial shifted mlp architecture for vision.” arXiv preprint arXiv:2107.08391 (2021).

  • First paper on memory networks: four components, I (input feature map), G (generalization), O (output feature map), R (response); the variables in I, G, O, R are optimized jointly (the original formulation uses a margin ranking objective).

  • end-to-end memory network: fully differentiable, so it is easy to train with back-propagation

  • semi-supervised learning with memory module [1]

  • few-shot learning with memory module [1] [2] [3]

  • global memory [1]

  • short-term memory and long-term memory [1]
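The core read operation shared by these memory designs, soft attention over memory slots as in end-to-end memory networks, can be sketched as follows; the function names and shapes are illustrative:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def memory_read(query, keys, values):
    """Address memory with the query (the O step), then return the
    attention-weighted sum of values (the R step). Because the
    addressing is a softmax rather than a hard argmax, the whole
    read is differentiable and trainable end to end."""
    scores = keys @ query        # (num_slots,)
    weights = softmax(scores)    # soft addressing over slots
    return weights @ values     # (value_dim,)
```

Writing (the G step) can be as simple as appending a new slot, or a learned gated update for long-term memory.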

Help generate proposal:

  1. Combine semantic mask with feature map (e.g., concatenation, summation) to help predict bounding boxes: [1] [2] [3]

  2. Generate proposals from semantic mask: [4]

Help select proposal:

  1. Assign weights to proposals based on semantic mask: [5]

  2. Use the semantic mask surrounding each proposal as an auxiliary feature: [6]
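The mask-feature fusion in option 1 of "Help generate proposal" is mechanically simple; a sketch with illustrative shapes (the `mode` names are my own):

```python
import numpy as np

def fuse_mask(feature_map, semantic_mask, mode="concat"):
    """Combine a semantic mask with a backbone feature map before
    box prediction. feature_map: (C, H, W); semantic_mask: (H, W)
    with values in [0, 1]."""
    mask = semantic_mask[None]  # (1, H, W), broadcastable over channels
    if mode == "concat":
        return np.concatenate([feature_map, mask], axis=0)  # (C+1, H, W)
    if mode == "sum":
        return feature_map + mask  # (C, H, W), broadcast over C
    raise ValueError(f"unknown mode: {mode}")
```

Concatenation lets the detector learn how much to trust the mask; summation injects it directly into every channel.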

Reference

[1] Liu, Yan, et al. “Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity.” NeurIPS, 2021.

[2] Chen, Zitian, et al. “Cross-Supervised Object Detection.” arXiv preprint arXiv:2006.15056 (2020).

[3] Zhao, Xiangyun, Shuang Liang, and Yichen Wei. “Pseudo mask augmented object detection.” CVPR, 2018.

[4] Diba, Ali, et al. “Weakly supervised cascaded convolutional networks.” CVPR, 2017.

[5] Li, Xiaoyan, et al. “Weakly supervised object detection with segmentation collaboration.” ICCV, 2019.

[6] Wei, Yunchao, et al. “Ts2c: Tight box mining with surrounding segmentation context for weakly supervised object detection.” ECCV, 2018.

  1. Merging multiple LoRAs: orthogonal weights, masked dimensions

  2. Decomposition
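The simplest form of LoRA merging is a weighted sum of the low-rank updates B @ A; orthogonalizing the factors across adapters (e.g., via QR) or masking dimensions then reduces interference between them. A minimal sketch of the plain weighted sum, with all names and shapes my own:

```python
import numpy as np

def merge_loras(base_w, loras, weights=None):
    """Merge several LoRA adapters into one weight matrix.
    base_w: (d_out, d_in); each adapter is a pair (A, B) with
    A: (r, d_in), B: (d_out, r), contributing the update B @ A."""
    if weights is None:
        weights = [1.0] * len(loras)
    delta = sum(w * (B @ A) for w, (A, B) in zip(weights, loras))
    return base_w + delta
```

The orthogonal-weights and masked-dimensions variants change how the (A, B) pairs are constrained before this sum, not the sum itself.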

  1. VAE/GAN: [1] [6] [7] (hierarchical encoder/decoder)
  2. GNN: [2] [5]
  3. autoregressive: [3] [4]

Reference

[1] Zheng, Xinru, et al. “Content-aware generative modeling of graphic design layouts.” ACM Transactions on Graphics (TOG) 38.4 (2019): 1-15.

[2] Lee, Hsin-Ying, et al. “Neural design network: Graphic layout generation with constraints.” ECCV, 2020.

[3] Gupta, Kamal, et al. “Layout Generation and Completion with Self-attention.” arXiv preprint arXiv:2006.14615 (2020).

[4] Jyothi, Akash Abdu, et al. “Layoutvae: Stochastic scene layout generation from a label set.” ICCV, 2019.

[5] Li, Jianan, et al. “Layoutgan: Generating graphic layouts with wireframe discriminators.” ICLR, 2019.

[6] Arroyo, Diego Martin, et al. “Variational Transformer Networks for Layout Generation.” CVPR, 2021.

[7] Patil, Akshay Gadi, et al. “Read: Recursive autoencoders for document layout generation.” CVPR Workshops, 2020.

  1. Reorganize patches [1]

  2. Reorganize pixels [2]
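The patch-reorganizing pretext task of [1] boils down to cutting an image into a grid, permuting the patches, and training a network to predict the permutation. A minimal sketch (grid size and function names are my own):

```python
import numpy as np

def shuffle_patches(img, grid=3, rng=None):
    """Jigsaw pretext task: cut img into grid x grid patches,
    permute them, and return the shuffled image together with the
    permutation index (the label the network must predict)."""
    rng = rng or np.random.default_rng(0)
    h, w = img.shape[0] // grid, img.shape[1] // grid
    patches = [img[i*h:(i+1)*h, j*w:(j+1)*w]
               for i in range(grid) for j in range(grid)]
    perm = rng.permutation(len(patches))
    rows = [np.concatenate([patches[perm[i*grid + j]] for j in range(grid)], axis=1)
            for i in range(grid)]
    return np.concatenate(rows, axis=0), perm
```

[2] works at pixel rather than patch granularity, but the reorganize-and-recover principle is the same.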

Reference

[1] Noroozi, Mehdi, and Paolo Favaro. “Unsupervised learning of visual representations by solving jigsaw puzzles.” ECCV, 2016.

[2] Shen, Wan Xiang, et al. “AggMapNet: enhanced and explainable low-sample omics deep learning with feature-aggregated multi-channel networks.” Nucleic Acids Research (2022).

  1. Manipulate each layer/neuron, and observe the change of network parameters/activations.

  2. Saliency map

  3. Adversarial attack

  4. Correlation

  5. Information gain/loss
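Item 2 (saliency maps) is the most mechanical of these: measure how sensitive the class score is to each input element. With an autodiff framework this is one backward pass; the finite-difference version below keeps the sketch dependency-free (all names and the epsilon are my own):

```python
import numpy as np

def saliency_map(score_fn, x, eps=1e-4):
    """Finite-difference saliency: perturb each input element and
    record how much the scalar score changes. Large values mark
    inputs the network's decision is most sensitive to."""
    base = score_fn(x)
    sal = np.zeros_like(x)
    flat, xf = sal.ravel(), x.ravel()  # views into sal and x
    for i in range(xf.size):
        old = xf[i]
        xf[i] = old + eps
        flat[i] = abs(score_fn(x) - base) / eps
        xf[i] = old  # restore the input
    return sal
```

The other probes (ablation, adversarial attack, correlation, information gain) differ mainly in what perturbation is applied and what quantity is measured.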

Webly supervised image-text retrieval

The first work [1] to use web images and their tags to augment image-sentence pairs. We tried to reproduce it, but could not make it work at all.

The text associated with a web image generally consists of tags, title, and description.
The tags are very noisy but still usable for webly supervised image classification. The titles and descriptions are even noisier: only a few descriptions are complete sentences that match the corresponding images.

The Conceptual Captions dataset [2] crawled web images together with their alt-text and built an automatic pipeline that extracts, filters, and transforms candidate image-caption pairs, yielding relatively clean image-text pairs. This large corpus of web image-text pairs can be used to pretrain image-text retrieval or image captioning models.
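A crude version of the kind of caption filtering such a pipeline performs might look like the following; the thresholds and checks are illustrative assumptions, not the actual rules of [2]:

```python
def keep_caption(alt_text, min_words=3, max_words=30):
    """Cheap heuristics on a candidate alt-text caption: length
    bounds, must contain letters, and drop boilerplate such as
    bare image filenames."""
    words = alt_text.split()
    if not (min_words <= len(words) <= max_words):
        return False
    if not any(c.isalpha() for c in alt_text):
        return False
    if alt_text.lower().endswith((".jpg", ".png", ".gif")):
        return False
    return True
```

The real pipeline also applies part-of-speech checks and hypernymization (e.g., replacing named entities with generic nouns) before a pair is kept.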

Image-text (Chinese) Datasets

Reference

[1] Mithun, Niluthpol Chowdhury, et al. “Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval.” ACM MM, 2018.

[2] Sharma, Piyush, et al. “Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning.” ACL, 2018.
