Newly Blog



Non-local Network

Posted on 2022-06-16 | In paper note

Extensions of the non-local network [1]: [2] [3] [4] [5]
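A minimal sketch of the embedded-Gaussian form of the non-local operation in [1] follows. For each position i it computes y_i = sum_j softmax_j(theta(x_i) · phi(x_j)) g(x_j) and then adds a residual connection. To keep it dependency-free, the sketch makes three simplifying assumptions:

- theta, phi, g and the output projection W_z are all identity maps. In [1] they are learned 1×1 convolutions.
- The input is a plain Python list of N feature vectors rather than a feature map.
- The function names are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def non_local_block(x):
    """Embedded-Gaussian non-local operation on N positions with C channels.

    x: list of N feature vectors, each a list of C floats.
    Simplification (assumption): theta, phi, g and W_z are identity maps, so
    z_i = x_i + sum_j softmax_j(x_i . x_j) * x_j.
    """
    out = []
    for xi in x:
        # Affinity f(x_i, x_j) = exp(theta(x_i) . phi(x_j)), normalized over all j.
        weights = softmax([dot(xi, xj) for xj in x])
        # Weighted sum of g(x_j) over ALL positions j: the "non-local" part.
        yi = [sum(w * xj[c] for w, xj in zip(weights, x)) for c in range(len(xi))]
        # Residual connection z_i = W_z y_i + x_i, with W_z = identity here.
        out.append([y + v for y, v in zip(yi, xi)])
    return out
```

When all inputs are identical, the affinities are uniform, so each y_i equals the shared feature and z_i = 2x_i. For example, `non_local_block([[1.0, 2.0], [1.0, 2.0]])` gives `[[2.0, 4.0], [2.0, 4.0]]`. Every position attends to every other position, so the cost grows quadratically with N. Reducing this cost is a main motivation of several extensions listed above (e.g., [2] [3] [4]).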

Reference

[1] Wang, Xiaolong, et al. “Non-local neural networks.” CVPR, 2018.

[2] Zhu, Zhen, et al. “Asymmetric non-local neural networks for semantic segmentation.” ICCV, 2019.

[3] Li, Xia, et al. “Expectation-maximization attention networks for semantic segmentation.” ICCV, 2019.

[4] Huang, Zilong, et al. “CCNet: Criss-cross attention for semantic segmentation.” ICCV, 2019.

[5] Zhang, Li, et al. “Dynamic graph message passing networks.” CVPR, 2020.

Multi-modality Fusion

Posted on 2022-06-16 | In paper note
  1. Concatenation/summation, or weighted concatenation/summation with weights produced by an attention mechanism.
  2. Product of experts: P(y|x1,x2) ∝ P(y|x1)P(y|x2), assuming each expert is a Gaussian distribution [1]
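Under the Gaussian assumption in item 2, the product of experts has a closed form. Multiplying Gaussian densities and renormalizing gives another Gaussian with two properties:

- Its precision (inverse variance) is the sum of the experts' precisions.
- Its mean is the precision-weighted average of the experts' means.

The sketch below shows this for 1-D experts given as (mean, variance) pairs. The function name is hypothetical. Real models apply the same formula per dimension to diagonal Gaussians predicted by networks.

```python
def product_of_gaussian_experts(experts):
    """Fuse 1-D Gaussian experts N(mu_k, var_k) by multiplying their densities.

    The renormalized product of Gaussians is again Gaussian:
      1 / var = sum_k 1 / var_k
      mu      = var * sum_k mu_k / var_k
    experts: list of (mu, var) pairs with var > 0.
    Returns the fused (mu, var).
    """
    precision = sum(1.0 / var_k for _, var_k in experts)
    var = 1.0 / precision
    mu = var * sum(mu_k / var_k for mu_k, var_k in experts)
    return mu, var
```

For example, `product_of_gaussian_experts([(0.0, 1.0), (2.0, 1.0)])` returns `(1.0, 0.5)`. A more confident expert (smaller variance) pulls the fused mean toward its own mean. The fused variance is smaller than that of any single expert.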

Reference

  1. Huang, Xun, et al. “Multimodal Conditional Image Synthesis with Product-of-Experts GANs.” arXiv preprint arXiv:2112.05130 (2021).

Multi-modal Problem

Posted on 2022-06-16 | In paper note

The multi-modal problem means that, given an input, there are multiple plausible outputs instead of a single deterministic one. The key challenge is mode collapse: the model ignores the randomness and produces (nearly) the same output regardless of the random input.

  1. Generate K possible outputs and assume that the ground-truth output matches one of them [1]. K is fixed beforehand.

  2. Ensure a bijection between the random vector and the output by associating the random factor (e.g., random vector z) with specific information [2]. Either the random factor is conditioned on specific information (e.g., z is encoded from the ground-truth output), or the random factor can be recovered from the generated output. If the mapping from random vector to output is invertible (e.g., Glow), there is a natural bijection between the two [6].

  3. Force different random vectors to produce different outputs: push apart the outputs generated from different random vectors z with a diversity loss or mode-seeking loss [3] [4] [5]
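Items 1 and 3 translate into simple training losses. The sketch below rests on a few assumptions:

- Outputs and latent vectors are flat lists of floats.
- Distances are mean absolute (L1) distances.
- `best_of_k_loss` is the common winner-takes-all way to realize item 1. It is not necessarily the exact training procedure of [1].
- `mode_seeking_loss` follows the ratio idea of [3]: the generator maximizes output distance divided by latent distance, which the sketch writes as minimizing the reciprocal.

All function names are hypothetical.

```python
def l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def best_of_k_loss(predictions, target):
    """Winner-takes-all loss over K predictions.

    Only the prediction closest to the ground truth is penalized, so the
    other K-1 hypotheses remain free to cover other plausible outputs.
    """
    return min(l1(p, target) for p in predictions)

def mode_seeking_loss(out1, out2, z1, z2, eps=1e-5):
    """Reciprocal of d(out1, out2) / d(z1, z2).

    out1, out2: outputs generated from latent vectors z1, z2 (same input).
    Minimizing this maximizes the ratio, penalizing collapsed outputs.
    z1 and z2 must differ, otherwise the ratio is undefined.
    """
    ratio = l1(out1, out2) / l1(z1, z2)
    return 1.0 / (ratio + eps)
```

Suppose two different latent vectors produce identical outputs (mode collapse). The ratio is then 0 and the loss jumps to 1/eps, which pushes the generator to make its output depend on z.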

Reference

[1] Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. “Anticipating visual representations from unlabeled video.” CVPR, 2016.

[2] Zhu, Jun-Yan, et al. “Toward multimodal image-to-image translation.” NeurIPS, 2017.

[3] Mao, Qi, et al. “Mode seeking generative adversarial networks for diverse image synthesis.” CVPR, 2019.

[4] Yang, Dingdong, et al. “Diversity-sensitive conditional generative adversarial networks.” arXiv preprint arXiv:1901.09024 (2019).

[5] Liu, Shaohui, et al. “Normalized diversification.” CVPR, 2019, pp. 10306-10315.

[6] Lugmayr, Andreas, et al. “SRFlow: Learning the Super-Resolution Space with Normalizing Flow.” arXiv preprint arXiv:2006.14200 (2020).

MLP

Posted on 2022-06-16 | In paper note
  • classification [1] [2]
  • detection, segmentation [3]

Reference

[1] Tolstikhin, Ilya O., et al. “MLP-Mixer: An all-MLP architecture for vision.” Advances in Neural Information Processing Systems 34 (2021).

[2] Melas-Kyriazi, Luke. “Do you even need attention? A stack of feed-forward layers does surprisingly well on ImageNet.” arXiv preprint arXiv:2105.02723 (2021).

[3] Lian, Dongze, et al. “AS-MLP: An axial shifted MLP architecture for vision.” arXiv preprint arXiv:2107.08391 (2021).

Memory Network

Posted on 2022-06-16 | In paper note
  • The first memory network paper: a memory module with four components, I (input feature map), G (generalization), O (output feature map) and R (response). The variables in I, G, O and R are optimized with a single objective function.

  • End-to-end memory network: memory is read with soft attention, so the whole model can be trained with back-propagation.

  • Semi-supervised learning with a memory module [1]

  • Few-shot learning with a memory module [1] [2] [3]

  • Global memory [1]

  • Short-term memory and long-term memory [1]
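The end-to-end memory network is easy to back-propagate through because it reads memory with soft attention instead of a hard slot selection. Below is a minimal sketch of such a read. It assumes the memory is given as lists of key vectors and value vectors, which play the roles of the input and output memory embeddings in the end-to-end model. It also assumes the query is the already-embedded question. The embedding matrices themselves are omitted, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def memory_read(query, keys, values):
    """Soft (differentiable) read from a memory of N slots.

    query: list of D floats; keys: N lists of D floats; values: N lists of E floats.
    Returns (output, attention), where attention p_i = softmax_i(query . keys[i])
    and output = sum_i p_i * values[i].
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    attention = softmax(scores)
    dim = len(values[0])
    output = [sum(p * v[d] for p, v in zip(attention, values)) for d in range(dim)]
    return output, attention
```

Consider `memory_read([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]])`. Almost all attention goes to the first slot, so the output is close to `[1.0, 0.0]`. No hard argmax is involved, so gradients flow to the query, the keys and the values.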

Mask-aided Object Detection

Posted on 2022-06-16 | In paper note

Help generate proposal:

  1. Combine the semantic mask with the feature map (e.g., by concatenation or summation) to help predict bounding boxes: [1] [2] [3]

  2. Generate proposals from the semantic mask: [4]

Help select proposal:

  1. Assign weights to proposals based on the semantic mask: [5]

  2. Use the semantic mask surrounding each proposal as an auxiliary feature: [6]
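The "help select proposal" idea can be sketched simply. Score each box by the average mask value inside it, then multiply the detector score by that value, so boxes that cover little foreground are down-weighted. This scoring rule and the function names are hypothetical illustrations, not the exact scheme of [5] or [6]. The sketch also assumes:

- Boxes are integer (x1, y1, x2, y2) pixel coordinates, with x2 and y2 exclusive.
- The mask is a 2-D list indexed `mask[y][x]` with values in [0, 1].

```python
def mask_score(mask, box):
    """Average mask value (fraction of foreground) inside a box.

    mask: 2-D list indexed mask[y][x], values in [0, 1].
    box: (x1, y1, x2, y2) with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    if area <= 0:
        return 0.0
    inside = sum(mask[y][x] for y in range(y1, y2) for x in range(x1, x2))
    return inside / area

def reweight_proposals(proposals, mask):
    """proposals: list of (box, score).

    Returns (box, score * mask_score) pairs sorted from best to worst.
    """
    weighted = [(box, score * mask_score(mask, box)) for box, score in proposals]
    return sorted(weighted, key=lambda item: item[1], reverse=True)
```

Suppose the foreground occupies the top-left 2×2 corner of a 4×4 mask. A box on that corner with detector score 0.5 then outranks a box on empty background with score 0.9, because the latter's weighted score drops to 0.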

Reference

[1] Liu, Yan, et al. “Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity.” NeurIPS, 2021.

[2] Chen, Zitian, et al. “Cross-Supervised Object Detection.” arXiv preprint arXiv:2006.15056 (2020).

[3] Zhao, Xiangyun, Shuang Liang, and Yichen Wei. “Pseudo mask augmented object detection.” CVPR, 2018.

[4] Diba, Ali, et al. “Weakly supervised cascaded convolutional networks.” CVPR, 2017.

[5] Li, Xiaoyan, et al. “Weakly supervised object detection with segmentation collaboration.” ICCV, 2019.

[6] Wei, Yunchao, et al. “TS2C: Tight box mining with surrounding segmentation context for weakly supervised object detection.” ECCV, 2018.

Layout Generation

Posted on 2022-06-16 | In paper note
  1. VAE/GAN: [1] [6] [7] (hierarchical encoder/decoder)
  2. GNN: [2] [5]
  3. autoregressive: [3] [4]

Reference

  1. Zheng, Xinru, et al. “Content-aware generative modeling of graphic design layouts.” ACM Transactions on Graphics (TOG) 38.4 (2019): 1-15.

  2. Lee, Hsin-Ying, et al. “Neural design network: Graphic layout generation with constraints.” ECCV, 2020.

  3. Gupta, Kamal, et al. “Layout Generation and Completion with Self-attention.” arXiv preprint arXiv:2006.14615 (2020).

  4. Jyothi, Akash Abdu, et al. “LayoutVAE: Stochastic scene layout generation from a label set.” ICCV, 2019.

  5. Li, Jianan, et al. “LayoutGAN: Generating graphic layouts with wireframe discriminators.” ICLR, 2019.

  6. Arroyo, Diego Martin, Janis Postels, and Federico Tombari. “Variational Transformer Networks for Layout Generation.” CVPR, 2021.

  7. Patil, Akshay Gadi, et al. “READ: Recursive autoencoders for document layout generation.” CVPR Workshops, 2020.

Jigsaw Puzzle

Posted on 2022-06-16 | In paper note
  1. Reorganize patches [1]

  2. Reorganize pixels [2]
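The pretext task in [1] for item 1 works in three steps:

1. Cut the image into a 3×3 grid of tiles.
2. Shuffle the tiles with one permutation from a fixed set (in [1], the permutations are chosen to be far apart in Hamming distance).
3. Train a network to predict which permutation was used.

The sketch below only builds one training sample and has no network. It assumes the image is a square 2-D list whose side is divisible by 3 and that the caller supplies the permutation set. The function names are hypothetical.

```python
import random

def split_tiles(image, grid=3):
    """Split a square 2-D list (side divisible by grid) into grid*grid tiles, row-major."""
    size = len(image) // grid
    tiles = []
    for r in range(grid):
        for c in range(grid):
            rows = image[r * size:(r + 1) * size]
            tiles.append([row[c * size:(c + 1) * size] for row in rows])
    return tiles

def make_jigsaw_sample(image, permutations, rng=random):
    """Shuffle the tiles with a permutation drawn from a fixed set.

    Returns (shuffled_tiles, label), where label is the index of the chosen
    permutation: the classification target of the pretext task.
    """
    tiles = split_tiles(image)
    label = rng.randrange(len(permutations))
    perm = permutations[label]
    shuffled = [tiles[i] for i in perm]
    return shuffled, label
```

The network never sees the original tile order. To predict the label, it has to infer from the tiles' content where each piece belongs, which is what makes the task useful for learning visual representations.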

Reference

[1] Noroozi, Mehdi, and Paolo Favaro. “Unsupervised learning of visual representations by solving jigsaw puzzles.” ECCV, 2016.

[2] Shen, Wan Xiang, et al. “AggMapNet: enhanced and explainable low-sample omics deep learning with feature-aggregated multi-channel networks.” Nucleic Acids Research (2022).

Interpretable Machine Learning

Posted on 2022-06-16 | In paper note
  1. Manipulate individual layers/neurons and observe the resulting change in network parameters/activations.

  2. Saliency map

  3. Adversarial attack

  4. Correlation

  5. Information gain/loss
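Item 2 can be illustrated with the simplest gradient-based saliency map: the magnitude of the derivative of the model's output score with respect to each input element. The sketch approximates the derivative with central finite differences, so it works on any black-box scoring function. Real implementations usually compute the gradient with automatic differentiation instead. The function name is hypothetical, and the toy linear model in the test stands in for a network.

```python
def saliency_map(model, x, delta=1e-4):
    """Approximate |d model(x) / d x_i| for every input element.

    model: function mapping a list of floats to a float score.
    x: input as a list of floats.
    Uses central finite differences with step delta.
    """
    saliency = []
    for i in range(len(x)):
        up = list(x)
        up[i] += delta
        down = list(x)
        down[i] -= delta
        grad = (model(up) - model(down)) / (2 * delta)
        saliency.append(abs(grad))
    return saliency
```

For the linear score `lambda v: 3.0 * v[0] - 1.0 * v[2]`, the map is approximately `[3.0, 0.0, 1.0]`. The first input matters most, and the second one does not affect the score at all.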

Instance Image-to-Image Translation

Posted on 2022-06-16 | In paper note

Translate one or multiple instances in an image: [1]

Reference

[1] Mo, Sangwoo, Minsu Cho, and Jinwoo Shin. “InstaGAN: Instance-aware image-to-image translation.” arXiv preprint arXiv:1812.10889 (2018).

Li Niu

© 2025 Li Niu