Newly Blog



Dynamic Kernel

Posted on 2022-09-19 | In paper note

Dynamic kernels: [1] [2]

Survey: [Dynamic neural networks: A survey]
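
To recall the core idea of [1]: a filter-generating network predicts convolution kernels conditioned on the input, and those kernels are then applied sample-by-sample. A minimal PyTorch sketch of this idea (layer and variable names are my own, not from the paper):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DynamicFilterLayer(nn.Module):
        """Predict a per-sample depthwise kernel and apply it via grouped conv."""
        def __init__(self, channels, kernel_size=3):
            super().__init__()
            self.channels, self.kernel_size = channels, kernel_size
            # Filter-generating network: global context -> one kernel per channel.
            self.gen = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(channels, channels * kernel_size * kernel_size),
            )

        def forward(self, x):
            b, c, h, w = x.shape
            k = self.kernel_size
            kernels = self.gen(x).view(b * c, 1, k, k)  # one kernel per sample & channel
            x = x.view(1, b * c, h, w)                  # fold batch into channels
            out = F.conv2d(x, kernels, padding=k // 2, groups=b * c)
            return out.view(b, c, h, w)

    x = torch.randn(2, 8, 16, 16)
    print(DynamicFilterLayer(8)(x).shape)  # torch.Size([2, 8, 16, 16])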

References

  1. Jia, Xu, et al. “Dynamic filter networks.” NeurIPS, 2016.

  2. Tian, Zhi, Chunhua Shen, and Hao Chen. “Conditional convolutions for instance segmentation.” ECCV, 2020.

Virtual Try-on

Posted on 2022-09-10 | In paper note

warping

  • correspondence matrix [1] [4]
  • TPS [1] [3]
  • offset and weight [2] (a minimal flow-warping sketch follows this list)
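
The offset-based methods above build on dense-flow warping via grid sampling. A minimal sketch under that assumption (the function name is mine; this is not any specific paper's implementation):

    import torch
    import torch.nn.functional as F

    def warp_with_flow(garment, flow):
        """Warp a garment image/feature map with a dense offset field.
        garment: (B, C, H, W); flow: (B, 2, H, W) pixel offsets (dx, dy)."""
        b, _, h, w = garment.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        base = torch.stack((xs, ys), dim=0).float().to(garment.device)  # (2, H, W)
        coords = base.unsqueeze(0) + flow                               # (B, 2, H, W)
        # Normalize to [-1, 1] as required by grid_sample.
        gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
        gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)                            # (B, H, W, 2)
        return F.grid_sample(garment, grid, align_corners=True)

    garment = torch.randn(1, 3, 64, 48)
    flow = torch.zeros(1, 2, 64, 48)  # zero flow = identity warp
    assert torch.allclose(warp_with_flow(garment, flow), garment, atol=1e-5)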

target person

Garment Transfer: [5] [6] [8]

Controllable person image synthesis: [7]

Recurrent Person Image Generation: [9]

References

  1. Yang, Fan, and Guosheng Lin. “CT-Net: Complementary Transfering Network for Garment Transfer with Arbitrary Geometric Changes.” CVPR, 2021.

  2. Bai, Shuai, et al. “Single Stage Virtual Try-on via Deformable Attention Flows.” arXiv preprint arXiv:2207.09161 (2022).

  3. Fenocchi, Emanuele, et al. “Dual-Branch Collaborative Transformer for Virtual Try-On.” CVPR, 2022.

  4. Morelli, Davide, et al. “Dress Code: High-Resolution Multi-Category Virtual Try-On.” CVPR, 2022.

  5. Yang, Fan, and Guosheng Lin. “CT-Net: Complementary Transfering Network for Garment Transfer with Arbitrary Geometric Changes.” CVPR, 2021.

  6. Liu, Ting, et al. “Spatial-aware texture transformer for high-fidelity garment transfer.” IEEE Transactions on Image Processing 30 (2021): 7499-7510.

  7. Zhou, Xinyue, et al. “Cross Attention Based Style Distribution for Controllable Person Image Synthesis.” arXiv preprint arXiv:2208.00712 (2022).

  8. Raj, Amit, et al. “Swapnet: Image based garment transfer.” ECCV, 2018.

  9. Cui, Aiyu, Daniel McKee, and Svetlana Lazebnik. “Dressing in order: Recurrent person image generation for pose transfer, virtual try-on and outfit editing.” ICCV, 2021.

Diffusion Model

Posted on 2022-09-09 | In paper note
  • Class-conditioned image generation: [1]

  • Image-to-image translation: [4], [7], [6], [8]

  • Image-to-image translation with guidance: GLIDE [2] (global: text), [20] (global: text, sketch), [21] (local: text), [22] (local: text, image), ControlNet [23] (global: mixture), T2I-Adapter [24] (global: mixture), [25] (local: mixture), [26] (global: text), [27], Ctrl-Adapter [36]

  • Unpaired image-to-image translation: [19] [28] [29]

  • Image composition: SDEdit [17], ILVR [6], [5], [9], [15] (see the SDEdit-style sketch after this list)

  • Image inpainting: [10], [11], [12], [13]

  • Mask prediction: [31] cross-attention plus post-processing; [32] adds one output channel; [33] predicts masks from the feature maps at early steps.
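
As a reminder of the SDEdit idea [8, 17] used for composition: noise the guide image part-way with the forward process, then denoise back to t = 0 with the plain reverse chain. A minimal sketch assuming an epsilon-prediction model (all names are mine):

    import torch

    @torch.no_grad()
    def sdedit(x_guide, eps_model, betas, t_start):
        """Noise the guide image to t_start, then run the DDPM reverse chain."""
        alphas = 1.0 - betas
        abar = torch.cumprod(alphas, dim=0)
        # Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps.
        x = abar[t_start].sqrt() * x_guide + (1 - abar[t_start]).sqrt() * torch.randn_like(x_guide)
        for t in range(t_start, -1, -1):
            eps = eps_model(x, t)
            # DDPM posterior mean (variance = beta_t variant).
            x = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
            if t > 0:
                x = x + betas[t].sqrt() * torch.randn_like(x)
        return x

    # Toy usage with a stand-in noise predictor.
    betas = torch.linspace(1e-4, 0.02, 1000)
    dummy_eps_model = lambda x, t: torch.zeros_like(x)
    edited = sdedit(torch.randn(1, 3, 32, 32), dummy_eps_model, betas, t_start=500)
    print(edited.shape)  # torch.Size([1, 3, 32, 32])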

Milestones: DDPM [3], Stable Diffusion v1, v2, XL, v3

Acceleration: DDIM [14], PLMS [16]
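
DDIM [14] accelerates sampling by taking deterministic jumps between distant timesteps instead of stepping through every t. A minimal sketch of the eta = 0 update (names are mine):

    import torch

    @torch.no_grad()
    def ddim_step(x_t, eps, abar_t, abar_prev):
        """One deterministic DDIM (eta = 0) update, given the predicted noise eps."""
        # Estimate x_0 from the current sample and the predicted noise.
        x0_pred = (x_t - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()
        # Re-noise deterministically toward the earlier timestep.
        return abar_prev.sqrt() * x0_pred + (1 - abar_prev).sqrt() * eps

    # Toy usage: 10 large jumps instead of 1000 small steps.
    betas = torch.linspace(1e-4, 0.02, 1000)
    abar = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn(1, 3, 32, 32)
    ts = list(range(999, -1, -100)) + [0]            # 999, 899, ..., 99, 0
    for t, t_prev in zip(ts[:-1], ts[1:]):
        eps = torch.zeros_like(x)                    # stand-in for eps_model(x, t)
        x = ddim_step(x, eps, abar[t], abar[t_prev])
    print(x.shape)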

High-resolution: [34] progressive training

Light-weight: [35]

Failure case analyses: [30]

Surveys

  • Diffusion Models: A Comprehensive Survey of Methods and Applications
  • Diffusion Models in Vision: A Survey

Tutorial materials: [a] [b]

References

[1] Dhariwal, Prafulla, and Alex Nichol. “Diffusion models beat gans on image synthesis.” arXiv preprint arXiv:2105.05233 (2021).

[2] Nichol, Alex, et al. “Glide: Towards photorealistic image generation and editing with text-guided diffusion models.” arXiv preprint arXiv:2112.10741 (2021).

[3] Ho, Jonathan, Ajay Jain, and Pieter Abbeel. “Denoising diffusion probabilistic models.” NeurIPS, 2020.

[4] Wang, Tengfei, et al. “Pretraining is All You Need for Image-to-Image Translation.” arXiv preprint arXiv:2205.12952 (2022).

[5] Hachnochi, Roy, et al. “Cross-domain Compositing with Pretrained Diffusion Models.” arXiv preprint arXiv:2302.10167 (2023).

[6] Choi, Jooyoung, et al. “ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models.” ICCV, 2021.

[7] Kwon, Gihyun, and Jong Chul Ye. “Diffusion-based image translation using disentangled style and content representation.” ICLR, 2023.

[8] Meng, Chenlin, et al. “Sdedit: Guided image synthesis and editing with stochastic differential equations.” ICLR, 2021.

[9] Yang, Binxin, et al. “Paint by Example: Exemplar-based Image Editing with Diffusion Models.” arXiv preprint arXiv:2211.13227 (2022).

[10] Lugmayr, Andreas, et al. “Repaint: Inpainting using denoising diffusion probabilistic models.” CVPR, 2022.

[11] Rombach, Robin, et al. “High-resolution image synthesis with latent diffusion models.” CVPR, 2022.

[12] Li, Wenbo, et al. “SDM: Spatial Diffusion Model for Large Hole Image Inpainting.” arXiv preprint arXiv:2212.02963 (2022).

[13] Wang, Su, et al. “Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting.” arXiv preprint arXiv:2212.06909 (2022).

[14] Song, Jiaming, Chenlin Meng, and Stefano Ermon. “Denoising diffusion implicit models.” arXiv preprint arXiv:2010.02502 (2020).

[15] Song, Yizhi, et al. “ObjectStitch: Generative Object Compositing.” CVPR, 2023.

[16] Liu, Luping, et al. “Pseudo numerical methods for diffusion models on manifolds.” ICLR, 2022.

[17] Meng, Chenlin, et al. “Sdedit: Guided image synthesis and editing with stochastic differential equations.” ICLR, 2021.

[19] Kwon, Gihyun, and Jong Chul Ye. “Diffusion-based image translation using disentangled style and content representation.” ICLR, 2023.

[20] Voynov, Andrey, Kfir Aberman, and Daniel Cohen-Or. “Sketch-Guided Text-to-Image Diffusion Models.” arXiv preprint arXiv:2211.13752 (2022).

[21] Yang, Zhengyuan, et al. “ReCo: Region-Controlled Text-to-Image Generation.” arXiv preprint arXiv:2211.15518 (2022).

[22] Li, Yuheng, et al. “GLIGEN: Open-Set Grounded Text-to-Image Generation.” arXiv preprint arXiv:2301.07093 (2023).

[23] Zhang, Lvmin, and Maneesh Agrawala. “Adding conditional control to text-to-image diffusion models.” arXiv preprint arXiv:2302.05543 (2023).

[24] Mou, Chong, et al. “T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models.” arXiv preprint arXiv:2302.08453 (2023).

[25] Huang, Lianghua, et al. “Composer: Creative and controllable image synthesis with composable conditions.” arXiv preprint arXiv:2302.09778 (2023).

[26] Wei, Yuxiang, et al. “Elite: Encoding visual concepts into textual embeddings for customized text-to-image generation.” arXiv preprint arXiv:2302.13848 (2023).

[27] Zhao, Shihao, et al. “Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models.” arXiv preprint arXiv:2305.16322 (2023).

[28] Sasaki, Hiroshi, Chris G. Willcocks, and Toby P. Breckon. “Unit-ddpm: Unpaired image translation with denoising diffusion probabilistic models.” arXiv preprint arXiv:2104.05358 (2021).

[29] Su, Xuan, et al. “Dual diffusion implicit bridges for image-to-image translation.” arXiv preprint arXiv:2203.08382 (2022).

[30] Du, Chengbin, et al. “Stable Diffusion is Unstable.”

[31] Wu, Weijia, et al. “Diffumask: Synthesizing images with pixel-level annotations for semantic segmentation using diffusion models.” arXiv preprint arXiv:2303.11681 (2023).

[32] Xie, Shaoan, et al. “Smartbrush: Text and shape guided object inpainting with diffusion model.” CVPR, 2023.

[33] Ma, Jian, et al. “GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently.” arXiv preprint arXiv:2303.17870 (2023).

[34] Gu, Jiatao, et al. “Matryoshka Diffusion Models.” arXiv preprint arXiv:2310.15111 (2023).

[35] Li, Yanyu, et al. “SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds.” NeurIPS, 2023.

[36] Lin, Han, et al. “Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model.” arXiv preprint arXiv:2404.09967 (2024).

Exemplar-guided Image Translation

Posted on 2022-09-09 | In paper note

Task: each exemplar represents one domain; transfer the style of the exemplar image to the input image. (A minimal style-injection sketch follows the list below.)

[1]

  • reconstruct the style code: [2]

  • use pretrained network (prior knowledge) to extract the style code: [3]
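
A common way to think about a “style code” in this line of work is per-channel feature statistics. A minimal AdaIN-style sketch of style injection (illustrative only; not the specific mechanism of [1]–[3]):

    import torch

    def adain(content, style, eps=1e-5):
        """Re-normalize content features to match the per-channel mean/std
        ("style code") of the exemplar features."""
        c_mean = content.mean(dim=(2, 3), keepdim=True)
        c_std = content.std(dim=(2, 3), keepdim=True) + eps
        s_mean = style.mean(dim=(2, 3), keepdim=True)
        s_std = style.std(dim=(2, 3), keepdim=True)
        return s_std * (content - c_mean) / c_std + s_mean

    content = torch.randn(1, 64, 32, 32)            # features of the input image
    exemplar = torch.randn(1, 64, 32, 32) * 2 + 1   # features of the exemplar
    out = adain(content, exemplar)
    print(out.mean().item(), out.std().item())      # roughly 1 and 2, as in the exemplar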

References

  1. Zhang, Pan, et al. “Cross-domain correspondence learning for exemplar-based image translation.” CVPR, 2020.

  2. Anokhin, Ivan, et al. “High-resolution daytime translation without domain labels.” CVPR, 2020.

  3. Tumanyan, Narek, et al. “Splicing ViT Features for Semantic Appearance Transfer.” CVPR, 2022.

NeRF

Posted on 2022-08-22 | In paper note
  • NeRF [1] (see the volume-rendering sketch after this list)

  • GIRAFFE [2]
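
For reference, the volume-rendering quadrature from NeRF [1] for a single ray, as a minimal sketch (the function name is mine):

    import torch

    def volume_render(sigmas, colors, deltas):
        """Composite colors along one ray:
        alpha_i = 1 - exp(-sigma_i * delta_i),  T_i = prod_{j<i} (1 - alpha_j).

        sigmas: (N,) densities; colors: (N, 3) RGB; deltas: (N,) segment lengths.
        """
        alphas = 1.0 - torch.exp(-sigmas * deltas)
        trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas[:-1]]), dim=0)
        weights = trans * alphas
        return (weights[:, None] * colors).sum(dim=0)

    sigmas, colors = torch.rand(64), torch.rand(64, 3)
    deltas = torch.full((64,), 0.05)
    print(volume_render(sigmas, colors, deltas))  # composited RGB for one ray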

References

[1] Mildenhall, Ben, et al. “Nerf: Representing scenes as neural radiance fields for view synthesis.” ECCV, 2020.

[2] Niemeyer, Michael, and Andreas Geiger. “Giraffe: Representing scenes as compositional generative neural feature fields.” CVPR, 2021.

Mask Form

Posted on 2022-07-25 | In paper note
  1. binary map

  2. frequency: DCT [1] (see the sketch after this list)

  3. PolarMask [2]

  4. Hyperbolic [3]
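
A minimal sketch of the DCT mask representation [1]: transform the binary mask, keep low-frequency coefficients, then invert and threshold. This is simplified relative to the paper, which resizes the mask and keeps zig-zag-ordered coefficients; names are mine:

    import numpy as np
    from scipy.fftpack import dctn, idctn

    def encode_mask(mask, k=17):
        """2D DCT of a binary mask, keeping a k x k low-frequency block."""
        coeffs = dctn(mask.astype(np.float32), norm="ortho")
        compact = np.zeros_like(coeffs)
        compact[:k, :k] = coeffs[:k, :k]
        return compact

    def decode_mask(compact, thresh=0.5):
        return (idctn(compact, norm="ortho") > thresh).astype(np.uint8)

    mask = np.zeros((128, 128), dtype=np.uint8)
    mask[32:96, 40:100] = 1                          # toy rectangular mask
    rec = decode_mask(encode_mask(mask))
    iou = (rec & mask).sum() / (rec | mask).sum()
    print("IoU:", iou)                               # close to 1 for this simple mask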

References

[1] Shen, Xing, et al. “Dct-mask: Discrete cosine transform mask representation for instance segmentation.” CVPR, 2021.

[2] Xie, Enze, et al. “Polarmask: Single shot instance segmentation with polar representation.” CVPR, 2020.

[3] Ghadimi Atigh, Mina, et al. “Hyperbolic Image Segmentation.” arXiv preprint arXiv:2203.05898 (2022).

To a Beginner on Paper Writing

Posted on 2022-07-22 | In paper note
  • Carefully read the following instructions. These are the key points you should pay attention to when writing papers.

    • https://ustcnewly.github.io/2022/06/16/others/Paper%20Writing/
    • https://ustcnewly.github.io/2022/06/16/others/Paper%20Proofread/
    • https://ustcnewly.github.io/2022/06/16/others/Reasons%20to%20Reject%20a%20Paper/
  • The commonly used words in academic papers are summarized in https://ustcnewly.github.io/2022/06/16/others/Dictionary%20for%20Paper%20Writing/.

  • Before writing your own paper, carefully read 10 closely related papers and record the materials (words/phrases/sentences) that could be used in your paper. Organize the collected materials and think about when to use them. Do not copy them word for word; incorporate them into your own paper coherently and seamlessly.

Network Architecture

Posted on 2022-07-15 | In paper note
  1. Transformer

  2. Large kernel: [1] [2] [3] (a minimal block sketch follows this list)
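
A minimal sketch of a large-kernel block, roughly in the spirit of [1]; simplified relative to the paper (which uses LayerNorm over channels and linear layers), and names are mine:

    import torch
    import torch.nn as nn

    class LargeKernelBlock(nn.Module):
        """Large depthwise conv for spatial mixing, then 1x1 convs for channel mixing."""
        def __init__(self, dim, kernel_size=7):
            super().__init__()
            self.dw = nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
            self.norm = nn.GroupNorm(1, dim)  # stand-in for the paper's LayerNorm
            self.pw1 = nn.Conv2d(dim, 4 * dim, 1)
            self.act = nn.GELU()
            self.pw2 = nn.Conv2d(4 * dim, dim, 1)

        def forward(self, x):
            return x + self.pw2(self.act(self.pw1(self.norm(self.dw(x)))))

    x = torch.randn(1, 32, 56, 56)
    print(LargeKernelBlock(32)(x).shape)  # torch.Size([1, 32, 56, 56])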

References

[1] Liu, Zhuang, et al. “A convnet for the 2020s.” CVPR, 2022.

[2] Ding, Xiaohan, et al. “Scaling up your kernels to 31x31: Revisiting large kernel design in cnns.” CVPR, 2022.

[3] “More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 using Sparsity.”

Install VirtualBox Guest Additions

Posted on 2022-06-16 | In software
  1. Before installing the Guest Additions from the CD image, do the following:

    sudo apt-get install dkms build-essential linux-headers-generic linux-headers-$(uname -r)

    For missing Linux kernel headers or other common problems, refer to this.

    Use uname -r or uname -a to look up the kernel version, and use dpkg --get-selections | grep linux to list the installed Linux kernels.

  2. If you click the shared folder item in the menu bar and get the following error: ‘The VirtualBox Guest Additions do not seem to be available on this virtual machine, and shared folders cannot be used without them’, the following commands may help.

    sudo apt-get install virtualbox-guest-additions-iso
    sudo apt-get update
    sudo apt-get dist-upgrade
    sudo apt-get install virtualbox-guest-x11

Enlarge VirtualBox vdi

Posted on 2022-06-16 | In software
  1. Go to the VirtualBox installation directory and execute the following command:

    "D:\Program Files\Oracle\VirtualBox\VBoxManage.exe" modifyhd "F:\VirtualBox\my ubuntu.vdi" --resize 15360

    Note that 15360 is the new size in MB; this command can only enlarge the disk, not shrink it.

  2. Install gparted via sudo apt-get install gparted and use it to make the extended disk space available.

  3. Remount /home to the new disk. For concrete steps, refer to this link.
