Newly Blog



Dynamic Kernel

Posted on 2022-09-19 | In paper note

Dynamic kernels: [1] [2]

Survey: [Dynamic neural networks: A survey]
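
To recall the core idea of [1]: a filter-generating network predicts convolution kernels conditioned on the input, and those kernels are then applied sample-by-sample. A minimal PyTorch sketch of this idea (layer and variable names are my own, not from the paper):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DynamicFilterLayer(nn.Module):
        """Predict a per-sample depthwise kernel and apply it via grouped conv."""
        def __init__(self, channels, kernel_size=3):
            super().__init__()
            self.channels, self.kernel_size = channels, kernel_size
            # Filter-generating network: global context -> one kernel per channel.
            self.gen = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(channels, channels * kernel_size * kernel_size),
            )

        def forward(self, x):
            b, c, h, w = x.shape
            k = self.kernel_size
            kernels = self.gen(x).view(b * c, 1, k, k)  # one kernel per sample & channel
            x = x.view(1, b * c, h, w)                  # fold batch into channels
            out = F.conv2d(x, kernels, padding=k // 2, groups=b * c)
            return out.view(b, c, h, w)

    x = torch.randn(2, 8, 16, 16)
    print(DynamicFilterLayer(8)(x).shape)  # torch.Size([2, 8, 16, 16])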

References

  1. Jia, Xu, et al. “Dynamic filter networks.” NeurIPS, 2016.

  2. Tian, Zhi, Chunhua Shen, and Hao Chen. “Conditional convolutions for instance segmentation.” ECCV, 2020.

Virtual Try-on

Posted on 2022-09-10 | In paper note

warping

  • correspondence matrix [1] [4]
  • TPS [1] [3]
  • offset and weight [2] (a minimal flow-warping sketch follows this list)
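
The offset-based methods above build on dense-flow warping via grid sampling. A minimal sketch under that assumption (the function name is mine; this is not any specific paper's implementation):

    import torch
    import torch.nn.functional as F

    def warp_with_flow(garment, flow):
        """Warp a garment image/feature map with a dense offset field.
        garment: (B, C, H, W); flow: (B, 2, H, W) pixel offsets (dx, dy)."""
        b, _, h, w = garment.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        base = torch.stack((xs, ys), dim=0).float().to(garment.device)  # (2, H, W)
        coords = base.unsqueeze(0) + flow                               # (B, 2, H, W)
        # Normalize to [-1, 1] as required by grid_sample.
        gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
        gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)                            # (B, H, W, 2)
        return F.grid_sample(garment, grid, align_corners=True)

    garment = torch.randn(1, 3, 64, 48)
    flow = torch.zeros(1, 2, 64, 48)  # zero flow = identity warp
    assert torch.allclose(warp_with_flow(garment, flow), garment, atol=1e-5)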

target person

Garment Transfer: [5] [6] [8]

Controllable person image synthesis: [7]

Recurrent Person Image Generation: [9]

References

  1. Yang, Fan, and Guosheng Lin. “CT-Net: Complementary Transfering Network for Garment Transfer with Arbitrary Geometric Changes.” CVPR, 2021.

  2. Bai, Shuai, et al. “Single Stage Virtual Try-on via Deformable Attention Flows.” arXiv preprint arXiv:2207.09161 (2022).

  3. Fenocchi, Emanuele, et al. “Dual-Branch Collaborative Transformer for Virtual Try-On.” CVPR, 2022.

  4. Morelli, Davide, et al. “Dress Code: High-Resolution Multi-Category Virtual Try-On.” CVPR, 2022.

  5. Yang, Fan, and Guosheng Lin. “CT-Net: Complementary Transfering Network for Garment Transfer with Arbitrary Geometric Changes.” CVPR, 2021.

  6. Liu, Ting, et al. “Spatial-aware texture transformer for high-fidelity garment transfer.” IEEE Transactions on Image Processing 30 (2021): 7499-7510.

  7. Zhou, Xinyue, et al. “Cross Attention Based Style Distribution for Controllable Person Image Synthesis.” arXiv preprint arXiv:2208.00712 (2022).

  8. Raj, Amit, et al. “Swapnet: Image based garment transfer.” ECCV, 2018.

  9. Cui, Aiyu, Daniel McKee, and Svetlana Lazebnik. “Dressing in order: Recurrent person image generation for pose transfer, virtual try-on and outfit editing.” ICCV, 2021.

Diffusion Model

Posted on 2022-09-09 | In paper note
  • Class-conditioned image generation: [1]

  • Image-to-image translation: [4], [7], [6], [8]

  • Image-to-image translation with guidance: GLIDE [2] (global: text), [20] (global: text, sketch), [21] (local: text), [22] (local: text, image), ControlNet [23] (global: mixture), T2I-Adapter [24] (global: mixture), [25] (local: mixture), [26] (global: text), [27], Ctrl-Adapter [36]

  • Unpaired image-to-image translation: [19] [28] [29]

  • Image composition: SDEdit [17], ILVR [6], [5], [9], [15] (see the SDEdit-style sketch after this list)

  • Image inpainting: [10], [11], [12], [13]

  • Mask prediction: [31] cross-attention plus post-processing; [32] adds one output channel; [33] predicts masks from the feature maps at early steps.
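
As a reminder of the SDEdit idea [8, 17] used for composition: noise the guide image part-way with the forward process, then denoise back to t = 0 with the plain reverse chain. A minimal sketch assuming an epsilon-prediction model (all names are mine):

    import torch

    @torch.no_grad()
    def sdedit(x_guide, eps_model, betas, t_start):
        """Noise the guide image to t_start, then run the DDPM reverse chain."""
        alphas = 1.0 - betas
        abar = torch.cumprod(alphas, dim=0)
        # Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps.
        x = abar[t_start].sqrt() * x_guide + (1 - abar[t_start]).sqrt() * torch.randn_like(x_guide)
        for t in range(t_start, -1, -1):
            eps = eps_model(x, t)
            # DDPM posterior mean (variance = beta_t variant).
            x = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
            if t > 0:
                x = x + betas[t].sqrt() * torch.randn_like(x)
        return x

    # Toy usage with a stand-in noise predictor.
    betas = torch.linspace(1e-4, 0.02, 1000)
    dummy_eps_model = lambda x, t: torch.zeros_like(x)
    edited = sdedit(torch.randn(1, 3, 32, 32), dummy_eps_model, betas, t_start=500)
    print(edited.shape)  # torch.Size([1, 3, 32, 32])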

Milestones: DDPM [3], Stable Diffusion v1, v2, XL, v3

Acceleration: DDIM [14], PLMS [16]
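
DDIM [14] accelerates sampling by taking deterministic jumps between distant timesteps instead of stepping through every t. A minimal sketch of the eta = 0 update (names are mine):

    import torch

    @torch.no_grad()
    def ddim_step(x_t, eps, abar_t, abar_prev):
        """One deterministic DDIM (eta = 0) update, given the predicted noise eps."""
        # Estimate x_0 from the current sample and the predicted noise.
        x0_pred = (x_t - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()
        # Re-noise deterministically toward the earlier timestep.
        return abar_prev.sqrt() * x0_pred + (1 - abar_prev).sqrt() * eps

    # Toy usage: 10 large jumps instead of 1000 small steps.
    betas = torch.linspace(1e-4, 0.02, 1000)
    abar = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn(1, 3, 32, 32)
    ts = list(range(999, -1, -100)) + [0]            # 999, 899, ..., 99, 0
    for t, t_prev in zip(ts[:-1], ts[1:]):
        eps = torch.zeros_like(x)                    # stand-in for eps_model(x, t)
        x = ddim_step(x, eps, abar[t], abar[t_prev])
    print(x.shape)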

High-resolution: [34] progressive training

Light-weight: [35]

Failure case analyses: [30]

Surveys

  • Diffusion Models: A Comprehensive Survey of Methods and Applications
  • Diffusion Models in Vision: A Survey

Tutorial materials: [a] [b]

References

[1] Dhariwal, Prafulla, and Alex Nichol. “Diffusion models beat gans on image synthesis.” arXiv preprint arXiv:2105.05233 (2021).

[2] Nichol, Alex, et al. “Glide: Towards photorealistic image generation and editing with text-guided diffusion models.” arXiv preprint arXiv:2112.10741 (2021).

[3] Ho, Jonathan, Ajay Jain, and Pieter Abbeel. “Denoising diffusion probabilistic models.” NeurIPS, 2020.

[4] Wang, Tengfei, et al. “Pretraining is All You Need for Image-to-Image Translation.” arXiv preprint arXiv:2205.12952 (2022).

[5] Hachnochi, Roy, et al. “Cross-domain Compositing with Pretrained Diffusion Models.” arXiv preprint arXiv:2302.10167 (2023).

[6] Choi, Jooyoung, et al. “ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models.” ICCV, 2021.

[7] Kwon, Gihyun, and Jong Chul Ye. “Diffusion-based image translation using disentangled style and content representation.” ICLR, 2023.

[8] Meng, Chenlin, et al. “Sdedit: Guided image synthesis and editing with stochastic differential equations.” ICLR, 2021.

[9] Yang, Binxin, et al. “Paint by Example: Exemplar-based Image Editing with Diffusion Models.” arXiv preprint arXiv:2211.13227 (2022).

[10] Lugmayr, Andreas, et al. “Repaint: Inpainting using denoising diffusion probabilistic models.” CVPR, 2022.

[11] Rombach, Robin, et al. “High-resolution image synthesis with latent diffusion models.” CVPR, 2022.

[12] Li, Wenbo, et al. “SDM: Spatial Diffusion Model for Large Hole Image Inpainting.” arXiv preprint arXiv:2212.02963 (2022).

[13] Wang, Su, et al. “Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting.” arXiv preprint arXiv:2212.06909 (2022).

[14] Song, Jiaming, Chenlin Meng, and Stefano Ermon. “Denoising diffusion implicit models.” arXiv preprint arXiv:2010.02502 (2020).

[15] Song, Yizhi, et al. “ObjectStitch: Generative Object Compositing.” CVPR, 2023.

[16] Liu, Luping, et al. “Pseudo numerical methods for diffusion models on manifolds.” ICLR, 2022.

[17] Meng, Chenlin, et al. “Sdedit: Guided image synthesis and editing with stochastic differential equations.” ICLR, 2021.

[19] Kwon, Gihyun, and Jong Chul Ye. “Diffusion-based image translation using disentangled style and content representation.” ICLR, 2023.

[20] Voynov, Andrey, Kfir Aberman, and Daniel Cohen-Or. “Sketch-Guided Text-to-Image Diffusion Models.” arXiv preprint arXiv:2211.13752 (2022).

[21] Yang, Zhengyuan, et al. “ReCo: Region-Controlled Text-to-Image Generation.” arXiv preprint arXiv:2211.15518 (2022).

[22] Li, Yuheng, et al. “GLIGEN: Open-Set Grounded Text-to-Image Generation.” arXiv preprint arXiv:2301.07093 (2023).

[23] Zhang, Lvmin, and Maneesh Agrawala. “Adding conditional control to text-to-image diffusion models.” arXiv preprint arXiv:2302.05543 (2023).

[24] Mou, Chong, et al. “T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models.” arXiv preprint arXiv:2302.08453 (2023).

[25] Huang, Lianghua, et al. “Composer: Creative and controllable image synthesis with composable conditions.” arXiv preprint arXiv:2302.09778 (2023).

[26] Wei, Yuxiang, et al. “Elite: Encoding visual concepts into textual embeddings for customized text-to-image generation.” arXiv preprint arXiv:2302.13848 (2023).

[27] Zhao, Shihao, et al. “Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models.” arXiv preprint arXiv:2305.16322 (2023).

[28] Sasaki, Hiroshi, Chris G. Willcocks, and Toby P. Breckon. “Unit-ddpm: Unpaired image translation with denoising diffusion probabilistic models.” arXiv preprint arXiv:2104.05358 (2021).

[29] Su, Xuan, et al. “Dual diffusion implicit bridges for image-to-image translation.” arXiv preprint arXiv:2203.08382 (2022).

[30] Du, Chengbin, et al. “Stable Diffusion is Unstable.”

[31] Wu, Weijia, et al. “Diffumask: Synthesizing images with pixel-level annotations for semantic segmentation using diffusion models.” arXiv preprint arXiv:2303.11681 (2023).

[32] Xie, Shaoan, et al. “Smartbrush: Text and shape guided object inpainting with diffusion model.” CVPR, 2023.

[33] Ma, Jian, et al. “GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently.” arXiv preprint arXiv:2303.17870 (2023).

[34] Gu, Jiatao, et al. “Matryoshka Diffusion Models.” arXiv preprint arXiv:2310.15111 (2023).

[35] Li, Yanyu, et al. “SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds.” NeurIPS, 2023.

[36] Lin, Han, et al. “Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model.” arXiv preprint arXiv:2404.09967 (2024).

Exemplar-guided Image Translation

Posted on 2022-09-09 | In paper note

Task: each exemplar represents one domain; transfer the style of the exemplar image to the input image. (A minimal style-injection sketch follows the list below.)

[1]

  • reconstruct the style code: [2]

  • use pretrained network (prior knowledge) to extract the style code: [3]
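
A common way to think about a “style code” in this line of work is per-channel feature statistics. A minimal AdaIN-style sketch of style injection (illustrative only; not the specific mechanism of [1]–[3]):

    import torch

    def adain(content, style, eps=1e-5):
        """Re-normalize content features to match the per-channel mean/std
        ("style code") of the exemplar features."""
        c_mean = content.mean(dim=(2, 3), keepdim=True)
        c_std = content.std(dim=(2, 3), keepdim=True) + eps
        s_mean = style.mean(dim=(2, 3), keepdim=True)
        s_std = style.std(dim=(2, 3), keepdim=True)
        return s_std * (content - c_mean) / c_std + s_mean

    content = torch.randn(1, 64, 32, 32)            # features of the input image
    exemplar = torch.randn(1, 64, 32, 32) * 2 + 1   # features of the exemplar
    out = adain(content, exemplar)
    print(out.mean().item(), out.std().item())      # roughly 1 and 2, as in the exemplar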

References

  1. Zhang, Pan, et al. “Cross-domain correspondence learning for exemplar-based image translation.” CVPR, 2020.

  2. Anokhin, Ivan, et al. “High-resolution daytime translation without domain labels.” CVPR, 2020.

  3. Tumanyan, Narek, et al. “Splicing ViT Features for Semantic Appearance Transfer.” CVPR, 2022.

NeRF

Posted on 2022-08-22 | In paper note
  • NeRF [1] (see the volume-rendering sketch after this list)

  • GIRAFFE [2]
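
For reference, the volume-rendering quadrature from NeRF [1] for a single ray, as a minimal sketch (the function name is mine):

    import torch

    def volume_render(sigmas, colors, deltas):
        """Composite colors along one ray:
        alpha_i = 1 - exp(-sigma_i * delta_i),  T_i = prod_{j<i} (1 - alpha_j).

        sigmas: (N,) densities; colors: (N, 3) RGB; deltas: (N,) segment lengths.
        """
        alphas = 1.0 - torch.exp(-sigmas * deltas)
        trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas[:-1]]), dim=0)
        weights = trans * alphas
        return (weights[:, None] * colors).sum(dim=0)

    sigmas, colors = torch.rand(64), torch.rand(64, 3)
    deltas = torch.full((64,), 0.05)
    print(volume_render(sigmas, colors, deltas))  # composited RGB for one ray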

References

[1] Mildenhall, Ben, et al. “Nerf: Representing scenes as neural radiance fields for view synthesis.” ECCV, 2020.

[2] Niemeyer, Michael, and Andreas Geiger. “Giraffe: Representing scenes as compositional generative neural feature fields.” CVPR, 2021.

Mask Form

Posted on 2022-07-25 | In paper note
  1. binary map

  2. frequency: DCT [1] (see the sketch after this list)

  3. PolarMask [2]

  4. Hyperbolic [3]
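
A minimal sketch of the DCT mask representation [1]: transform the binary mask, keep low-frequency coefficients, then invert and threshold. This is simplified relative to the paper, which resizes the mask and keeps zig-zag-ordered coefficients; names are mine:

    import numpy as np
    from scipy.fftpack import dctn, idctn

    def encode_mask(mask, k=17):
        """2D DCT of a binary mask, keeping a k x k low-frequency block."""
        coeffs = dctn(mask.astype(np.float32), norm="ortho")
        compact = np.zeros_like(coeffs)
        compact[:k, :k] = coeffs[:k, :k]
        return compact

    def decode_mask(compact, thresh=0.5):
        return (idctn(compact, norm="ortho") > thresh).astype(np.uint8)

    mask = np.zeros((128, 128), dtype=np.uint8)
    mask[32:96, 40:100] = 1                          # toy rectangular mask
    rec = decode_mask(encode_mask(mask))
    iou = (rec & mask).sum() / (rec | mask).sum()
    print("IoU:", iou)                               # close to 1 for this simple mask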

References

[1] Shen, Xing, et al. “Dct-mask: Discrete cosine transform mask representation for instance segmentation.” CVPR, 2021.

[2] Xie, Enze, et al. “Polarmask: Single shot instance segmentation with polar representation.” CVPR, 2020.

[3] Ghadimi Atigh, Mina, et al. “Hyperbolic Image Segmentation.” arXiv preprint arXiv:2203.05898 (2022).

To a Beginner on Paper Writing

Posted on 2022-07-22 | In paper note
  • Carefully read the following instructions. These are the key points you should pay attention to when writing papers.

    • https://ustcnewly.github.io/2022/06/16/others/Paper%20Writing/
    • https://ustcnewly.github.io/2022/06/16/others/Paper%20Proofread/
    • https://ustcnewly.github.io/2022/06/16/others/Reasons%20to%20Reject%20a%20Paper/
  • The commonly used words in academic papers are summarized in https://ustcnewly.github.io/2022/06/16/others/Dictionary%20for%20Paper%20Writing/.

  • Before writing your own paper, carefully read 10 closely related papers and record the materials (words/phrases/sentences) that could be used in your paper. Organize the collected materials and think about when to use them. Do not copy them word for word; incorporate them into your own paper coherently and seamlessly.

Network Architecture

Posted on 2022-07-15 | In paper note
  1. Transformer

  2. Large kernel: [1] [2] [3] (a minimal block sketch follows this list)
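
A minimal sketch of a large-kernel block, roughly in the spirit of [1]; simplified relative to the paper (which uses LayerNorm over channels and linear layers), and names are mine:

    import torch
    import torch.nn as nn

    class LargeKernelBlock(nn.Module):
        """Large depthwise conv for spatial mixing, then 1x1 convs for channel mixing."""
        def __init__(self, dim, kernel_size=7):
            super().__init__()
            self.dw = nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
            self.norm = nn.GroupNorm(1, dim)  # stand-in for the paper's LayerNorm
            self.pw1 = nn.Conv2d(dim, 4 * dim, 1)
            self.act = nn.GELU()
            self.pw2 = nn.Conv2d(4 * dim, dim, 1)

        def forward(self, x):
            return x + self.pw2(self.act(self.pw1(self.norm(self.dw(x)))))

    x = torch.randn(1, 32, 56, 56)
    print(LargeKernelBlock(32)(x).shape)  # torch.Size([1, 32, 56, 56])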

References

[1] Liu, Zhuang, et al. “A convnet for the 2020s.” CVPR, 2022.

[2] Ding, Xiaohan, et al. “Scaling up your kernels to 31x31: Revisiting large kernel design in cnns.” CVPR, 2022.

[3] “More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 using Sparsity.”

Install VirtualBox Guest Additions

Posted on 2022-06-16 | In software
  1. Before installing the Guest Additions from the CD image, do the following:

    sudo apt-get install dkms build-essential linux-headers-generic linux-headers-$(uname -r)

    For missing Linux kernel headers or other common problems, refer to this.

    Use uname -r or uname -a to look up the kernel version, and use dpkg --get-selections | grep linux to list the installed Linux kernels.

  2. If you click the shared folder item in the menu bar and get the following error: ‘The VirtualBox Guest Additions do not seem to be available on this virtual machine, and shared folders cannot be used without them’, the following commands may help.

    sudo apt-get install virtualbox-guest-additions-iso
    sudo apt-get update
    sudo apt-get dist-upgrade
    sudo apt-get install virtualbox-guest-x11

Enlarge VirtualBox vdi

Posted on 2022-06-16 | In software
  1. Go to the VirtualBox installation directory and execute the following command:

    "D:\Program Files\Oracle\VirtualBox\VBoxManage.exe" modifyhd "F:\VirtualBox\my ubuntu.vdi" --resize 15360

    Note that 15360 is the new size in MB; this command can only enlarge the disk, not shrink it.

  2. Install gparted via sudo apt-get install gparted and use it to make the extended disk space available.

  3. Remount /home to the new disk. For concrete steps, refer to this link.
