Newly Blog



Generative Composition

Posted on 2025-01-06 | In paper note

(Object+Text)-Guided

Training-free

  • Pengzhi Li, Qiang Nie, Ying Chen, Xi Jiang, Kai Wu, Yuhuan Lin, Yong Liu, Jinlong Peng, Chengjie Wang, Feng Zheng: “Tuning-Free Image Customization with Image and Text Guidance.” arXiv preprint arXiv:2403.12658 (2024) [arXiv]

Training-based

  • Yicheng Yang, Pengxiang Li, Lu Zhang, Liqian Ma, Ping Hu, Siyu Du, Yunzhi Zhuge, Xu Jia, Huchuan Lu: “DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting.” arXiv preprint arXiv:2411.17223 (2024) [arXiv] [code]
  • Shaoan Xie, Yang Zhao, Zhisheng Xiao, Kelvin C.K. Chan, Yandong Li, Yanwu Xu, Kun Zhang, Tingbo Hou: “DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models.” arXiv preprint arXiv:2312.03771 (2023) [arXiv]
  • Yulin Pan, Chaojie Mao, Zeyinzi Jiang, Zhen Han, Jingfeng Zhang: “Locate, Assign, Refine: Taming Customized Image Inpainting with Text-Subject Guidance.” arXiv preprint arXiv:2403.19534 (2024) [arXiv] [code]

Foreground: 3D; Background: image

  • Jinghao Zhou, Tomas Jakab, Philip Torr, Christian Rupprecht: “Scene-Conditional 3D Object Stylization and Composition.” arXiv preprint arXiv:2312.12419 (2023) [arXiv] [code]

Foreground: 3D; Background: 3D

  • Mohamad Shahbazi, Liesbeth Claessens, Michael Niemeyer, Edo Collins, Alessio Tonioni, Luc Van Gool, Federico Tombari: “InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes.” arXiv preprint arXiv:2401.05335 (2024) [arXiv]
  • Rahul Goel, Dhawal Sirikonda, Saurabh Saini, PJ Narayanan: “Interactive Segmentation of Radiance Fields.” CVPR (2023) [arXiv] [code]
  • Rahul Goel, Dhawal Sirikonda, Rajvi Shah, PJ Narayanan: “FusedRF: Fusing Multiple Radiance Fields.” CVPR Workshop (2023) [arXiv]
  • Verica Lazova, Vladimir Guzov, Kyle Olszewski, Sergey Tulyakov, Gerard Pons-Moll: “Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation.” WACV (2023) [arXiv]
  • Jiaxiang Tang, Xiaokang Chen, Jingbo Wang, Gang Zeng: “Compressible-composable NeRF via Rank-residual Decomposition.” NeurIPS (2022) [arXiv] [code]
  • Bangbang Yang, Yinda Zhang, Yinghao Xu, Yijin Li, Han Zhou, Hujun Bao, Guofeng Zhang, Zhaopeng Cui: “Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering.” ICCV (2021) [arXiv] [code]

Foreground: video; Background: image

  • Boxiao Pan, Zhan Xu, Chun-Hao Paul Huang, Krishna Kumar Singh, Yang Zhou, Leonidas J. Guibas, Jimei Yang: “ActAnywhere: Subject-Aware Video Background Generation.” arXiv preprint arXiv:2401.10822 (2024) [arXiv]

Foreground: video; Background: video

  • Jiaqi Guo, Sitong Su, Junchen Zhu, Lianli Gao, Jingkuan Song: “Training-Free Semantic Video Composition via Pre-trained Diffusion Model.” arXiv preprint arXiv:2401.09195 (2024) [arXiv]
  • Donghoon Lee, Tomas Pfister, Ming-Hsuan Yang: “Inserting Videos into Videos.” CVPR (2019) [pdf]

Human Generation

Posted on 2024-05-20 | In paper note

Combine different components: [1] [2]

References

  1. Frühstück, Anna, et al. “Insetgan for full-body image generation.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

  2. Huang, Zehuan, et al. “From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation.” arXiv preprint arXiv:2404.15267 (2024).

Mixture-of-Experts

Posted on 2024-05-08 | In paper note

The first paper: [1]

Switch Transformer: [2]
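The post above is only pointers, but the core routing idea is compact enough to sketch. Below is a toy, pure-Python illustration of top-1 ("switch") gating; the scalar token and lambda experts are hypothetical stand-ins for the feed-forward expert networks, not code from either paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def switch_route(token, gate_logits, experts):
    """Top-1 ('switch') routing: send the token only to the
    highest-probability expert and scale its output by the gate
    probability, which keeps the gate differentiable."""
    probs = softmax(gate_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return probs[best] * experts[best](token)

# Hypothetical experts; in a real model these are feed-forward networks.
out = switch_route(3.0, [0.0, 100.0], [lambda t: t + 1, lambda t: t * 10])
```

Because only one expert runs per token, compute per token stays constant as the number of experts (and hence parameters) grows.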

Reference

Remote Debugging for Eclipse+PyDev

Posted on 2024-03-12 | In software , IDE

When server is Linux, client is Windows, using Python.

In the python file:

add the following code:

import pydevd
pydevd.settrace('202.120.38.30', port=5678)

where the IP address is that of the Windows machine and 5678 is the default port of the PyDev debug server in Eclipse.

Under Linux:

  1. pip install pydevd
  2. In the file lib/python2.7/site-packages/pydevd_file_utils.py, modify PATHS_FROM_ECLIPSE_TO_PYTHON = [(r'L:\test', r'/home/niuli/test')], where the former is the Windows path and the latter is the Linux path.

Under Windows:

  1. Install Eclipse IDE and PyDev plugin for Eclipse.
  2. PyDev → Start Debug Server
  3. Open the Debug perspective and watch the Debug Server view.

After the above preparation, run the Python code under Linux; execution will stop and appear in the debug server under Windows. For interactive debugging, open the PyDev Debug Console.
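Conceptually, the mapping in step 2 under Linux just rewrites path prefixes so that pydevd can report server-side file locations in terms the Eclipse client understands. A minimal sketch of the idea (the helper name is hypothetical; the real pydevd logic is more involved):

```python
# Same format as in pydevd_file_utils.py: (client prefix, server prefix).
PATHS_FROM_ECLIPSE_TO_PYTHON = [(r'L:\test', r'/home/niuli/test')]

def to_server_path(client_path):
    """Translate a Windows (Eclipse) path to its Linux counterpart
    by rewriting the matching prefix and the path separators."""
    for win_prefix, linux_prefix in PATHS_FROM_ECLIPSE_TO_PYTHON:
        if client_path.lower().startswith(win_prefix.lower()):
            rest = client_path[len(win_prefix):].replace('\\', '/')
            return linux_prefix + rest
    return client_path  # no mapping applies; leave untouched
```

If breakpoints are not hit, a mismatch in these prefix pairs is the usual culprit.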

Mirror Resources

Posted on 2024-02-04

arXiv: http://xxx.itp.ac.cn/......pdf

Hugging Face: https://hf-mirror.com

export HF_ENDPOINT=https://hf-mirror.com

from huggingface_hub import snapshot_download

snapshot_download(repo_id='Qwen/Qwen-7B',
                  repo_type='model',
                  local_dir='./model_dir',
                  resume_download=True)
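The endpoint can also be set from inside Python instead of the shell export; note that it must be set before huggingface_hub is imported, since the library reads HF_ENDPOINT when it is first loaded:

```python
import os

# Must run before huggingface_hub is imported: the endpoint is read
# once, at import time, so setting it later has no effect.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# from huggingface_hub import snapshot_download  # now uses the mirror
```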

ModelScope: https://modelscope.cn/home

from modelscope.hub.snapshot_download import snapshot_download

model_dir = snapshot_download('qwen/Qwen-7B', 
                              cache_dir='./model_dir', 
                              revision='master')

Line Detection

Posted on 2024-01-03 | In paper note

[1] MLSD

Reference

[1] Gu, Geonmo, et al. “Towards light-weight and real-time line segment detection.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 1. 2022.

Diffusion Model Acceleration

Posted on 2023-11-27 | In paper note

Simplify model architecture: [3]

Reduce sampling times: DDIM [1], PLMS [2]

Step distillation: [3] [4] [5] [6]

Adversarial distillation: [7]
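For context on why DDIM [1] reduces sampling times: its η = 0 update is deterministic and can jump between non-adjacent timesteps. A minimal scalar sketch of that update (illustrative only; real samplers operate on tensors with a learned noise-prediction network):

```python
import math

def ddim_step(x_t, eps, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0) for a scalar sample.
    eps is the model's noise prediction at the current timestep;
    abar_t and abar_prev are cumulative alpha products at the current
    and previous selected timestep, which may be far apart -- that is
    what lets DDIM skip steps."""
    # Predicted clean sample, then re-noise to the earlier timestep.
    x0_hat = (x_t - math.sqrt(1 - abar_t) * eps) / math.sqrt(abar_t)
    return math.sqrt(abar_prev) * x0_hat + math.sqrt(1 - abar_prev) * eps
```

With abar_prev = 1 the step returns the predicted clean sample directly, i.e. a one-step jump to x0.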

References

[1] Song, Jiaming, Chenlin Meng, and Stefano Ermon. “Denoising diffusion implicit models.” arXiv preprint arXiv:2010.02502 (2020).

[2] Liu, Luping, et al. “Pseudo numerical methods for diffusion models on manifolds.” ICLR (2022).

[3] Li, Yanyu, et al. “SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds.” NeurIPS (2023).

[4] Salimans, Tim, and Jonathan Ho. “Progressive distillation for fast sampling of diffusion models.” ICLR (2022).

[5] Luhman, Eric, and Troy Luhman. “Knowledge distillation in iterative generative models for improved sampling speed.” arXiv preprint arXiv:2101.02388 (2021).

[6] Meng, Chenlin, et al. “On distillation of guided diffusion models.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

[7] https://stability.ai/research/adversarial-diffusion-distillation

Diffusion Model for Sketches

Posted on 2023-10-23 | In paper note

text-to-sketch

  • [1][SVG]: a) Uses a text-to-image model to generate an image, then aligns image and sketch through CLIP. b) Aligns text and sketch through a diffusion model.
  • [2][SVG]: Aligns text and sketch through CLIP.
  • [3][SVG]: Aligns text and sketch through a diffusion model.

image-to-sketch

  • [4][SVG]: Aligns image and sketch. Uses an MLP to predict offsets from initial points.
  • [5][SVG]: Aligns image and sketch. Uses saliency for initialization.

SVG: Scalable Vector Graphics
SDS: score distillation sampling

References

[1] Xing, Ximing, et al. “DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models.” arXiv preprint arXiv:2306.14685 (2023).

[2] Frans, Kevin, Lisa Soros, and Olaf Witkowski. “Clipdraw: Exploring text-to-drawing synthesis through language-image encoders.” Advances in Neural Information Processing Systems 35 (2022): 5207-5218.

[3] Jain, Ajay, Amber Xie, and Pieter Abbeel. “Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

[4] Vinker, Yael, et al. “Clipascene: Scene sketching with different types and levels of abstraction.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.

[5] Vinker, Yael, et al. “Clipasso: Semantically-aware object sketching.” ACM Transactions on Graphics (TOG) 41.4 (2022): 1-11.

Diffusion Model for Videos

Posted on 2023-09-30 | In paper note

[1] The edited previous frame serves as the condition for editing the next frame.

References

[1] Chai, Wenhao, et al. “StableVideo: Text-driven Consistency-aware Diffusion Video Editing.” arXiv preprint arXiv:2308.09592 (2023).

Deep Learning Model Deployment

Posted on 2023-09-04 | In software
  • tutorial: [1]
© 2025 Li Niu | Powered by Hexo