[1] The edited previous frame serves as a condition for editing the next frame.
References
[1] Chai, Wenhao, et al. “StableVideo: Text-driven Consistency-aware Diffusion Video Editing.” arXiv preprint arXiv:2308.09592 (2023).
VM: guest OS -> BINS&LIBS -> App (each VM runs a full guest OS on a hypervisor)
Docker: BINS&LIBS -> App (containers share the host kernel, so no guest OS is needed)
https://docs.docker.com/engine/install/ubuntu/
sudo apt-get purge docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-ce-rootless-extras
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
docker image pull $image_name:$version
docker image tag $image_name:$version $registryIP:$port/$username/$image_name:$version
docker image push $registryIP:$port/$username/$image_name:$version
docker image build -t $image_name .
docker image ls
docker image rm $image_name
docker image save $image_name > $filename
docker image load < $filename

docker container create $image_name
docker container start $container_ID
docker container run $image_name # run is equal to create and start
docker container run -it $image_name /bin/bash
docker container ls, docker container ls -a # -a also lists stopped containers
docker container pause $container_ID
docker container unpause $container_ID
docker container stop $container_ID
docker container kill $container_ID # stop sends SIGTERM first so the container can shut down cleanly (then SIGKILL after a timeout); kill sends SIGKILL immediately
docker container rm $container_ID
docker container prune # remove all exited containers
docker container exec -it $container_ID /bin/bash
docker container cp $container_ID:$file_path .
docker container commit $container_ID $image_name:$version
FROM python:3.7
WORKDIR ./docker_demo
ADD . .
RUN pip install -r requirements.txt
CMD ["python", "./src/main.py"]
Tutorial: [1]
Install nvidia-docker (NVIDIA Container Toolkit): https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
docker run --gpus all -it $image_name
SAM [1]
FastSAM [2]: first generates class-agnostic mask proposals for the whole image, then selects the proposals that match the prompt (see the sketch after this list)
High-quality SAM [3]
Semantic-SAM [4]: assign semantic labels
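A minimal sketch of FastSAM's proposal-then-select idea, assuming the proposals are already available as binary NumPy masks; the selection rule used here (smallest proposal containing the click) is an illustrative assumption, not FastSAM's exact strategy:

import numpy as np

def select_by_point(proposal_masks, point):
    # proposal_masks: list of HxW boolean arrays (class-agnostic proposals)
    # point: (row, col) click location used as the prompt
    r, c = point
    hits = [m for m in proposal_masks if m[r, c]]  # proposals covering the click
    if not hits:
        return None
    return min(hits, key=lambda m: m.sum())  # pick the smallest covering proposal

# toy example: two nested square proposals, click inside the inner one
outer = np.zeros((64, 64), dtype=bool); outer[8:56, 8:56] = True
inner = np.zeros((64, 64), dtype=bool); inner[24:40, 24:40] = True
print(select_by_point([outer, inner], point=(30, 30)).sum())  # area of the inner proposal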
[1] Kirillov, Alexander, et al. “Segment anything.” arXiv preprint arXiv:2304.02643 (2023).
[2] Zhao, Xu, et al. “Fast Segment Anything.” arXiv preprint arXiv:2306.12156 (2023).
[3] Ke, Lei, et al. “Segment Anything in High Quality.” arXiv preprint arXiv:2306.01567 (2023).
[4] Li, Feng, et al. “Semantic-SAM: Segment and Recognize Anything at Any Granularity.” arXiv preprint arXiv:2307.04767 (2023).
[1] Pan, Xingang, et al. “Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold.” SIGGRAPH (2023).
[2] Shi, Yujun, et al. “DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing.” arXiv preprint arXiv:2306.14435 (2023).
[3] Mou, Chong, et al. “DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models.” arXiv preprint arXiv:2307.02421 (2023).
[4] Ling, Pengyang, et al. “FreeDrag: Point Tracking is Not You Need for Interactive Point-based Image Editing.” arXiv preprint arXiv:2307.04684 (2023).
A survey on disentangled representation learning [1]
[1] Wang, Xin, et al. “Disentangled Representation Learning.”
[2] Wu, Qiucheng, et al. “Uncovering the disentanglement capability in text-to-image diffusion models.” CVPR, 2023.
[3] Yang, Tao, et al. “DisDiff: Unsupervised Disentanglement of Diffusion Probabilistic Models.” arXiv preprint arXiv:2301.13721 (2023).
[4] Preechakul, Konpat, et al. “Diffusion autoencoders: Toward a meaningful and decodable representation.” CVPR, 2022.
[5] Kwon, Mingi, Jaeseok Jeong, and Youngjung Uh. “Diffusion models already have a semantic latent space.” arXiv preprint arXiv:2210.10960 (2022).
The goal is to separate the foreground from the background given some user annotation (e.g., a trimap or scribbles). The prevalent formulation, alpha matting, solves for $\boldsymbol{\alpha}$ (the primary target) and $\mathbf{F}$, $\mathbf{B}$ (subordinate targets) in $\mathbf{I}=\boldsymbol{\alpha}\circ\mathbf{F}+(1-\boldsymbol{\alpha})\circ \mathbf{B}$ [1] [2] [3].
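As a concrete reading of the compositing equation, a minimal NumPy sketch (the array shapes are an assumption: H x W x 3 color images and an H x W alpha matte):

import numpy as np

def composite(alpha, F, B):
    a = alpha[..., None]           # broadcast alpha over the color channels
    return a * F + (1.0 - a) * B   # I = alpha * F + (1 - alpha) * B

H, W = 4, 4
F = np.random.rand(H, W, 3)        # foreground colors
B = np.random.rand(H, W, 3)        # background colors
alpha = np.random.rand(H, W)       # opacity in [0, 1]
I = composite(alpha, F, B)
print(I.shape)                     # (4, 4, 3)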
Alphamatting.com Dataset: 25 training images, 8 test images; each image has 3 trimaps (small, large, user). Input: image and trimap.
Composition-1k Dataset: 1000 test images composed from 50 unique foregrounds.
Matting Human Dataset: 34,427 images; the annotations are not very accurate.
Distinctions-646: composed of 646 foreground images.
Affinity-based [1]: propagates known alpha values to unknown pixels using inter-pixel affinities built from color similarity or spatial proximity.
Sampling-based [8]: estimates the foreground/background colors of unknown pixels by sampling colors from the known foreground/background pixels; alpha then follows in closed form (see the sketch after this list).
Learning-based
gradient loss [11], Laplacian loss [12]
Omnimatte [10]: segment objects and scene effects related to the objects (shadows, reflections, smoke)
Unified interactive image matting [13]
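For the sampling-based idea above, once foreground and background colors have been sampled for an unknown pixel, alpha has a standard closed-form least-squares estimate derived from the compositing equation; a minimal sketch (the toy colors are an assumption):

import numpy as np

def alpha_from_samples(I, F, B, eps=1e-8):
    # I: observed pixel color; F, B: sampled foreground/background colors (RGB vectors)
    d = F - B
    alpha = np.dot(I - B, d) / (np.dot(d, d) + eps)  # alpha = (I-B).(F-B) / ||F-B||^2
    return float(np.clip(alpha, 0.0, 1.0))

# toy check: a pixel that is a 30/70 mix of pure red and pure blue
F = np.array([1.0, 0.0, 0.0])
B = np.array([0.0, 0.0, 1.0])
I = 0.3 * F + 0.7 * B
print(alpha_from_samples(I, F, B))  # ~0.3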
[1] Aksoy, Yagiz, Tunc Ozan Aydin, and Marc Pollefeys. “Designing effective inter-pixel information flow for natural image matting.” CVPR, 2017.
[2] Xu, Ning, et al. “Deep image matting.” CVPR, 2017.
[3] Zhu, Bingke, et al. “Fast deep matting for portrait animation on mobile phone.” ACM MM, 2017.
[4] Wang, Yu, et al. “Deep Propagation Based Image Matting.” IJCAI. 2018.
[5] Chen, Quan, et al. “Semantic Human Matting.” ACM MM, 2018.
[6] Lutz, Sebastian, Konstantinos Amplianitis, and Aljosa Smolic. “AlphaGAN: Generative adversarial networks for natural image matting.” BMVC, 2018.
[7] Tang, Jingwei, et al. “Learning-based Sampling for Natural Image Matting.” CVPR, 2019.
[8] Feng, Xiaoxue, Xiaohui Liang, and Zili Zhang. “A cluster sampling method for image matting via sparse coding.” ECCV, 2016.
[9] Sengupta, Soumyadip, et al. “Background Matting: The World is Your Green Screen.” CVPR, 2020.
[10] Lu, Erika, et al. “Omnimatte: Associating Objects and Their Effects in Video.” CVPR, 2021.
[11] Zhang, Yunke, et al. “A late fusion cnn for digital matting.” CVPR, 2019.
[12] Hou, Qiqi, and Feng Liu. “Context-aware image matting for simultaneous foreground and alpha estimation.” ICCV. 2019.
[13] Yang, Stephen, et al. “Unified interactive image matting.” arXiv preprint arXiv:2205.08324 (2022).