Image Action Recognition with Unlabeled Videos

Posted on 2022-06-16 | In paper note
  1. Self-supervised learning: see video-to-image in this blog.

  2. Predict optical flow and use a two-stream network [1] (see the sketch after this list)

  3. Predict pose information (using a poselet detector) [2]
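A minimal sketch of the two-stream idea in approach 2: one stream sees the RGB image, the other sees (hallucinated) optical flow, and the two are fused for classification. The backbones, feature dimension, and names are placeholders, not the networks of [1]:

```python
import torch
import torch.nn as nn

class TwoStreamClassifier(nn.Module):
    """Late fusion of an appearance stream and a motion stream."""
    def __init__(self, rgb_net, flow_net, feat_dim, num_classes):
        super().__init__()
        self.rgb_net = rgb_net      # backbone over the static image
        self.flow_net = flow_net    # backbone over the predicted optical flow
        self.cls = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, image, flow):
        feats = torch.cat([self.rgb_net(image), self.flow_net(flow)], dim=1)
        return self.cls(feats)
```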

Reference:

[1] Gao, Ruohan, Bo Xiong, and Kristen Grauman. “Im2flow: Motion hallucination from static images for action recognition.” CVPR, 2018.

[2] Chen, Chao-Yeh, and Kristen Grauman. “Watching unlabeled video helps learn new human actions from very few labeled snapshots.” CVPR, 2013.

High-resolution Image Generation

Posted on 2022-06-16 | In paper note
  • stacked generators from low-resolution to high-resolution: [4] [5] [6] [10]

  • low-resolution generator embedded in high-resolution generator: upsample the low-resolution result and add a residual: [1] [7] [8] [9] [12]

  • fuse low-resolution outputs: [3] [11]

  • shallow mapping from large-scale input to large-scale output: [2] (look-up table) [15] [16]

  • joint upsampling: given a high-resolution input and a low-resolution output, produce a high-resolution output. 1) append the high-resolution input [1] or its features [10] to a refinement network; 2) guided filter [13]: use the high-resolution input as guidance and the coarse high-resolution output as the filter input (see the sketch after this list); 3) attentional upsampling [14]
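A minimal single-channel sketch of the guided-filter flavor of joint upsampling (option 2 above): fit a local linear model between the high-resolution guidance and the upsampled coarse output, then apply the smoothed coefficients at full resolution. Function names are illustrative, not the API of [13]:

```python
import torch.nn.functional as F

def box_filter(x, r):
    # Local mean over a (2r+1)x(2r+1) window.
    return F.avg_pool2d(x, kernel_size=2 * r + 1, stride=1, padding=r,
                        count_include_pad=False)

def guided_joint_upsample(lr_out, hr_guide, r=4, eps=1e-4):
    # Bring the coarse output to the guidance resolution first.
    p = F.interpolate(lr_out, size=hr_guide.shape[-2:], mode='bilinear',
                      align_corners=False)
    I = hr_guide
    mean_I, mean_p = box_filter(I, r), box_filter(p, r)
    cov_Ip = box_filter(I * p, r) - mean_I * mean_p
    var_I = box_filter(I * I, r) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)          # local linear coefficients
    b = mean_p - a * mean_I
    # Smooth the coefficients and apply them to the guidance.
    return box_filter(a, r) * I + box_filter(b, r)
```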

Reference

[1] Wang, Ting-Chun, et al. “High-resolution image synthesis and semantic manipulation with conditional gans.” CVPR, 2018.

[2] Zeng, Hui, et al. “Learning Image-adaptive 3D Lookup Tables for High Performance Photo Enhancement in Real-time.” PAMI, 2020.

[3] Yu, Haichao, et al. “High-Resolution Deep Image Matting.” arXiv preprint arXiv:2009.06613 (2020).

[4] Denton, Emily L., Soumith Chintala, and Rob Fergus. “Deep generative image models using a laplacian pyramid of adversarial networks.” NIPS, 2015.

[5] Huang, Xun, et al. “Stacked generative adversarial networks.” CVPR, 2017.

[6] Zhang, Han, et al. “Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks.” ICCV, 2017.

[7] Andreini, Paolo, et al. “A two stage gan for high resolution retinal image generation and segmentation.” arXiv preprint arXiv:1907.12296 (2019).

[8] Hamada, Koichi, et al. “Full-body high-resolution anime generation with progressive structure-conditional generative adversarial networks.” ECCV, 2018.

[9] Karras, Tero, et al. “Progressive growing of gans for improved quality, stability, and variation.” arXiv preprint arXiv:1710.10196 (2017).

[10] Chen, Qifeng, and Vladlen Koltun. “Photographic image synthesis with cascaded refinement networks.” ICCV, 2017.

[11] Anokhin, Ivan, et al. “High-Resolution Daytime Translation Without Domain Labels.” CVPR, 2020.

[12] Yi, Zili, et al. “Contextual residual aggregation for ultra high-resolution image inpainting.” CVPR, 2020.

[13] Wu, Huikai, et al. “Fast end-to-end trainable guided filter.” CVPR, 2018.

[14] Kundu, Souvik, et al. “Attention-based Image Upsampling.” arXiv preprint arXiv:2012.09904 (2020).

[15] Cong, Wenyan, et al. “High-Resolution Image Harmonization via Collaborative Dual Transformations.” CVPR, 2022.

[16] Liang, Jingtang, Xiaodong Cun, and Chi-Man Pun. “Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization.” ECCV, 2022.

Gradient Regularization

Posted on 2022-06-16 | In paper note
  • Gradient harmonization: [1]
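A simplified sketch of the gradient-harmonizing idea in [1]: estimate the density of per-example gradient norms (for sigmoid cross-entropy this is |p - y|) and down-weight examples that fall in densely populated bins, i.e. easy negatives and extreme outliers. The binning here is simplified relative to the paper:

```python
import torch

def ghm_weights(pred_prob, target, bins=10):
    g = (pred_prob - target).abs()          # gradient norm per example, in [0, 1]
    n = g.numel()
    weights = torch.zeros_like(g)
    edges = torch.linspace(0, 1, bins + 1)
    edges[-1] += 1e-6                       # include g == 1 in the last bin
    for i in range(bins):
        in_bin = (g >= edges[i]) & (g < edges[i + 1])
        num = in_bin.sum()
        if num > 0:
            # Weight is inversely proportional to the gradient density.
            weights[in_bin] = n / (num * bins)
    return weights                          # multiply into the per-example loss
```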

[1] Li, Buyu, Yu Liu, and Xiaogang Wang. “Gradient harmonized single-stage detector.” AAAI, 2019.

GNN for Segmentation

Posted on 2022-06-16 | In paper note
  1. row-wise and column-wise LSTM on feature map (see the sketch after this list): [1]
  2. graph LSTM on superpixels: [2]
  3. 3D graph: [3]
  4. DAG on feature map: [4]
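A sketch of approach 1: run bidirectional LSTMs along the rows and columns of a CNN feature map and concatenate the contextualized features. Shapes and names are illustrative, not the architecture of [1]:

```python
import torch
import torch.nn as nn

class RowColumnLSTM(nn.Module):
    def __init__(self, channels, hidden):
        super().__init__()
        self.row_lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.col_lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)

    def forward(self, feat):                                   # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        rows = feat.permute(0, 2, 3, 1).reshape(b * h, w, c)   # each row is a sequence
        row_ctx, _ = self.row_lstm(rows)
        cols = feat.permute(0, 3, 2, 1).reshape(b * w, h, c)   # each column is a sequence
        col_ctx, _ = self.col_lstm(cols)
        row_ctx = row_ctx.reshape(b, h, w, -1).permute(0, 3, 1, 2)
        col_ctx = col_ctx.reshape(b, w, h, -1).permute(0, 3, 2, 1)
        return torch.cat([row_ctx, col_ctx], dim=1)            # (B, 4*hidden, H, W)
```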

Reference

  1. Li, Zhen, et al. “Lstm-cf: Unifying context modeling and fusion with lstms for rgb-d scene labeling.” ECCV, 2016.
  2. Liang, Xiaodan, et al. “Semantic object parsing with graph lstm.” ECCV, 2016.
  3. Qi, Xiaojuan, et al. “3d graph neural networks for rgbd semantic segmentation.” ICCV, 2017.
  4. Ding, Henghui, et al. “Boundary-aware feature propagation for scene segmentation.” ICCV, 2019.

Geometry-aware Deep Feature

Posted on 2022-06-16 | In paper note
  1. Geometry feature generation based on landmarks detected in an unsupervised manner. [1]

  2. Disentangle bottleneck features into category-invariant features and category-specific features, where the category-invariant features encode the pose information (see the sketch after this list).
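A purely illustrative sketch of the disentanglement in approach 2: one head produces a category-invariant (pose) code and another a category-specific (appearance) code from the same bottleneck. This is not the architecture of [1]:

```python
import torch.nn as nn

class DisentangledBottleneck(nn.Module):
    def __init__(self, in_dim=512, pose_dim=64, app_dim=192):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.pose_head = nn.Linear(256, pose_dim)  # category-invariant: geometry/pose
        self.app_head = nn.Linear(256, app_dim)    # category-specific: appearance

    def forward(self, feat):
        h = self.trunk(feat)
        return self.pose_head(h), self.app_head(h)
```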

Reference

  1. Wu, Wayne, et al. “TransGaGa: Geometry-aware unsupervised image-to-image translation.” CVPR, 2019.

Geometry Transformation for Image Composition

Posted on 2022-06-16 | In paper note
  • only geometry: [1] [2] (see the warping sketch after this list)

  • geometry + appearance: [3]

  • geometry + occlusion + appearance: [4] [5]
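A minimal sketch of the geometry-only setting ([1], [2]): a predicted affine transform warps the foreground and its mask before alpha compositing onto the background. All names are illustrative:

```python
import torch.nn.functional as F

def warp_and_composite(fg, fg_mask, bg, theta):
    # theta: (B, 2, 3) affine parameters, e.g. from a spatial transformer.
    grid = F.affine_grid(theta, fg.shape, align_corners=False)
    warped_fg = F.grid_sample(fg, grid, align_corners=False)
    warped_mask = F.grid_sample(fg_mask, grid, align_corners=False)
    return warped_mask * warped_fg + (1 - warped_mask) * bg
```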

Reference

  1. Lin, Chen-Hsuan, et al. “St-gan: Spatial transformer generative adversarial networks for image compositing.” CVPR, 2018.

  2. Kikuchi, Kotaro, et al. “Regularized Adversarial Training for Single-shot Virtual Try-On.” ICCV Workshops. 2019.

  3. Zhan, Fangneng, Hongyuan Zhu, and Shijian Lu. “Spatial fusion gan for image synthesis.” CVPR, 2019.

  4. Azadi, Samaneh, et al. “Compositional gan: Learning image-conditional binary composition.” International Journal of Computer Vision 128.10 (2020): 2570-2585.

  5. Zhan, Fangneng, Jiaxing Huang, and Shijian Lu. “Hierarchy Composition GAN for High-fidelity Image Synthesis.” IEEE Transactions on Cybernetics, 2021.

Generative Model

Posted on 2022-06-16 | In paper note
  • GAN
  • VAE
  • diffusion model: [1] [2] [3] [4] (see the sketch after this list)
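For intuition, a minimal sketch of the DDPM-style forward (noising) process that diffusion models [1] [2] [3] [4] learn to invert; `alphas_cumprod` is assumed to hold the precomputed cumulative product of the noise schedule:

```python
import torch

def forward_noising(x0, t, alphas_cumprod):
    # q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    xt = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return xt, noise   # the denoiser is trained to predict `noise` from (xt, t)
```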

Tutorial of generative models:

  • Understanding Diffusion Models: A Unified Perspective

  • Unifying Generative Models with GFlowNets

References

[1] Dhariwal, Prafulla, and Alex Nichol. “Diffusion models beat gans on image synthesis.” arXiv preprint arXiv:2105.05233 (2021).

[2] Nichol, Alex, et al. “GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models.” arXiv preprint arXiv:2112.10741 (2021).

[3] Karras, Tero, et al. “Elucidating the design space of diffusion-based generative models.” NeurIPS, 2022.

[4] Wang, Tengfei, et al. “Pretraining is All You Need for Image-to-Image Translation.” arXiv preprint arXiv:2205.12952 (2022).

Gaze Estimation

Posted on 2022-06-16 | In paper note

Approaches

  1. Corneal reflection-based methods

    • NIR or LED illumination; learn the mapping (e.g., by regression) between the glint vector and the gaze direction.
  2. Appearance based methods

    • Limbus model [pdf]: fit a limbus model (a fixed-diameter disc) to detected iris edges.

Auxiliary Tools

  1. Calibration: obtain the visual axis and kappa angle for each person.

  2. Facial landmarks detection

    • One Millisecond Face Alignment with an Ensemble of Regression Trees [pdf] [code]
    • Continuous Conditional Neural Fields for Structured Regression [pdf]
  3. Head Pose Estimation

    • EPnP algorithm [pdf] (see the sketch after this list)
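A sketch of head pose estimation from detected 2D facial landmarks with OpenCV's EPnP solver; the 3D model points and camera intrinsics below are rough placeholders:

```python
import cv2
import numpy as np

# Approximate 3D positions (mm) of a few landmarks in a generic head model.
MODEL_POINTS = np.array([
    [0.0, 0.0, 0.0],        # nose tip
    [0.0, -63.6, -12.5],    # chin
    [-43.3, 32.7, -26.0],   # left eye outer corner
    [43.3, 32.7, -26.0],    # right eye outer corner
    [-28.9, -28.9, -24.1],  # left mouth corner
    [28.9, -28.9, -24.1],   # right mouth corner
], dtype=np.float64)

def head_pose(image_points, focal_length, center):
    # image_points: (6, 2) detected 2D landmarks matching MODEL_POINTS.
    camera_matrix = np.array([[focal_length, 0, center[0]],
                              [0, focal_length, center[1]],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))          # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS, image_points, camera_matrix,
                                  dist_coeffs, flags=cv2.SOLVEPNP_EPNP)
    return rvec, tvec   # head rotation (Rodrigues vector) and translation
```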

Dataset

  1. [MPIIGaze]: fine-grained annotation

  2. [Eyediap]: RGB-D

GAN

Posted on 2022-06-16 | In paper note

Training tricks:

17 tricks for training GANs: https://github.com/soumith/ganhacks (a training-step sketch applying several of these follows the list)

  • soft labels: replace 1 with 0.9 and 0 with 0.3

  • train the discriminator more often (e.g., 2×) than the generator

  • use labels: auxiliary tasks

  • normalize inputs to [-1, 1]

  • use tanh for the generator output

  • use batchnorm (but not in the first and last layers)

  • sample from a spherical (Gaussian) distribution instead of a uniform one

  • leaky ReLU

  • stability tricks from RL
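A sketch of a training step applying several of these tricks (soft labels, more discriminator steps than generator steps, a Gaussian prior). All module and optimizer names are placeholders; D is assumed to output logits of shape (B, 1) and `real` to be normalized to [-1, 1]:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_step(G, D, opt_g, opt_d, real, z_dim=128, d_steps=2):
    b = real.size(0)
    for _ in range(d_steps):                # train D more often than G
        z = torch.randn(b, z_dim)           # Gaussian (spherical) prior
        fake = G(z).detach()
        loss_d = bce(D(real), torch.full((b, 1), 0.9)) \
               + bce(D(fake), torch.full((b, 1), 0.3))      # soft labels
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    z = torch.randn(b, z_dim)
    loss_g = bce(D(G(z)), torch.ones(b, 1))  # non-saturating generator loss
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```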

Tricks from BigGAN [1] (two of them are sketched in code after this list):

  • class-conditional BatchNorm

  • Spectral normalization

  • orthogonal initialization

  • truncated prior (truncation trick to seek the trade-off between fidelity and variety)

  • enforce orthogonality on weights to improve the model smoothness
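Spectral normalization and orthogonal initialization are one-liners in PyTorch; the layer sizes here are arbitrary:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Spectral normalization wraps a layer and constrains its spectral norm.
sn_conv = spectral_norm(nn.Conv2d(64, 128, kernel_size=3, padding=1))

def init_orthogonal(module):
    # Orthogonal initialization for conv / linear weights.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.orthogonal_(module.weight)

# model.apply(init_orthogonal)  # apply to every conv/linear layer
```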

More tricks

  • gradient penalty [8] (see the sketch after this list)

  • unrolling [9] and packing [10]
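A sketch of the gradient penalty of [8]: penalize the deviation of the critic's gradient norm from 1 at points interpolated between real and fake samples:

```python
import torch

def gradient_penalty(D, real, fake, lambda_gp=10.0):
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_out = D(x_hat)
    grads = torch.autograd.grad(outputs=d_out, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_out),
                                create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```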

Famous GANs:

  • LSGAN: replace the cross-entropy loss with a least-squares loss

  • Wasserstein GAN: replace the discriminator with a critic function

  • LAPGAN: coarse-to-fine generation using a Laplacian pyramid

  • SeqGAN: generate discrete sequences

  • E-GAN [2]: place GAN under the framework of genetic evolution

  • Dissection GAN [3]: use intervention for causality

  • CoGAN [4]: two generators and discriminators softly share parameters

  • DCGAN [5]

  • Progressive GAN [6]

  • Style-based GAN [7]

  • StackGAN [17]

  • Self-Attention GAN [18]

  • BigGAN [20]

  • LOGAN [19]

  • Conditioned on label vector: conditional GAN [14], CVAE-GAN [16]

  • Conditioned on a single image: pix2pix [11]; high-resolution pix2pix [12] (adds a coarse-to-fine strategy); BicycleGAN [13] (a combination of cVAE-GAN and cLR-GAN); DAGAN [15]

  • StyleGAN-XL [23]

  • StyleGAN-T [22]

  • GigaGAN [21]

Measurement:

Results: besides qualitative results, there are quantitative metrics such as the Inception Score (IS) and the Fréchet Inception Distance (FID); a minimal FID computation is sketched below.

Stability: for the stability of the generator and the discriminator, refer to [1].
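The standard FID formula, computed from Gaussian statistics of Inception features (this is the general definition, not code from [1]); `mu_*` and `sigma_*` are the feature mean and covariance of real and generated images:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu_r, sigma_r, mu_g, sigma_g):
    covmean = sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real   # drop numerical-noise imaginary parts
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(sigma_r + sigma_g - 2 * covmean))
```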

Tutorial and Survey:

  • The GAN zoo: https://github.com/hindupuravinash/the-gan-zoo

  • A good tutorial: https://github.com/mingyuliutw/cvpr2017_gan_tutorial/blob/master/gan_tutorial.pdf

  • Regularization Methods for Generative Adversarial Networks: An Overview of Recent Studies

  • Generative adversarial networks in computer vision: A survey and taxonomy [code]

References

[1] Brock, Andrew, Jeff Donahue, and Karen Simonyan. “Large scale gan training for high fidelity natural image synthesis.” arXiv preprint arXiv:1809.11096 (2018).

[2] Wang, Chaoyue, et al. “Evolutionary generative adversarial networks.” arXiv preprint arXiv:1803.00657 (2018).

[3] Bau, David, et al. “GAN dissection: Visualizing and understanding generative adversarial networks.” arXiv preprint arXiv:1811.10597 (2018).

[4] Liu, Ming-Yu, and Oncel Tuzel. “Coupled generative adversarial networks.” Advances in Neural Information Processing Systems. 2016.

[5] Radford, Alec, Luke Metz, and Soumith Chintala. “Unsupervised representation learning with deep convolutional generative adversarial networks.” arXiv preprint arXiv:1511.06434 (2015).

[6] Karras, Tero, et al. “Progressive growing of gans for improved quality, stability, and variation.” arXiv preprint arXiv:1710.10196 (2017).

[7] Karras, Tero, Samuli Laine, and Timo Aila. “A Style-Based Generator Architecture for Generative Adversarial Networks.” arXiv preprint arXiv:1812.04948 (2018).

[8] Gulrajani, Ishaan, et al. “Improved training of wasserstein gans.” Advances in Neural Information Processing Systems. 2017.

[9] Metz, Luke, et al. “Unrolled generative adversarial networks.” arXiv preprint arXiv:1611.02163 (2016).

[10] Lin, Zinan, et al. “PacGAN: The power of two samples in generative adversarial networks.” Advances in Neural Information Processing Systems. 2018.

[11] Isola, Phillip, et al. “Image-to-image translation with conditional adversarial networks.” CVPR, 2017

[12] Wang, Ting-Chun, et al. “High-resolution image synthesis and semantic manipulation with conditional gans.” CVPR, 2018.

[13] Zhu, Jun-Yan, et al. “Toward multimodal image-to-image translation.” NIPS, 2017.

[14] Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets.” arXiv preprint arXiv:1411.1784 (2014).

[15] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. “Data augmentation generative adversarial networks.” arXiv preprint arXiv:1711.04340 (2017).

[16] Bao, Jianmin, et al. “CVAE-GAN: fine-grained image generation through asymmetric training.” ICCV, 2017.

[17] Zhang, Han, et al. “StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks.” ICCV, 2017.

[18] Zhang, Han, et al. “Self-attention generative adversarial networks.” arXiv preprint arXiv:1805.08318 (2018).

[19] Wu, Yan, et al. “LOGAN: Latent Optimisation for Generative Adversarial Networks.” arXiv preprint arXiv:1912.00953 (2019).

[20] Brock, Andrew, Jeff Donahue, and Karen Simonyan. “Large scale gan training for high fidelity natural image synthesis.” arXiv preprint arXiv:1809.11096 (2018).

[21] Kang, Minguk, et al. “Scaling up GANs for Text-to-Image Synthesis.” arXiv preprint arXiv:2303.05511 (2023).

[22] Sauer, Axel, et al. “Stylegan-t: Unlocking the power of gans for fast large-scale text-to-image synthesis.” arXiv preprint arXiv:2301.09515 (2023).

[23] Sauer, Axel, Katja Schwarz, and Andreas Geiger. “Stylegan-xl: Scaling stylegan to large diverse datasets.” ACM SIGGRAPH 2022 Conference Proceedings, 2022.

From Weak to Strong Supervision

Posted on 2022-06-16 | In paper note

Object Detection:

  1. image label: [WSDDN]

  2. points that indicate the location of the object

  3. bounding boxes

Segmentation:

  1. image label: [SEC]

  2. points that indicate the location of the object

  3. scribbles that imply the extent of the object

  4. bounding boxes

  5. segmentation masks
