Frequency Domain
Distinguish generated fake images from real images in the frequency domain. [2]
An image can be decomposed into a low-frequency part and a high-frequency part. [3] [8] [4] [10]
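As a minimal NumPy sketch of this decomposition (illustrative only, not tied to any of the cited methods): masking the centered Fourier spectrum with a circular low-pass mask yields the low-frequency part, the complement yields the high-frequency part, and the two sum back to the original image.

```python
import numpy as np

def split_frequency(img, radius):
    """Split a grayscale image into low- and high-frequency parts
    using a circular mask on the centered Fourier spectrum."""
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    mask = dist <= radius  # True inside the low-frequency disk
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~mask)).real
    return low, high

img = np.random.rand(64, 64)
low, high = split_frequency(img, radius=8)
# The two parts reconstruct the image up to floating-point error.
assert np.allclose(low + high, img, atol=1e-8)
```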
Reference
Kai Xu, Minghai Qin, Fei Sun, Yuhao Wang, Yen-Kuang Chen, Fengbo Ren, “Learning in the Frequency Domain”, CVPR, 2020.
Wang, Sheng-Yu, et al. “CNN-generated images are surprisingly easy to spot… for now.” arXiv preprint arXiv:1912.11035 (2019).
Aayush Bansal, Yaser Sheikh, Deva Ramanan, “PixelNN: Example-based Image Synthesis”, ICLR, 2018.
Yanchao Yang, Stefano Soatto, “FDA: Fourier Domain Adaptation for Semantic Segmentation”, CVPR 2020.
Roy, Hiya, et al. “Image inpainting using frequency domain priors.” arXiv preprint arXiv:2012.01832 (2020).
Shen, Xing, et al. “DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation.” arXiv preprint arXiv:2011.09876 (2020).
Suvorov, Roman, et al. “Resolution-robust Large Mask Inpainting with Fourier Convolutions.” WACV (2021).
Yu, Yingchen, et al. “WaveFill: A Wavelet-based Generation Network for Image Inpainting.” ICCV, 2021.
Mardani, Morteza, et al. “Neural ffts for universal texture image synthesis.” NeurIPS (2020).
Cai, Mu, et al. “Frequency domain image translation: More photo-realistic, better identity-preserving.” ICCV, 2021.
Forecast Future based on One Still Image
Predict visual feature of one future frame [1]
Predict optical flow of one future frame [2]
Predict one future frame [4] (a special case of video prediction)
Predict future trajectories [5]
Predict optical flows of future frames, and then obtain future frames [3]
Reference
Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. “Anticipating visual representations from unlabeled video.” CVPR, 2016.
Gao, Ruohan, Bo Xiong, and Kristen Grauman. “Im2flow: Motion hallucination from static images for action recognition.” CVPR, 2018.
Li, Yijun, et al. “Flow-grounded spatial-temporal video prediction from still images.” ECCV, 2018.
Xue, Tianfan, et al. “Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks.” NIPS, 2016.
Walker, Jacob, et al. “An uncertain future: Forecasting from static images using variational autoencoders.” ECCV, 2016.
Few-Shot Feature Generation
Meta-learning method: [1]
Delta-based: model the delta between each pair of samples [2], or between each sample and its class center [3] [4]
Reference
[1] Zhang, Ruixiang, et al. “Metagan: An adversarial approach to few-shot learning.” NIPS, 2018.
[2] Schwartz, Eli, et al. “Delta-encoder: an effective sample synthesis method for few-shot object recognition.” Advances in Neural Information Processing Systems. 2018.
[3] Liu, Jialun, et al. “Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective.” arXiv preprint arXiv:2002.10826 (2020).
[4] Yin, Xi, et al. “Feature transfer learning for face recognition with under-represented data.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
Energy Efficient Deep Learning
Light-weight network structures
- Xception: strictly speaking, not a light-weight CNN
- SqueezeNet
- MobileNet
- ShuffleNet
- MicroNet
SqueezeNet, MobileNet, and ShuffleNet share the same idea: decouple the channel-wise (pointwise) convolution from the spatial convolution to reduce the number of parameters, in a spirit similar to the temporal/spatial decoupling of Pseudo-3D Residual Networks. SqueezeNet is serial while MobileNet and ShuffleNet are parallel. MobileNet is a special case of ShuffleNet when using only one group.
Low-rank approximation ($k\times k\times c\times d \rightarrow k\times k\times c\times d' + 1\times 1\times d'\times d$) also falls into the above scope. The difference between MobileNet and low-rank approximation is whether the spatial convolution is depthwise (one filter per channel) or not.
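The parameter savings are easy to check with simple arithmetic (the channel counts below are illustrative, biases ignored):

```python
# Parameter counts for a k×k convolution mapping c input channels
# to d output channels.
def standard_conv(k, c, d):
    return k * k * c * d

def low_rank(k, c, d, d_prime):
    # k×k×c×d' convolution followed by a 1×1×d'×d convolution
    return k * k * c * d_prime + d_prime * d

def depthwise_separable(k, c, d):
    # MobileNet-style: k×k depthwise convolution (one filter per channel),
    # then a 1×1 pointwise convolution across channels
    return k * k * c + c * d

k, c, d = 3, 256, 256
print(standard_conv(k, c, d))         # 589824
print(low_rank(k, c, d, d_prime=64))  # 163840
print(depthwise_separable(k, c, d))   # 67840
```

The depthwise variant is cheaper still because its spatial convolution uses one filter per input channel rather than a full $d'$-channel bank.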
Tweak network structure
- prune nodes based on certain criteria (e.g., response value, Fisher information): requires special implementation and may take up more space than expected due to the irregular network structure.
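A toy sketch of criterion-based pruning, using weight magnitude as a stand-in for the response-value criterion (the sparsity level and weight values are illustrative). Note the result is still a dense array, which is exactly the "takes more space than expected" caveat above:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.
    Without a sparse storage format, the pruned array is as large
    as the original."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

w = np.random.randn(128, 128)
pruned = magnitude_prune(w, sparsity=0.9)
print(np.mean(pruned == 0))  # roughly 0.9
```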
Compress weights
- Quantization (fixed bit number): learn a codebook and encode weights as cluster indices. Fine-tune the codebook after quantizing the weights, which averages the gradients of weights belonging to the same cluster. Extreme cases are the binary net and the ternary net: binary (resp. ternary) nets quantize weights to $\{-1, 1\}$ (resp. $\{-1, 0, 1\}$), with a different scaling factor $\alpha$ per layer.
- Huffman Coding (flexible bit number): applied after quantization for further compression.
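A minimal sketch of the quantize-then-Huffman pipeline described above (1-D k-means for the codebook; codebook fine-tuning is omitted, and the cluster count and weights are illustrative):

```python
import heapq
from collections import Counter
import numpy as np

def quantize(weights, n_clusters=4, n_iter=20):
    """1-D k-means: map each weight to the index of its nearest
    codebook entry."""
    codebook = np.linspace(weights.min(), weights.max(), n_clusters)
    for _ in range(n_iter):
        idx = np.argmin(np.abs(weights[:, None] - codebook[None, :]), axis=1)
        for j in range(n_clusters):
            if np.any(idx == j):
                codebook[j] = weights[idx == j].mean()
    return idx, codebook

def huffman_lengths(symbols):
    """Per-symbol code length (bits) from a Huffman tree over frequencies."""
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    tie = len(heap)  # tie-breaker so dicts are never compared
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (fa + fb, tie, merged))
        tie += 1
    return heap[0][2]

w = np.random.randn(1000)
idx, codebook = quantize(w, n_clusters=4)
lengths = huffman_lengths(idx.tolist())
# Frequent clusters receive shorter codes, so the total bit count is
# no worse than a fixed 2-bit encoding of 4 cluster indices.
```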
Computation
- spatial domain to frequency domain: convert convolution into pointwise multiplication using the FFT
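The convolution theorem behind this can be verified directly. A small NumPy sketch (circular convolution, as the FFT implies; sizes are illustrative):

```python
import numpy as np

def fft_conv2d(img, kernel):
    """Circular 2-D convolution via pointwise multiplication in the
    frequency domain (convolution theorem)."""
    K = np.fft.fft2(kernel, s=img.shape)  # zero-pad kernel to image size
    return np.fft.ifft2(np.fft.fft2(img) * K).real

def direct_circular_conv2d(img, kernel):
    """Reference implementation: direct circular convolution."""
    out = np.zeros_like(img)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * np.roll(img, (i, j), axis=(0, 1))
    return out

img = np.random.rand(16, 16)
kernel = np.random.rand(3, 3)
assert np.allclose(fft_conv2d(img, kernel), direct_circular_conv2d(img, kernel))
```

The saving comes from replacing the per-pixel sliding-window sum with two FFTs and an elementwise product, which pays off for large kernels.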
Sparsity regularization
Efficient Inference
Good introduction slides: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture15.pdf
Dynamic Kernel
Survey: [Dynamic neural networks: A survey]
References
Jia, Xu, et al. “Dynamic filter networks.” Advances in neural information processing systems 29 (2016).
Tian, Zhi, Chunhua Shen, and Hao Chen. “Conditional convolutions for instance segmentation.” European conference on computer vision. Springer, Cham, 2020.
Domain Adaptation
Methods
learn projection matrices $P$, $Q$: align the domains via $F(PX_s, QX_t)$
sample selection: learn sample weights
domain-invariant and domain-specific components
low-rank reconstruction
pixel-level image to image translation
adversarial network [1]: classification plus domain confusion. The domain separation and confusion objective is a min-max problem, which can be solved like a GAN or with the reverse gradient (RevGrad) algorithm.
meta-learning
- gradients on two domains should be consistent [pdf]
guided learning: a tutor guides students and gets feedback from the students (ACM-MM18 paper)
ensemble transfer learning: aggregate multiple transfer learning approaches [1]
Settings
open-set domain adaptation or partial transfer learning: [1][2][3]
distant domain adaptation (two domains are too distant, so the transfer between them relies on transition domains): Transitive transfer learning, distant domain transfer learning
open compound domain adaptation [1]
Domain adaptation for diverse applications
Domain difference metric: To measure data distribution mismatch, the most commonly used metric is MMD and its extensions, such as fast MMD, conditional MMD [1][2], and joint MMD. Other metrics include KL divergence, the HSIC criterion, Bregman divergence, the manifold criterion, and second-order statistics.
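A minimal NumPy sketch of the (biased) squared-MMD estimate with an RBF kernel, the basic quantity the extensions above build on (the bandwidth `gamma` and the toy Gaussian samples are illustrative):

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Biased squared-MMD estimate between sample sets X and Y
    with RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    def kernel(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd_rbf(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
shifted = mmd_rbf(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2)))
print(same < shifted)  # True: a larger domain gap gives a larger MMD
```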
Theories: A summary of related theories
Survey:
Disentangled Representation
Methods:
The goal of disentangled representation learning [4] is to extract the explanatory factors of the data in the input distribution and generate a more meaningful representation. The objects being disentangled go by many names: codes, encodings, representations, latent factors, or latent variables. An attribute may be encoded in a single dimension or across multiple dimensions.
A math definition of disentangled representation [11]
A survey on disentangled representation learning [19]
Unsupervised disentanglement
Recently, InfoGAN [5] utilized the GAN framework to learn disentangled representations in an unsupervised manner by maximizing the mutual information between a subset of the latent variables and the generated observations. Different latent variables are enforced to be independent based on the independence assumption [6].
Supervised disentanglement
Swap attribute representations under the supervision of attribute annotations, e.g., Dual Swap GAN [7] (semi-supervised) and DNA-GAN [8].
Disentangled representation for domain adaptation: disentangle the representation into class/domain-invariant and class/domain-specific parts [9][10][12][13]
closed-form disentanglement [18]: after the model is trained, perform an eigendecomposition on the generator weights to obtain orthogonal semantic directions.
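A sketch of the closed-form idea in [18]: the candidate directions are the eigenvectors of $W^{\top}W$ for a generator layer weight $W$ (equivalently, the right singular vectors of $W$), so no extra training is needed. Here $W$ is a random stand-in for a pretrained weight:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 64))  # stand-in: maps a 64-d latent code upward

# Eigendecomposition of W^T W; eigh returns eigenvalues in ascending
# order, so reverse the columns to sort by decreasing eigenvalue.
eigvals, directions = np.linalg.eigh(W.T @ W)
directions = directions[:, ::-1]

# The resulting semantic directions are orthonormal, as claimed.
assert np.allclose(directions.T @ directions, np.eye(64), atol=1e-8)
```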
Disentanglement metric:
Reference
[1] Higgins, Irina, et al. “beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework.” ICLR 2.5 (2017): 6.
[2] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” CVPR, 2019.
[4] Representation learning: A review and new perspectives
[5] Infogan: Interpretable representation learning by information maximizing generative adversarial nets
[6] Learning Independent Features with adversarial Nets for Non-linear ICA
[8] DNA-GAN: Learning Disentangled Representations from Multi-Attribute Images
[9] Image-to-image translation for cross-domain disentanglement
[10] Diverse Image-to-Image Translation via Disentangled Representations
[11] Higgins, Irina, et al. “Towards a definition of disentangled representations.” arXiv preprint arXiv:1812.02230 (2018).
[12] Gabbay, Aviv, and Yedid Hoshen. “Demystifying Inter-Class Disentanglement.” arXiv preprint arXiv:1906.11796 (2019).
[13] Hadad, Naama, Lior Wolf, and Moni Shahar. “A two-step disentanglement method.” CVPR, 2018.
[14] Shen, Zhiqiang, et al. “Towards instance-level image-to-image translation.” CVPR, 2019.
[15] Sangwoo Mo, Minsu Cho, Jinwoo Shin. “InstaGAN: Instance-aware Image-to-Image Translation.” ICLR, 2019.
[16] Liu, Ming-Yu, et al. “Few-shot unsupervised image-to-image translation.” ICCV, 2019.
[17] Saito, Kuniaki, Kate Saenko, and Ming-Yu Liu. “COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder.” arXiv preprint arXiv:2007.07431 (2020).
[18] Shen, Yujun, and Bolei Zhou. “Closed-Form Factorization of Latent Semantics in GANs.” arXiv preprint arXiv:2007.06600 (2020).
[19] Xin Wang, Hong Chen, Siao Tang, Zihao Wu, and Wenwu Zhu. “Disentangled Representation Learning.”