GPU

  • look up GPU information: lspci or lshw -C display

  • NVIDIA System Management Interface, for monitoring GPU usage: nvidia-smi (also reports the GPU driver version and the CUDA user-mode driver version)

GPU Driver

  • check the latest driver information at http://www.nvidia.com/Download/index.aspx. Then look up the driver version on the local machine: cat /proc/driver/nvidia/version

  • check the compatibility between CUDA runtime version and driver version: https://docs.nvidia.com/deploy/cuda-compatibility/

  • Install NVIDIA GPU driver using GUI: Software & Updates -> Additional Drivers

  • Install NVIDIA GPU driver using apt-get

    sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
    sudo apt-get update
    sudo apt-get install nvidia-current nvidia-current-modaliases nvidia-settings
  • Install NVIDIA GPU driver using *.run file downloaded from http://www.nvidia.com/Download/index.aspx

    1. Hit CTRL+ALT+F1 and log in with your credentials.
    2. Stop the current X server session: sudo service lightdm stop
    3. Enter runlevel 3 with sudo init 3, then install the *.run file.
    4. You may be asked to reboot when the installation finishes. If not, run sudo service lightdm start or sudo start lightdm to restart your X server.

CUDA

When installing a deep learning platform through Anaconda, the required CUDA runtime is often bundled, so it is sometimes unnecessary to install CUDA yourself.

  1. Preprocessing

    • uninstall the existing GPU driver first: sudo /usr/bin/nvidia-uninstall, or sudo apt-get remove --purge nvidia* followed by sudo apt-get autoremove; then sudo reboot
    • blacklist nouveau: append “blacklist nouveau” and “options nouveau modeset=0” to the end of /etc/modprobe.d/blacklist.conf, then run sudo update-initramfs -u and sudo reboot
    • stop the current X server session: sudo service lightdm stop
  2. Install CUDA

    Download the *.run file from the NVIDIA website and run it.

  3. Check the CUDA version after installation: nvcc -V. Then compile and run the CUDA samples to verify the installation.

CuDNN

cuDNN is a GPU-accelerated library of deep neural network primitives built on top of CUDA. Download the compressed package from https://developer.nvidia.com/rdp/form/cudnn-download-survey, then copy the headers and libraries into the CUDA installation:

cd $CUDNN_PATH
sudo cp include/* /usr/local/cuda/include/
sudo cp -P lib64/* /usr/local/cuda/lib64/ # use -P to retain symbolic links

  1. Dichromatic Reflection Model [1] [2]: I(p) = m_b(p) Λ_b(p) + m_s(p) Λ_s(p), where each chromatic term has the form Λ(p) = ∫ ρ(λ, p) e(λ) s(λ) dλ, in which p is the pixel index, e(λ) is the global illumination, and s(λ) is the sensor sensitivity. The chromatic terms Λ_b and Λ_s account for body and surface reflection, whose reflectances ρ are only related to the object material.

  2. gray pixels: pixels with equal R, G, and B values. Detecting gray pixels in a color-biased image is not easy. [3]

  3. albedo, shading, gloss [4] [5]
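The naive definition of gray pixels in item 2 can be sketched as a simple channel-spread threshold; the snippet also shows why a global color cast defeats this baseline, which is the difficulty [3] addresses. All names and the tolerance value are illustrative.

```python
import numpy as np

def naive_gray_mask(img, tol=0.02):
    """Mark pixels whose R, G, B values are (nearly) equal."""
    # img: H x W x 3 array with values in [0, 1]
    spread = img.max(axis=-1) - img.min(axis=-1)
    return spread <= tol

# A flat gray image: every pixel is detected.
gray = np.full((4, 4, 3), 0.5)
assert naive_gray_mask(gray).all()

# The same surfaces under a color-biased illuminant: no pixel passes,
# even though the underlying surfaces are still gray.
biased = gray * np.array([1.0, 0.8, 0.6])
assert not naive_gray_mask(biased).any()
```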

Reference

[1] Shafer, Steven A. “Using color to separate reflection components.” Color Research & Application 10.4 (1985): 210-218.
[2] Song, Shuangbing, et al. “Illumination Harmonization with Gray Mean Scale.” Computer Graphics International Conference. Springer, Cham, 2020.
[3] Qian, Yanlin, et al. “On finding gray pixels.” CVPR, 2019.
[4] Bhattad, Anand, and David A. Forsyth. “Cut-and-Paste Neural Rendering.” arXiv preprint arXiv:2010.05907 (2020).
[5] Yu, Ye, and William AP Smith. “InverseRenderNet: Learning single image inverse rendering.” CVPR, 2019.

Interface Type:

GigE and USB interfaces are commonly used. The advantage of GigE is long-distance transmission.

Color vs. Monochrome

When the exposure begins, each photosite is uncovered to collect incoming light. When the exposure ends, the occupancy of each photosite is read as an electrical signal, which is then quantified and stored as a numerical value in an image file.

Unlike color sensors, monochrome sensors capture all incoming light at each pixel regardless of color.
Monochrome sensors also do not require demosaicing to produce the final image, because the value recorded at each photosite directly becomes the value of the corresponding pixel. As a result, monochrome sensors achieve slightly higher effective resolution.
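The demosaicing point can be made concrete with a toy simulation of a Bayer color sensor (an RGGB layout is assumed here): each photosite records only one channel, so neighboring values must later be interpolated, whereas a monochrome sensor would record the full intensity at every site.

```python
import numpy as np

def bayer_mosaic(img):
    """Simulate an RGGB Bayer color sensor: one channel per photosite."""
    h, w, _ = img.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = img[0::2, 0::2, 0]  # red sites
    mosaic[0::2, 1::2] = img[0::2, 1::2, 1]  # green sites
    mosaic[1::2, 0::2] = img[1::2, 0::2, 1]  # green sites
    mosaic[1::2, 1::2] = img[1::2, 1::2, 2]  # blue sites
    return mosaic

# A uniform orange scene: the raw mosaic mixes three different values,
# so neighbors must be interpolated (demosaiced) to recover full color.
scene = np.tile(np.array([0.9, 0.5, 0.1]), (4, 4, 1))
raw = bayer_mosaic(scene)
assert raw[0, 0] == 0.9 and raw[0, 1] == 0.5 and raw[1, 1] == 0.1
```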

Sensor Type:

  • CCD (Charge-Coupled Device): a special manufacturing process allows the charge conversion to take place inside the chip without distortion, which makes CCDs more expensive. CCDs capture high-quality images with low noise and are sensitive to light.

  • CMOS (Complementary Metal-Oxide-Semiconductor): uses transistors at each pixel to move the charge through traditional wires. CMOS sensors are built with the same traditional manufacturing processes used for microchips, so they are cheaper and have lower power consumption.

Readout Method:

Global vs. rolling shutter: originally, CCD sensors used a global shutter while CMOS sensors used a rolling shutter. A rolling shutter is always active, rolling through the pixels line by line from top to bottom. In contrast, a global shutter stores the electrical charges and reads them out after the shutter closes, while the pixels are reset for the next exposure, allowing the entire sensor area to be output simultaneously. Nowadays, CMOS sensors can also offer global shutter capabilities.

Advantage of a global shutter: it handles motion and pulsed light conditions well, since the entire scene is exposed at a single moment in time, enabling synchronous timing of the light or motion to the open-shutter phase. A rolling shutter can also manage motion and pulsed light to an extent, through a combination of fast shutter speeds and careful timing of the light source.
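The rolling-shutter skew described above is easy to reproduce in a toy model: if each row is sampled one time step later than the row above it, a vertical object moving sideways comes out slanted. The scene function and timings below are illustrative assumptions.

```python
import numpy as np

def rolling_shutter_capture(scene_at, n_rows):
    """Each row is read at a later time, so a moving scene is skewed."""
    rows = [scene_at(t)[t] for t in range(n_rows)]  # row t, sampled at time t
    return np.stack(rows)

def scene_at(t):
    """A vertical bar that moves one column to the right per row-time."""
    frame = np.zeros((4, 8))
    frame[:, 2 + t] = 1.0
    return frame

img = rolling_shutter_capture(scene_at, 4)
# The bar is captured as a diagonal: row y sees it at column 2 + y.
assert img[0, 2] == 1.0 and img[3, 5] == 1.0
# A global shutter would capture scene_at(0): a perfectly vertical bar.
```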

Quantum Efficiency

The ability of a pixel to convert an incident photon to charge is specified by its quantum efficiency. For example, if ten incident photons produce four photo-electrons, the quantum efficiency is 40%. Typical values of quantum efficiency are in the range of 30 - 60%. Quantum efficiency depends on wavelength, so the response is not uniform across the spectrum.
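The 40% example above is just the ratio of photoelectrons to incident photons; a minimal sketch (function names are mine):

```python
def quantum_efficiency(photoelectrons, incident_photons):
    """Fraction of incident photons converted to charge."""
    return photoelectrons / incident_photons

def expected_signal_e(incident_photons, qe):
    """Mean photoelectron count for a pixel with the given QE."""
    return incident_photons * qe

assert quantum_efficiency(4, 10) == 0.4      # the example in the text
assert expected_signal_e(1000, 0.4) == 400.0
```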

Field of View

FOV (Field of View) depends on both the sensor size and the lens focal length: FOV = 2 arctan(d / 2f) for sensor dimension d and focal length f. For a fixed focal length, larger sensors yield a greater FOV.
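A quick numeric check of the FOV geometry, under a thin-lens approximation (FOV = 2·arctan(d / 2f); the sensor sizes below are common illustrative values):

```python
import math

def fov_deg(sensor_dim_mm, focal_length_mm):
    """Angular field of view: FOV = 2 * atan(d / (2 f)), in degrees."""
    return math.degrees(2 * math.atan(sensor_dim_mm / (2 * focal_length_mm)))

# A 36 mm wide (full-frame) sensor behind a 50 mm lens: about 39.6 degrees.
assert abs(fov_deg(36, 50) - 39.6) < 0.1
# The same lens on a smaller 23.5 mm (APS-C) sensor sees a narrower view.
assert fov_deg(23.5, 50) < fov_deg(36, 50)
```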

Pixel Size

A small pixel size is desirable because it results in a smaller die size and/or higher spatial resolution; a large pixel size is desirable because it results in higher dynamic range and signal-to-noise ratio.
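The dynamic-range side of this trade-off is commonly quantified as DR = 20·log10(full-well capacity / read noise); full-well capacity roughly scales with pixel area, which is why larger pixels help. The electron counts below are illustrative, not from any specific sensor.

```python
import math

def dynamic_range_db(full_well_e, read_noise_e):
    """Dynamic range in dB: 20 * log10(full-well capacity / read noise)."""
    return 20 * math.log10(full_well_e / read_noise_e)

# A large pixel with a 10000 e- full well and 10 e- read noise spans 60 dB;
# a small pixel with a quarter of the full well loses about 12 dB.
assert round(dynamic_range_db(10000, 10), 1) == 60.0
assert round(dynamic_range_db(2500, 10), 1) == 48.0
```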

Multi-agent RL: [1]

Reference

[1] Deng, Zehao, et al. “Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation.” arXiv preprint arXiv:2511.22235 (2025).

(1) Zoom in a bounding box [1] [2]

(2) Zoom in salient region [3] [4]

  • relation to (1): if the salient region is a rectangle and its salience value is infinite, this is equivalent to zooming in on a bounding box.
  • relation to pooling: weighted pooling with the salience map as the weight map
  • relation to deformable CNN: use the salience map to compute an offset for each position
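The pooling view above can be written down directly: normalize the salience map and use it as a weight map. This is a sketch; shapes and names are my assumptions.

```python
import numpy as np

def saliency_weighted_pool(features, saliency, eps=1e-8):
    """Pool H x W x C features with an H x W salience map as weights."""
    w = saliency / (saliency.sum() + eps)
    return (features * w[..., None]).sum(axis=(0, 1))

feats = np.arange(4.0).reshape(2, 2, 1)
# Uniform salience reduces to average pooling ...
assert np.allclose(saliency_weighted_pool(feats, np.ones((2, 2))), [1.5])
# ... while a single salient position reduces to picking that feature,
# mirroring the "infinite salience = hard crop" relation to a bounding box.
delta = np.zeros((2, 2)); delta[1, 1] = 1.0
assert np.allclose(saliency_weighted_pool(feats, delta), [3.0])
```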

Reference

[1] Fu, Jianlong, Heliang Zheng, and Tao Mei. “Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition.” CVPR, 2017.

[2] Zheng, Heliang, et al. “Learning multi-attention convolutional neural network for fine-grained image recognition.” ICCV, 2017.

[3] Recasens, Adria, et al. “Learning to zoom: a saliency-based sampling layer for neural networks.” ECCV, 2018.

[4] Zheng, Heliang, et al. “Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-grained Image Recognition.” arXiv preprint arXiv:1903.06150 (2019).

Problem

Tracking is challenging due to the following factors: deformation, illumination variation, blur and fast motion, background clutter, rotation, scale variation, and boundary effects.

History

Tracking methods can be roughly categorized into generative methods and discriminative methods (features + machine learning). Recently, correlation filter (CF) based methods and deep learning methods have been dominant.

  • Meanshift: density based, ASMS https://github.com/vojirt/asms
  • Particle filter: particle based statistical method
  • Optical flow: match feature points between neighboring frames
  • correlation filter: KCF, DCF, CSK, CN, DSST, SRDCF, ECO. Basic CF methods are sensitive to deformation, fast motion, and boundary effects.
  • deep learning: GOTURN, MDNet, TCNN, SiamFC
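A minimal sketch of the correlation-filter idea behind these trackers, in the single-sample MOSSE-style closed form (the regularizer λ, the sizes, and the delta-shaped desired response are my assumptions; real trackers use Gaussian responses, cosine windows, and online updates):

```python
import numpy as np

def train_filter(template, desired_response, lam=1e-3):
    """Closed-form CF in the Fourier domain: H* = G . conj(F) / (F . conj(F) + lam)."""
    F = np.fft.fft2(template)
    G = np.fft.fft2(desired_response)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def respond(h_conj, patch):
    """Correlation response of a new patch under the learned filter."""
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * h_conj))

rng = np.random.default_rng(0)
template = rng.standard_normal((32, 32))
desired = np.zeros((32, 32)); desired[0, 0] = 1.0  # peak at the origin

h = train_filter(template, desired)
shifted = np.roll(template, (3, 5), axis=(0, 1))   # target moved by (3, 5)
resp = respond(h, shifted)
# The response peak localizes the (circular) shift of the target.
assert np.unravel_index(resp.argmax(), resp.shape) == (3, 5)
```

The FFT formulation is what makes these trackers fast; the circular-shift assumption is also the source of the boundary effect noted above.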

Two research groups have contributed the most to CF methods:

Comparison of Speed and Performance

Survey papers

  • Object tracking: A survey, 2006
  • Object tracking benchmark, 2015

Benchmark

Challenge

Detection based Tracking

Detection based tracking is also known as tracking-by-detection or multiple object tracking. (MOT Challenge)

TLD (tracking-learning-detection): updates the tracker and the detector during learning
http://personal.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html

Visual dialogue [1]: a dialogue with one image

Multimodal Dialogue [2][3]: a dialogue with multiple images

[1] Visual Dialog

[2] Towards Building Large Scale Multimodal Domain-Aware Conversation Systems

[3] Knowledge-aware Multimodal Dialogue Systems

warping

target person

Garment Transfer: [5] [6] [8]

Controllable person image synthesis: [7]

Recurrent Person Image Generation: [9]

References

  1. Yang, Fan, and Guosheng Lin. “CT-Net: Complementary Transfering Network for Garment Transfer with Arbitrary Geometric Changes.” CVPR, 2021.

  2. Bai, Shuai, et al. “Single Stage Virtual Try-on via Deformable Attention Flows.” arXiv preprint arXiv:2207.09161 (2022).

  3. Fenocchi, Emanuele, et al. “Dual-Branch Collaborative Transformer for Virtual Try-On.” CVPR, 2022.

  4. Morelli, Davide, et al. “Dress Code: High-Resolution Multi-Category Virtual Try-On.” CVPR, 2022.

  5. Yang, Fan, and Guosheng Lin. “CT-Net: Complementary Transfering Network for Garment Transfer with Arbitrary Geometric Changes.” CVPR, 2021.

  6. Liu, Ting, et al. “Spatial-aware texture transformer for high-fidelity garment transfer.” IEEE Transactions on Image Processing 30 (2021): 7499-7510.

  7. Zhou, Xinyue, et al. “Cross Attention Based Style Distribution for Controllable Person Image Synthesis.” arXiv preprint arXiv:2208.00712 (2022).

  8. Raj, Amit, et al. “Swapnet: Image based garment transfer.” European Conference on Computer Vision. Springer, Cham, 2018.

  9. Cui, Aiyu, Daniel McKee, and Svetlana Lazebnik. “Dressing in order: Recurrent person image generation for pose transfer, virtual try-on and outfit editing.” ICCV, 2021.
