Simply put, image composition means cut-and-paste: cutting a region (the foreground) from one image and pasting it onto another image (the background). The resulting composite image may look unrealistic for the following reasons:
- The foreground is not well segmented, so there is an evident and unnatural boundary between foreground and background.
- The foreground and background may look incompatible due to different color and illumination statistics. For example, the foreground is captured in the daytime while the background is captured at night.
- The foreground is placed at an unreasonable location. For example, a horse is placed in the sky.
- The foreground may need a geometric transformation to fit the background. For example, when pasting eyeglasses onto a face, the eyeglasses should fit the eyes and ears of the face.
- The pasted foreground may also affect the background. For example, the foreground may cast a shadow on the background.
Therefore, image composition is actually a combination of multiple subtasks.
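The naive cut-and-paste step described above can be sketched in a few lines. This is only an illustrative sketch, not the method of any cited work: the function name `paste_foreground`, the soft mask in [0, 1], and the assumption that the foreground patch fits entirely inside the background are all hypothetical choices made for the example.

```python
import numpy as np

def paste_foreground(background, foreground, mask, top, left):
    """Paste `foreground` (h, w, 3) onto `background` (H, W, 3) at (top, left).

    `mask` is an (h, w) array in [0, 1]; 1 means "take the foreground pixel".
    Assumes the patch lies fully inside the background (no bounds handling).
    """
    out = background.astype(np.float64)          # astype returns a new array
    h, w = mask.shape
    alpha = mask[..., None].astype(np.float64)   # broadcast over color channels
    region = out[top:top + h, left:left + w]
    # Standard alpha blend: foreground where mask is 1, background where it is 0.
    out[top:top + h, left:left + w] = alpha * foreground + (1.0 - alpha) * region
    return out.astype(background.dtype)
```

With a hard (0/1) mask, this produces exactly the sharp, unnatural boundary mentioned in the first issue; none of the other issues (color mismatch, placement, geometry, shadows) are addressed by this step.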
Some prior works focus on only one subtask, such as harmonization or geometric transformation [1]. Other works attempt to solve all subtasks in a single package [2] [3] [4] [5] [6].
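To give a rough sense of what the harmonization subtask is trying to fix, the sketch below shifts the foreground's per-channel mean and standard deviation toward those of the background. This is a crude statistics-matching baseline chosen for illustration, not the approach of any work cited here; the function name `match_color_statistics`, the use of RGB space, and the 8-bit value range are assumptions.

```python
import numpy as np

def match_color_statistics(foreground, mask, background):
    """Shift foreground colors so their per-channel mean/std match the background.

    `foreground`: (h, w, 3) uint8, `mask`: (h, w) in [0, 1],
    `background`: (H, W, 3) uint8. Statistics are computed in RGB space.
    """
    fg = foreground.astype(np.float64)
    bg = background.astype(np.float64)

    # Use only the pixels that belong to the foreground object.
    fg_pixels = fg[mask > 0.5]                    # shape (N, 3)
    fg_mean = fg_pixels.mean(axis=0)
    fg_std = fg_pixels.std(axis=0) + 1e-6         # avoid division by zero

    bg_pixels = bg.reshape(-1, bg.shape[-1])
    bg_mean = bg_pixels.mean(axis=0)
    bg_std = bg_pixels.std(axis=0)

    # Normalize the foreground, then rescale to the background statistics.
    adjusted = (fg - fg_mean) / fg_std * bg_std + bg_mean
    return np.clip(np.rint(adjusted), 0, 255).astype(np.uint8)
```

Global statistics like these ignore local lighting and semantics, which is one reason harmonization is treated as a research problem in its own right.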
Human matting combined with composition: [7]
References
[1] Lin, Chen-Hsuan, et al. “ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing.” CVPR, 2018.
[2] Tan, Fuwen, et al. “Where and Who? Automatic Semantic-Aware Person Composition.” WACV, 2018.
[3] Chen, Bor-Chun, and Andrew Kae. “Toward Realistic Image Compositing with Adversarial Learning.” CVPR, 2019.
[4] Zhang, Lingzhi, Tarmily Wen, and Jianbo Shi. “Deep Image Blending.” WACV, 2020, pp. 231-240.
[5] Weng, Shuchen, et al. “MISC: Multi-Condition Injection and Spatially-Adaptive Compositing for Conditional Person Image Synthesis.” CVPR, 2020.
[6] Zhan, Fangneng, et al. “Adversarial Image Composition with Auxiliary Illumination.” arXiv preprint arXiv:2009.08255 (2020).
[7] Zhang, He, et al. “Deep Image Compositing.” arXiv preprint arXiv:2011.02146 (2020).