Generative adversarial networks (GANs) have demonstrated impressive image generation quality and semantic editing capabilities for real images, e.g., changing object classes, modifying attributes, or transferring styles. However, applying these GAN-based edits to a video independently for each frame inevitably results in temporal flickering artifacts. We present a simple yet effective method to facilitate temporally coherent video editing. Our core idea is to minimize the temporal photometric inconsistency by optimizing both the latent code and the pre-trained generator. We evaluate the quality of our editing on different domains and GAN inversion techniques and show favorable results against the baselines.
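The core idea can be sketched in a few lines. The toy below jointly optimizes per-frame latent codes and the generator weights to reduce photometric inconsistency between consecutive generated frames. It is a minimal sketch under loud assumptions: a linear "generator" stands in for a pre-trained StyleGAN, and frames are compared directly rather than after optical-flow warping, so this illustrates only the joint-optimization idea, not the actual method.

```python
import numpy as np

def photometric_inconsistency(frames):
    """Mean squared difference between consecutive frames."""
    return float(np.mean((frames[1:] - frames[:-1]) ** 2))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # toy "generator" weights (fine-tuned, stand-in for StyleGAN)
Z = rng.normal(size=(3, 4))   # per-frame latent codes for 3 frames (optimized)

loss_before = photometric_inconsistency(Z @ W.T)

lr = 0.002
for _ in range(500):
    frames = Z @ W.T                   # (T, 8) toy "images", one row per frame
    d = frames[1:] - frames[:-1]       # temporal differences
    grad_f = np.zeros_like(frames)     # gradient of sum_t ||d_t||^2 w.r.t. frames
    grad_f[:-1] -= 2 * d
    grad_f[1:] += 2 * d
    Z -= lr * (grad_f @ W)             # chain rule through frames = Z @ W.T
    W -= lr * (grad_f.T @ Z)

loss_after = photometric_inconsistency(Z @ W.T)
```

Updating both `Z` and `W` mirrors the paper's choice of optimizing the latent code and fine-tuning the generator together, rather than adjusting latents alone.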
Our method can be applied to diverse videos from the Internet, edited with off-the-shelf in-domain editing methods, e.g., InterfaceGAN and StyleCLIP.
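For context, an InterfaceGAN-style in-domain edit is a linear step along a semantic direction in latent space. The sketch below uses a random vector as a stand-in for the learned attribute boundary (an assumption; in practice the direction is fit from attribute-labeled latents):

```python
import numpy as np

def edit_latent(w, direction, alpha):
    """Move latent code w by alpha along a normalized semantic direction."""
    unit = direction / np.linalg.norm(direction)
    return w + alpha * unit

rng = np.random.default_rng(1)
w = rng.normal(size=512)           # a StyleGAN W-space code (512-dim)
direction = rng.normal(size=512)   # stand-in for a learned attribute direction

w_edit = edit_latent(w, direction, alpha=3.0)  # larger alpha = stronger edit
```

Applying such an edit per frame is exactly the setting that causes flickering, which our optimization then removes.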
Our method can also be applied to out-of-domain editing. We show results on the RAVDESS dataset below, using StyleGAN-NADA for the out-of-domain edits.
Please note that we do not apply stitching here.
Our method is built upon prior work. We share some useful links below.
@article{xu2022videoeditgan,
  author  = {Xu, Yiran and AlBahar, Badour and Huang, Jia-Bin},
  title   = {Temporally Consistent Semantic Video Editing},
  journal = {arXiv preprint arXiv:2206.10590},
  year    = {2022},
}