Temporally Consistent Semantic Video Editing

ECCV 2022

1 University of Maryland, College Park; 2 Virginia Tech


Generative adversarial networks (GANs) have demonstrated impressive image generation quality and semantic editing capabilities for real images, e.g., changing object classes, modifying attributes, or transferring styles. However, applying these GAN-based editing methods to a video independently for each frame inevitably results in temporal flickering artifacts. We present a simple yet effective method to facilitate temporally coherent video editing. Our core idea is to minimize the temporal photometric inconsistency by optimizing both the latent code and the pre-trained generator. We evaluate the quality of our editing on different domains and with different GAN inversion techniques, and show favorable results against the baselines.
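
As a rough illustration of what "temporal photometric inconsistency" can mean, the sketch below warps the next frame onto the current one using a per-pixel motion field and averages the absolute intensity difference over pixels whose lookup stays inside the image. This is an assumption made for illustration, not the paper's formulation: the function names, the integer nearest-neighbor flow, and the L1 difference are all hypothetical choices, and this page does not specify how the method computes or minimizes the term.

```python
# Illustrative sketch only: names and the exact loss form are assumptions,
# not taken from the paper or its code release.

def warp_nearest(frame, flow):
    """Backward-warp a grayscale frame with an integer flow field.

    frame: H x W nested list of floats.
    flow:  H x W nested list of (dx, dy) integer offsets; pixel (x, y) of the
           reference frame is assumed to appear at (x + dx, y + dy) in `frame`.
    Returns (warped, valid); valid[y][x] is False when the lookup leaves the image.
    """
    h, w = len(frame), len(frame[0])
    warped = [[0.0] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + dx, y + dy
            if 0 <= sx < w and 0 <= sy < h:
                warped[y][x] = frame[sy][sx]
                valid[y][x] = True
    return warped, valid


def photometric_inconsistency(frame_t, frame_next, flow):
    """Mean absolute difference between frame_t and the warped next frame,
    averaged over valid pixels only (0.0 if no pixel is valid)."""
    warped, valid = warp_nearest(frame_next, flow)
    total, count = 0.0, 0
    for y in range(len(frame_t)):
        for x in range(len(frame_t[0])):
            if valid[y][x]:
                total += abs(frame_t[y][x] - warped[y][x])
                count += 1
    return total / count if count else 0.0


# Toy example: the scene moves one pixel to the right between frames.
frame_t = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
consistent_next = [[9.0, 0.0, 1.0], [9.0, 3.0, 4.0]]   # same content, shifted
flickering_next = [[9.5, 0.5, 1.5], [9.5, 3.5, 4.5]]   # shifted and brightened
flow = [[(1, 0)] * 3 for _ in range(2)]

consistent_score = photometric_inconsistency(frame_t, consistent_next, flow)
flickering_score = photometric_inconsistency(frame_t, flickering_next, flow)
```

Here the shifted but otherwise unchanged frame scores zero, while the brightened frame scores higher, which is the kind of frame-to-frame flicker the abstract refers to. Optimizing a latent code and generator against such a measure would need a differentiable version; this pure-Python version only evaluates it.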


In-domain editing on Internet videos

Our method can be applied to diverse videos from the Internet. These videos can be edited with available in-domain editing methods, e.g., InterFaceGAN and StyleCLIP.


Stitch in Time [Tzaban, et al.]


Out-of-domain editing on RAVDESS dataset

Our method can also be applied to out-of-domain editing. We show results on the RAVDESS dataset below. The out-of-domain editing method is StyleGAN-NADA.

Please note that we do not apply stitching here.




Related Links

Our method is built upon prior work. We share some useful links below.

GAN inversion

PTI: A hybrid GAN inversion method by tuning the generator.
e4e: An encoder-based GAN inversion approach.
pSp: An encoder-based GAN inversion approach.

GAN-based editing

StyleCLIP: A language-based image editing approach via pre-trained StyleGAN.
StyleGAN-NADA: A language-based out-of-domain editing technique.
Latent Transformer: An in-domain editing approach by disentangling the editing content from other attributes.
Stitch in Time: A way to "unalign" edited frames back to the original video.

Blind Video Temporal Consistency

DVP: A blind video temporal consistency method.
Learning Blind Video Temporal Consistency: A blind video temporal consistency method.


  author    = {Xu, Yiran and AlBahar, Badour and Huang, Jia-Bin},
  title     = {Temporally consistent semantic video editing},
  journal   = {arXiv preprint arXiv:2206.10590},
  year      = {2022},