
Generating Video with Conditional Control Diffusion Model

EasyChair Preprint no. 12390

24 pages. Date: March 4, 2024

Abstract

We present the Conditional Control Diffusion Model (CCDM), a neural network that converts a text-to-image (T2I) model into a video model through conditional control while preserving the image quality of the original model. CCDM is first trained on real video data, producing a composite model that fuses multiple frames and learns action priors. It then adopts the Stable Diffusion architecture and integrates the T2I model without modifying it during video generation. Finally, CCDM feeds the generated frames back into the model as conditioning, reducing the flickering caused by content changes. We evaluate CCDM on a range of T2I models from CivitAI with different styles and features. Using prompts from each T2I model's page, we generate videos and show that CCDM produces dynamic content and handles generation tasks within 8 GB of VRAM. CCDM has strong potential for video generation applications.
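The abstract describes a ControlNet-style arrangement: the pretrained T2I backbone stays unchanged, a trainable branch fuses a window of frames to learn motion priors, and previously generated frames are fed back as conditioning to suppress flicker. The sketch below illustrates that idea only at the level stated in the abstract; all class names, shapes, and the UNet call signature are assumptions, not the authors' implementation.

```python
# Minimal sketch of the CCDM idea from the abstract, assuming a frozen T2I UNet
# plus a trainable temporal control branch. Names and signatures are hypothetical.
import torch
import torch.nn as nn


class TemporalControlBranch(nn.Module):
    """Trainable branch that fuses a window of frame latents into a control residual."""

    def __init__(self, frame_channels: int = 4, hidden: int = 320, window: int = 4):
        super().__init__()
        # Fuse the temporal window into a single feature map (time axis collapses).
        self.fuse = nn.Conv3d(frame_channels, hidden,
                              kernel_size=(window, 3, 3), padding=(0, 1, 1))
        self.to_residual = nn.Conv2d(hidden, hidden, kernel_size=1)

    def forward(self, frame_window: torch.Tensor) -> torch.Tensor:
        # frame_window: (B, C, T, H, W) latents of recent frames,
        # including frames already generated and fed back as conditioning.
        fused = self.fuse(frame_window).squeeze(2)   # (B, hidden, H, W)
        return self.to_residual(fused)


def denoise_step(frozen_unet, control_branch, latent, timestep, text_emb, frame_window):
    """One sampling step: the frozen T2I UNet receives an additive control residual.

    `frozen_unet` is a placeholder for the unchanged T2I backbone; the
    `control_residual` keyword is an assumed hook, not a real library API.
    """
    residual = control_branch(frame_window)
    with torch.no_grad():  # the T2I model itself is never updated
        noise_pred = frozen_unet(latent, timestep, text_emb, control_residual=residual)
    return noise_pred
```

In this reading, only `TemporalControlBranch` carries trainable parameters, which is consistent with the claim that the original T2I model is left untouched and that generation fits in 8 GB of VRAM.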

Keyphrases: computer vision, conditional control, diffusion model, text-to-image, video generation

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:12390,
  author = {Xiaoyang Gao and Zheng Wen and Tao Yang},
  title = {Generating Video with Conditional Control Diffusion Model},
  howpublished = {EasyChair Preprint no. 12390},
  year = {EasyChair, 2024}}