Generating Video with Conditional Control Diffusion Model

EasyChair Preprint 12390
24 pages • Date: March 4, 2024

Abstract

We present the Conditional Control Diffusion Model (CCDM), a neural network that converts a text-to-image (T2I) model into a video model through conditional control while preserving the image quality of the original model. CCDM first trains on real video data, building a composite model that fuses multiple frames and learns action priors. It then adopts the Stable Diffusion architecture and integrates the T2I model, ensuring the T2I model remains unchanged during video generation. Finally, CCDM feeds the generated frames back into the model as conditioning, reducing the flickering caused by content changes between frames. We evaluate CCDM on a range of T2I models from CivitAI with different styles and features. Using prompts from each T2I model's page, we generate videos and show that CCDM produces dynamic content and can handle generation tasks with 8 GB of VRAM. CCDM has excellent potential for video generation applications.

Keyphrases: Conditional Control, Video Generation, computer vision, diffusion model, text-to-image
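The abstract describes three ingredients: a frozen T2I backbone, a trainable temporal-fusion module that learns motion priors across frames, and a feedback path that conditions each frame on previously generated frames to suppress flicker. The sketch below is not the authors' code; it is a minimal PyTorch illustration of that structure under assumed interfaces, and every class and tensor shape in it (FrozenT2IBackbone, TemporalFusion, CCDMSketch, the latent dimensions) is a hypothetical placeholder.

```python
# Minimal sketch of the CCDM idea from the abstract (hypothetical, not the
# authors' implementation): frozen T2I denoiser + trainable temporal fusion
# + feedback conditioning on previously generated frames.

import torch
import torch.nn as nn


class FrozenT2IBackbone(nn.Module):
    """Stand-in for a pretrained text-to-image denoiser (e.g. a Stable
    Diffusion UNet); its weights are never updated."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Conv2d(channels, channels, 3, padding=1)
        for p in self.parameters():
            p.requires_grad_(False)  # keep the original T2I model unchanged

    def forward(self, latents: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # A real backbone injects text_emb via cross-attention; a conv suffices here.
        return self.net(latents)


class TemporalFusion(nn.Module):
    """Trainable module that mixes information along the frame axis,
    standing in for the frame-fusion / action-prior component."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.mix = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, channels, num_frames, height, width)
        return frames + self.mix(frames)


class CCDMSketch(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.backbone = FrozenT2IBackbone(channels)
        self.temporal = TemporalFusion(channels)
        # Feedback path: the previously generated frame conditions the current one.
        self.feedback = nn.Conv2d(channels * 2, channels, 1)

    def forward(self, noisy: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        b, c, t, h, w = noisy.shape
        fused = self.temporal(noisy)
        outputs, prev = [], noisy.new_zeros(b, c, h, w)
        for i in range(t):
            frame = torch.cat([fused[:, :, i], prev], dim=1)
            frame = self.feedback(frame)
            prev = self.backbone(frame, text_emb)  # frozen T2I denoiser
            outputs.append(prev)
        return torch.stack(outputs, dim=2)


if __name__ == "__main__":
    model = CCDMSketch()
    video_latents = torch.randn(1, 64, 8, 32, 32)  # 8 latent frames (toy sizes)
    text_emb = torch.randn(1, 77, 768)             # placeholder text embedding
    out = model(video_latents, text_emb)
    print(out.shape)  # torch.Size([1, 64, 8, 32, 32])
```

In this reading, only the temporal-fusion and feedback layers would be trained on video data, which is how a method of this kind can keep the original T2I weights, and hence its image quality, untouched.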