[2312.17225] 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency

Open-source 4D generation framework 4DGen: controllable 4D generation based on dynamic 3D Gaussians (Zhihu)

As shown in the figure above, we define grounded 4D generation, which focuses on video-to-4D generation. The video is not required to be user-specified; it can also be generated by a video diffusion model. With the help of Stable Video Diffusion, we implement image-to-video-to-4D and text-to-image-to-video-to-4D generation.
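To make the image-to-video step concrete, here is a minimal sketch using the `StableVideoDiffusionPipeline` from the diffusers library; the checkpoint name, resolution, and sampling parameters are illustrative assumptions, not necessarily the settings used by 4DGen.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load a pretrained Stable Video Diffusion checkpoint (illustrative choice).
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# Condition on a single input image; SVD expects roughly 1024x576 inputs.
image = load_image("input.png").resize((1024, 576))

# Generate a short clip, then export it; this video would then be fed to
# the video-to-4D stage of the pipeline.
frames = pipe(image, decode_chunk_size=8, num_frames=25).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```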

To address the aforementioned challenges, we introduce 4DGen, a novel pipeline tackling the new task of Grounded 4D Generation, which focuses on video-to-4D generation. As shown in Fig. 2, our primary strategy involves using monocular videos as conditional inputs to provide users with precise control over both the motion and appearance of the generated 4D content.

Previous work generates 4D content in one click. Our work introduces Grounded 4D Content Generation, which employs a video sequence and an optional 3D asset to specify the appearance and motion. Method: we conduct 4D generation grounded by a monocular video sequence; our 4D scene is implemented by deforming a static set of 3D Gaussians.
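As a rough illustration of the deform-static-Gaussians idea, the sketch below uses a simple time-conditioned MLP to predict per-Gaussian position offsets. The actual 4DGen deformation network is more elaborate, so treat this architecture as a hypothetical stand-in.

```python
import torch
import torch.nn as nn

class DeformationField(nn.Module):
    """Minimal sketch: an MLP that maps a canonical Gaussian center and a
    timestamp to a deformed center. Only positions are deformed here; a
    full system would also handle rotation and scale."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # per-Gaussian xyz offset
        )

    def forward(self, xyz: torch.Tensor, t: float) -> torch.Tensor:
        # xyz: (N, 3) canonical Gaussian centers; t is normalized to [0, 1].
        t_col = torch.full_like(xyz[:, :1], t)
        return xyz + self.mlp(torch.cat([xyz, t_col], dim=-1))
```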

4DGen is a novel holistic framework for grounded 4D content creation that decomposes the 4D generation task into multiple stages and supports grounded generation, offering users enhanced control, a feature difficult to achieve with previous methods. Aided by text-to-image and text-to-video diffusion models, existing 4D content creation pipelines utilize score distillation sampling.


Data Preparation

We release our collected data in Google Drive. Each test case contains two folders, `{name}_pose0` and `{name}_sync`: `pose0` refers to the monocular video sequence, and `sync` refers to the pseudo labels generated by SyncDreamer. We recommend using Practical-RIFE if you need to introduce more frames in your video sequence. To preprocess your own images into RGBA format …
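For the RGBA preprocessing step, a minimal sketch using the `rembg` background-removal library is shown below; the repository may ship its own preprocessing script, and the file names here are placeholders.

```python
# Hypothetical preprocessing sketch: convert an RGB frame to RGBA with the
# background removed, producing the alpha-matted inputs the pipeline expects.
from rembg import remove
from PIL import Image

img = Image.open("frame_000.png").convert("RGB")
rgba = remove(img)  # returns an RGBA PIL image with an alpha matte
rgba.save("frame_000_rgba.png")
```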

Summary: 4DGen defines the task form of Grounded 4D Generation. By introducing a video sequence and an optional 3D model, it improves the controllability of 4D generation. Through an efficient 4D Gaussian Splatting representation, supervision from 2D and 3D pseudo labels, and spatial-temporal continuity constraints, 4DGen achieves high-quality 4D content generation at high resolution and over long time sequences.
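To illustrate what a spatial-temporal continuity constraint can look like in this setting, here is a hypothetical sketch that penalizes frame-to-frame displacement of the deformed Gaussian centers; the exact regularizers used by 4DGen may differ.

```python
import torch

def temporal_smoothness_loss(xyz_t: torch.Tensor) -> torch.Tensor:
    """Hypothetical temporal continuity term.
    xyz_t: (T, N, 3) deformed Gaussian centers across T timesteps.
    Penalizing squared frame-to-frame velocity discourages jitter and
    encourages temporally coherent motion."""
    velocity = xyz_t[1:] - xyz_t[:-1]  # (T-1, N, 3)
    return velocity.pow(2).mean()
```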

Aided by text-to-image and text-to-video diffusion models, existing 4D content creation pipelines utilize score distillation sampling to optimize the entire dynamic 3D scene. However, as these pipelines generate 4D content directly from text or image inputs, they are constrained by limited motion capabilities and depend on unreliable prompt engineering for desired results.
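For readers unfamiliar with score distillation sampling (SDS), the sketch below shows the commonly used detached-target formulation; `unet` and `scheduler` are assumed to come from a pretrained diffusion model (e.g. via diffusers), and the timestep weighting is simplified.

```python
import torch
import torch.nn.functional as F

def sds_loss(unet, scheduler, latents, text_emb, t):
    """Minimal SDS sketch: noise the rendered latents, query the frozen
    diffusion model, and build a detached target so the gradient w.r.t.
    the latents equals the (weighted) score residual."""
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, t)
    with torch.no_grad():
        noise_pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    w = 1.0  # timestep-dependent weighting omitted for brevity
    grad = w * (noise_pred - noise)
    target = (latents - grad).detach()
    # d(loss)/d(latents) == grad, which is exactly the SDS update direction.
    return 0.5 * F.mse_loss(latents, target, reduction="sum")
```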

To overcome the above issues, we introduce 4DGen, a novel pipeline tackling the new task of Grounded 4D Generation. As shown in Fig. 2, we provide users with explicit, fine-grained controllability over the generation of 4D content. Our key idea is to leverage a monocular video and an optional static 3D asset as conditional signals to specify the motion and appearance of the 4D content.


This work introduces 4DGen, a novel holistic framework for grounded 4D content creation that decomposes the 4D generation task into multiple stages. We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
