3DPhysVideo: Consistency-Guided Flow SDE for Video Generation via 3D Scene Reconstruction and Physical Simulation

We propose 3DPhysVideo, a training-free framework for 3D physics-conditioned video generation that leverages an off-the-shelf video model. From a single input image, our method lets users apply diverse physical controls to a variety of materials.

Abstract

Video generative models have made remarkable progress, yet they often yield visual artifacts that lack grounding in physical dynamics. Recent works such as PhysGen3D tackle single-image-to-3D physics through mesh reconstruction and physically based rendering (PBR), but challenges remain in modeling fluid dynamics and multi-object interactions, and in achieving photorealism. This work introduces 3DPhysVideo, a novel training-free pipeline that generates physically realistic videos from a single image. We repurpose an off-the-shelf video model in two stages. First, we use it as a novel-view synthesizer to reconstruct complete 360-degree 3D scene geometry by guiding the image-to-video (I2V) flow model with rendered point clouds. Second, after physics solvers are applied to this geometry, the physically simulated point cloud guides the same I2V flow model to synthesize the final high-quality video. Our Consistency-Guided Flow SDE decomposes the predicted velocity of the I2V flow model into a denoising term and a consistency bias, enforcing consistency with the conditional inputs; this allows us to effectively repurpose the model for both 3D reconstruction and simulation-guided video generation. In diverse experiments, including multi-object and fluid interaction scenes, our method successfully bridges the gap from single images to physically plausible videos while remaining efficient enough to run on a single consumer GPU. It outperforms baselines on GPT-based scores, the VideoPhy benchmark, and human evaluation.
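The abstract describes the Consistency-Guided Flow SDE only as a decomposition of the predicted velocity into a denoising term and a consistency bias. The sketch below is one hypothetical way to realize that idea. It is not the authors' update rule. It assumes the rectified-flow convention x_t = (1 - t)·x0 + t·noise, so the model predicts v = noise - x0. It also assumes a binary `mask` marking pixels covered by the rendered point cloud, a guidance weight `lam`, and a DDIM-style stochastic re-noising controlled by `eta`, which is a swapped-in choice of stochastic sampler. The names `guided_sde_step`, `sample`, and `oracle_velocity` are illustrative. Here the bias is -lam·mask·(cond - x0_hat). It shifts the velocity so that the implied clean estimate moves toward the conditioning frames inside the mask, and it leaves the model's own denoising velocity untouched elsewhere.

```python
import numpy as np


def guided_sde_step(x_t, t, t_next, v_pred, cond, mask, lam, eta, rng):
    """One hypothetical stochastic sampling step from time t down to t_next.

    Assumes the rectified-flow convention x_t = (1 - t) * x0 + t * noise,
    so the model's velocity target is v = noise - x0.
    """
    # Clean-data and noise estimates implied by the predicted velocity.
    x0_hat = x_t - t * v_pred
    noise_hat = x_t + (1.0 - t) * v_pred

    # Consistency bias: shifts the velocity so that the implied clean estimate
    # moves toward the conditioning frames wherever mask == 1.
    bias = -lam * mask * (cond - x0_hat)
    v_guided = v_pred + bias              # denoising velocity + consistency bias
    x0_guided = noise_hat - v_guided      # = x0_hat + lam * mask * (cond - x0_hat)

    # DDIM-style stochastic re-noising: mix the estimated noise with fresh
    # Gaussian noise (eta = 0 gives a deterministic update).
    z = rng.standard_normal(x_t.shape)
    noise_mix = np.sqrt(1.0 - eta**2) * noise_hat + eta * z
    return (1.0 - t_next) * x0_guided + t_next * noise_mix


def sample(velocity_fn, cond, mask, lam=1.0, eta=0.5, steps=20, seed=0):
    """Integrate from pure noise (t = 1) to data (t = 0)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cond.shape)
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        v = velocity_fn(x, t)             # stands in for the I2V flow model
        x = guided_sde_step(x, t, t_next, v, cond, mask, lam, eta, rng)
    return x


# Toy setup: 4 frames of 8x8 "pixels"; the oracle model always points to zeros,
# while the conditioning frames (ones) cover only the top half of each frame.
x0_true = np.zeros((4, 8, 8))
cond = np.ones_like(x0_true)
mask = np.zeros_like(x0_true)
mask[:, :4] = 1.0


def oracle_velocity(x, t):
    # Exact velocity toward x0_true under the assumed convention (toy model).
    return (x - x0_true) / t


out = sample(oracle_velocity, cond, mask)                # guided sample
out_free = sample(oracle_velocity, cond, mask, lam=0.0)  # bias switched off
```

With `lam=1.0`, the masked pixels of `out` end exactly at the conditioning value 1, while unmasked pixels follow the (here, oracle) model to 0. With `lam=0.0` the bias vanishes, and `out_free` is 0 everywhere. Intermediate values of `lam` would trade strict consistency against freedom for the generative prior.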

Interactive 3D-Physics Control

Our SDE Across Tasks

Video panels: each conditioning input is shown next to the result with our SDE ("+ Our SDE") for physics simulation (two examples), bounding-box trajectory, cut-and-drag, static scene (full 360°), and dynamic scene (zoom out).

“A set of five metal spheres hangs in a line. Two spheres on the left swing down in a smooth, slow arc … the two spheres on the far right rise upward, maintaining the exact five-sphere arrangement, all unfolding in a slow, deliberate motion.”

“A Snorlax melting into molten lava, surrounded by flames and glowing embers, as fiery lava bursts and flows around it.”

Comparison panels: naive inference vs. + Our SDE, and initial video vs. + Our SDE.

Comparison with Baselines

Full Pipeline: Single Image to Video Generation

Each example pairs the input image with our generated video.

The blue ball strikes the dominoes.

The macarons and jellies fall to the table.

The sand castle collapses.

The book falls forward.

The apple drops to the right.

The blue ball drops onto the green playdough.

The yellow brick falls to the right.

The ball on the left moves right and hits the ball on the right.

The Snorlax plush toy deflates.

The red can moves right and hits the teddy bear on the right.

Stage 1: Camera-controlled Video Generation

Static scene: camera trajectory and our result for a full 360° orbit.

Dynamic scene: camera trajectories and our results for zoom out, translate down, and translate up.
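According to the abstract, Stage 1 guides the I2V flow model with point clouds rendered along a camera trajectory, such as the full 360° orbit listed above. The sketch below shows one possible way to produce such conditioning frames, along with a mask of the pixels the point cloud actually covers. It assumes a pinhole camera with focal length `f` in pixels, a circular orbit around the world origin, and one-pixel splats resolved by a painter's algorithm. `orbit_pose` and `render_points` are hypothetical helpers, not the authors' renderer. A real pipeline would typically use a dedicated point-cloud rasterizer instead.

```python
import numpy as np


def orbit_pose(angle, radius):
    """Hypothetical world-to-camera pose (R, t) for a camera on a horizontal
    circle of the given radius that looks at the world origin."""
    cam_pos = np.array([radius * np.sin(angle), 0.0, -radius * np.cos(angle)])
    forward = -cam_pos / np.linalg.norm(cam_pos)        # camera +z (viewing) axis
    right = np.cross(np.array([0.0, 1.0, 0.0]), forward)
    right /= np.linalg.norm(right)
    up = np.cross(forward, right)
    R = np.stack([right, up, forward])                  # rows: camera axes in world frame
    return R, -R @ cam_pos


def render_points(points, colors, R, t, f, h, w):
    """Splat a colored point cloud into an h x w pinhole image.

    Returns the image and a validity mask (1 where at least one point lands).
    Nearer points overwrite farther ones (painter's algorithm)."""
    cam = points @ R.T + t                              # world -> camera coordinates
    keep = cam[:, 2] > 1e-6                             # drop points behind the camera
    cam, col = cam[keep], colors[keep]
    z = cam[:, 2]
    u = np.round(f * cam[:, 0] / z + w / 2).astype(int)
    v = np.round(h / 2 - f * cam[:, 1] / z).astype(int)  # image rows grow downward
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, col = u[inside], v[inside], z[inside], col[inside]
    image = np.zeros((h, w, colors.shape[1]))
    mask = np.zeros((h, w))
    for i in np.argsort(-z):                            # draw far to near
        image[v[i], u[i]] = col[i]
        mask[v[i], u[i]] = 1.0
    return image, mask


# Toy usage: render a random colored point cloud along a full 360-degree orbit.
rng = np.random.default_rng(0)
points = rng.uniform(-0.5, 0.5, size=(2000, 3))
colors = rng.uniform(0.0, 1.0, size=(2000, 3))
angles = np.linspace(0.0, 2 * np.pi, 16, endpoint=False)
frames = [render_points(points, colors, *orbit_pose(a, 3.0), f=40.0, h=32, w=32)
          for a in angles]
```

In this toy renderer, pixels with mask 0 receive no point from the current reconstruction. These are presumably the regions that the video model must fill in from its own prior.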

Stage 2: Motion-conditioned Video Generation

Each example shows the input image, the physics simulation result, and our generated video.

The yellow brick falls to the right.

The book falls forward.

The red can moves right and hits the teddy bear on the right.

The Snorlax plush toy deflates.

BibTeX

@article{kim20263dphysvideo,
  title   = {3DPhysVideo: Consistency-Guided Flow SDE for Video Generation
             via 3D Scene Reconstruction and Physical Simulation},
  author  = {Kim, Hwidong and Kim, Yunho and Kim, Tae-Kyun},
  journal = {arXiv preprint arXiv:2605.16795},
  year    = {2026},
}
