A new paper from researchers at Google and Cornell has introduced DynIBaR, a method for generating photorealistic free-viewpoint renderings. According to the team, all it needs is a single video of a complex, dynamic scene.
In recent years, the field of computer vision has witnessed incredible advancements in reconstructing static 3D scenes using neural radiance fields (NeRFs). While these techniques have revolutionized our ability to create realistic 3D representations, extending them to dynamic scenes has posed significant challenges.
Now enter DynIBaR: Neural Dynamic Image-Based Rendering, an innovative AI technique introduced by researchers from Google and Cornell at CVPR 2023, offering a solution for capturing dynamic scenes with a standard phone camera.
What makes this interesting is that creating accurate and clear representations of dynamic scenes in real-world settings has been a persistent challenge in computer vision. Existing methods, including space-time neural radiance fields (Dynamic NeRFs), often struggle when faced with lengthy videos, complex object motions, and uncontrolled camera trajectories.
These limitations have restricted their practical applicability, especially when using everyday tools like smartphone cameras to capture dynamic scenes. DynIBaR takes dynamic scene reconstruction to a new level by generating highly realistic free-viewpoint renderings from a single video captured with a standard phone camera.
This powerful technique offers a range of video effects, including bullet time effects (temporarily freezing time while the camera moves around a scene), video stabilization, depth of field adjustments, and slow-motion capabilities.
One of the key innovations behind DynIBaR is its scalability to long videos with diverse scenes, unpredictable camera movements, and rapid, intricate object motions. This scalability is achieved by utilizing motion trajectory fields represented by learned basis functions, effectively modeling complex motion patterns spanning multiple frames.
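The core idea of a basis-function trajectory field can be sketched in a few lines: each scene point's motion over time is expressed as a weighted combination of a small set of shared trajectory bases, so the model only has to learn per-point coefficients rather than a free-form motion per frame. This is a minimal illustration under assumed shapes; the array names, sizes, and random values are placeholders, not the paper's actual parameterization.

```python
import numpy as np

# Hypothetical setup: T frames, K shared basis trajectories, 3D offsets.
T, K = 8, 4
rng = np.random.default_rng(0)

# Shared basis functions: column k is one trajectory over time,
# learned jointly for the whole scene (random placeholders here).
basis = rng.standard_normal((T, K))      # shape (T, K)

# Per-point coefficients, e.g. predicted by a small network,
# one weight per basis function per xyz axis (placeholders here).
coeffs = rng.standard_normal((K, 3))     # shape (K, 3)

# A point's 3D displacement across all frames is a linear
# combination of the shared bases.
trajectory = basis @ coeffs              # shape (T, 3)
print(trajectory.shape)                  # (8, 3)
```

Because the bases span multiple frames, this compactly captures smooth, complex motion while keeping the per-point description to just K × 3 numbers.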
To ensure temporal coherence in reconstructing dynamic scenes, DynIBaR introduces a novel temporal photometric loss that operates within motion-adjusted ray space. This loss function enhances the quality of rendered views, making them more realistic and coherent.
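In spirit, a temporal photometric loss penalizes color disagreement between a rendered view and the corresponding, motion-adjusted samples from nearby frames. The sketch below is a simplified stand-in under that assumption; the function name, the mean-squared form, and the input shapes are illustrative choices, not the exact loss defined in the paper.

```python
import numpy as np

def temporal_photometric_loss(rendered, warped_neighbors):
    """Illustrative temporal photometric loss.

    rendered: (N, 3) rendered RGB colors for N pixels.
    warped_neighbors: (F, N, 3) RGB colors sampled from F nearby
        frames at locations adjusted for scene motion.
    Returns the mean squared color error over frames and pixels.
    """
    return float(np.mean((warped_neighbors - rendered[None]) ** 2))

# Toy example: rendered colors are all black, neighbors are all 0.1.
rendered = np.zeros((5, 3))
neighbors = np.full((2, 5, 3), 0.1)
loss = temporal_photometric_loss(rendered, neighbors)
print(loss)
```

Minimizing a loss of this kind pushes renderings to stay consistent with what nearby frames observed, which is what gives the output its temporal coherence.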
Additionally, the researchers introduce a new motion segmentation technique, based on image-based rendering, within a Bayesian learning framework. This segmentation approach effectively separates dynamic and static components within the scene, contributing to an overall improvement in rendering quality.
One significant challenge in dynamic scene reconstruction lies in the computational complexity of neural networks. The number of parameters in a multilayer perceptron increases with the complexity and duration of the scene, making it challenging to train models on real-world videos.
DynIBaR addresses this challenge by directly utilizing pixel data from surrounding frames to construct new views, eliminating the need for an excessively large MLP. The foundation of DynIBaR is IBRNet, an image-based rendering method originally designed for synthesizing views in static scenes.
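The image-based rendering idea underlying this can be sketched simply: a target pixel's color is a weighted blend of colors sampled from nearby source frames, so scene detail lives in the video frames themselves rather than in an ever-growing MLP. This is a toy sketch of that aggregation step; the function name and the source of the blending weights (e.g. a small network, as in IBRNet-style methods) are assumptions for illustration.

```python
import numpy as np

def blend_source_samples(colors, weights):
    """Blend colors sampled from S source frames into one pixel color.

    colors:  (S, 3) RGB samples, one from each nearby source frame.
    weights: (S,) non-negative blending weights (e.g. predicted by
             a small network from visibility and feature similarity).
    """
    w = weights / weights.sum()   # normalize weights to sum to 1
    return w @ colors             # weighted average color

# Toy example: blend a red and a green sample with weights 3:1.
samples = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
weights = np.array([3.0, 1.0])
print(blend_source_samples(samples, weights))  # [0.75 0.25 0.  ]
```

Because the heavy lifting is done by sampling real pixels, the learned components stay small regardless of how long the video is.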
By building upon this foundation and introducing innovative techniques, DynIBaR looks to push the boundaries of dynamic scene reconstruction.