Digital Human and Video-Based Rendering

The commoditization of virtual and augmented reality devices and the availability of inexpensive consumer depth cameras have catalyzed a resurgence of interest in spatiotemporal performance capture. Recent systems like Fusion4D and Holoportation address several crucial problems in the real-time fusion of multiview depth maps into volumetric and deformable representations. Nonetheless, stitching multiview video textures onto dynamic meshes remains challenging due to imprecise geometries, occlusion seams, and critical time constraints. In this paper, we present a practical solution towards real-time seamless texture montage for dynamic multiview reconstruction. We build on the ideas of dilated depth discontinuities and majority voting from Holoportation to reduce ghosting effects when blending textures. In contrast to their approach, we determine the appropriate blend of textures per vertex using view-dependent rendering techniques, so as to avert fuzziness caused by the ubiquitous normal-weighted blending. By leveraging geodesics-guided diffusion and temporal texture fields, our algorithm mitigates spatial occlusion seams while preserving temporal consistency. Experiments demonstrate significant enhancement in rendering quality, especially in detailed regions such as faces. We envision a wide range of applications for Montage4D, including immersive telepresence for business, training, and live entertainment.
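As a rough illustration of the blending idea in the abstract, the following minimal sketch contrasts normal-weighted blending with view-dependent per-vertex weights, then eases the weights toward their target over time. It is not the published Montage4D implementation. The function names, the exponents, the `rate` parameter, and the simple linear easing are all assumptions made for illustration; the geodesics-guided diffusion step is omitted.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def normalize(weights):
    """Scale non-negative weights to sum to 1 (all zeros stay zeros)."""
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else list(weights)


def normal_weighted(normal, cam_dirs, visible, power=2.0):
    """Baseline: weight each visible camera by its alignment with the vertex normal."""
    return normalize([max(0.0, dot(normal, c)) ** power if v else 0.0
                      for c, v in zip(cam_dirs, visible)])


def view_dependent(view_dir, cam_dirs, visible, power=8.0):
    """Weight each visible camera by its alignment with the viewer's direction."""
    return normalize([max(0.0, dot(view_dir, c)) ** power if v else 0.0
                      for c, v in zip(cam_dirs, visible)])


def temporal_step(prev, target, rate=0.2):
    """Move last frame's weights a fraction `rate` toward the target weights."""
    return normalize([p + rate * (t - p) for p, t in zip(prev, target)])


# Hypothetical setup: unit vectors from one vertex toward three cameras.
cam_dirs = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)]
visible = [True, True, False]   # the third camera is occluded at this vertex
normal = (0.0, 0.0, 1.0)        # vertex normal
view_dir = (0.8, 0.0, 0.6)      # unit vector from the vertex toward the viewer

baseline = normal_weighted(normal, cam_dirs, visible)
target = view_dependent(view_dir, cam_dirs, visible)
blended = temporal_step(baseline, target)
print(baseline, target, blended)
```

In this toy setup the view-dependent weights favor the camera closest to the viewer's direction rather than the one facing the surface normal, and the temporal step keeps the weights from jumping between frames.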

Publications

Montage4D: Real-Time Seamless Fusion and Stylization of Multiview Video Textures

Journal of Computer Graphics Techniques (JCGT), 2019.
Keywords: texture montage, 3d reconstruction, texture stitching, view-dependent rendering, discrete geodesics, projective texture mapping, differential geometry, temporal texture fields

Fusing Multimedia Data Into Dynamic Virtual Environments

Ruofei Du
Ph.D. Dissertation, Computer Science Department, University of Maryland, College Park, 2018.
Keywords: social street view, geollery, spherical harmonics, 360 video, multiview video, montage4d, haptics, cryptography, metaverse, mirrored world

HumanGPS: Geodesic PreServing Feature for Dense Human Correspondence

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
Keywords: correspondences, geodesic distance, embeddings, neural networks

Montage4D: Interactive Seamless Fusion of Multiview Video Textures

Proceedings of ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), 2018.
Keywords: texture montage, 3d reconstruction, texture stitching, view-dependent rendering, discrete geodesics, projective texture mapping, differential geometry, temporal texture fields

Video Fields: Fusing Multiple Surveillance Videos Into a Dynamic Virtual Environment

Proceedings of the 21st International Conference on Web3D Technology (Web3D), 2016.
Keywords: virtual reality; mixed-reality; video-based rendering; projection mapping; surveillance video; WebGL; WebVR

Videos

HumanGPS: Geodesic PreServing Feature for Dense Human Correspondence


Montage4D: Real-Time Seamless Fusion and Stylization of Multiview Video Textures


Cited By

  • The Relightables: Volumetric Performance Capture of Humans With Realistic Relighting. ACM Transactions on Graphics. Kaiwen Guo, Peter Lincoln, Philip Davidson, Jay Busch, Xueming Yu, Matt Whalen, Geoff Harvey, Sergio Orts-Escolano, Rohit Pandey, Jason Dourgarian, Danhang Tang, Anastasia Tkach, Adarsh Kowdle, Emily Cooper, Mingsong Dou, Sean Fanello, Graham Fyffe, Christoph Rhemann, Jonathan Taylor, Paul Debevec, and Shahram Izadi.
  • Instant Panoramic Texture Mapping With Semantic Object Matching for Large-Scale Urban Scene Reproduction. IEEE Transactions on Visualization and Computer Graphics. Jinwoo Park, Ik-Beom Jeon, Sung-Eui Yoon, and Woontack Woo.
  • Image-Guided Neural Object Rendering. 8th International Conference on Learning Representations. Justus Thies, Michael Zollhöfer, Christian Theobalt, Marc Stamminger, and Matthias Nießner.
  • LookinGood: Enhancing Performance Capture With Real-Time Neural Re-Rendering. ACM Transactions on Graphics. Ricardo Martin-Brualla, Rohit Pandey, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Julien Valentin, Sameh Khamis, Philip Davidson, Anastasia Tkach, Peter Lincoln, Adarsh Kowdle, Christoph Rhemann, Dan B Goldman, Cem Keskin, Steve Seitz, Shahram Izadi, and Sean Fanello.
  • A Review of Video Surveillance Systems. Journal of Visual Communication and Image Representation. Omar Elharrouss, Noor Almaadeed, and Somaya Al-Maadeed.
  • An Inexpensive Upgradation of Legacy Cameras Using Software and Hardware Architecture for Monitoring and Tracking of Live Threats. IEEE Access. Ume Habiba, Muhammad Awais, Milhan Khan, and Abdul Jaleel.
  • Spatiotemporal Retrieval of Dynamic Video Object Trajectories in Geographical Scenes. Transactions in GIS. Yujia Xie, Meizhen Wang, Xuejun Liu, Ziran Wang, Bo Mao, Feiyue Wang, and Xiaozhi Wang.
  • A Multi-Resolution Approach for Color Correction of Textured Meshes. 2018 International Conference on 3D Vision (3DV). Mohammad Rouhani, Matthieu Fradet, and Caroline Baillard.
  • IBRNet: Learning Multi-View Image-Based Rendering. CVPR 2021. Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan Barron, Ricardo Martin-Brualla, Noah Snavely, and Thomas Funkhouser.
  • Neural Body: Implicit Neural Representations With Structured Latent Codes for Novel View Synthesis of Dynamic Humans. CVPR 2021. Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou.
  • Multi‐camera Video Synopsis of a Geographic Scene Based on Optimal Virtual Viewpoint. Transactions in GIS. Yujia Xie, Meizhen Wang, Xuejun Liu, Xing Wang, Yiguang Wu, Feiyue Wang, and Xiaozhi Wang.
  • Dance in the Wild: Monocular Human Animation With Neural Dynamic Appearance Synthesis. https://arxiv.org/pdf/2111.05916.pdf. Tuanfeng Y. Wang, Duygu Ceylan, Krishna Kumar Singh, and Niloy J. Mitra.
  • GeoNeRF: Generalizing NeRF With Geometry Priors. https://arxiv.org/pdf/2111.13539.pdf. Mohammad Mahdi Johari, Yann Lepoittevin, and François Fleuret.
  • Light Field Neural Rendering. https://arxiv.org/pdf/2112.09687.pdf. Mohammed Suhail, Carlos Esteves, Leonid Sigal, and Ameesh Makadia.
  • Human View Synthesis Using a Single Sparse RGB-D Input. arXiv.2112.13889. Phong Nguyen, Nikolaos Sarafianos, Christoph Lassner, Janne Heikkila, and Tony Tung.
  • VoLux-GAN: A Generative Model for 3D Face Synthesis With HDRI Relighting. Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings. Feitong Tan, Sean Fanello, Abhimitra Meka, Sergio Orts-Escolano, Danhang Tang, Rohit Pandey, Jonathan Taylor, Ping Tan, and Yinda Zhang.
  • Video-Geographic Scene Fusion Expression Based on Eye Movement Data. 2021 IEEE 7th International Conference on Virtual Reality (ICVR). Xiaozhi Wang, Yujia Xie, and Xing Wang.
  • Multi-Camera Light Field Capture: Synchronization, Calibration, Depth Uncertainty, and System Design. 6. Elijs Dima.
  • MonoMR: Synthesizing Pseudo-2.5D Mixed Reality Content From Monocular Videos. Applied Sciences. Dong-Hyun Hwang and Hideki Koike.
  • Feature Based Object Tracking: A Probabilistic Approach. Florida Institute of Technology. Kaleb Smith.
  • Reconstruction and Detection of Occluded Portions of 3D Human Body Model Using Depth Data From Single Viewpoint. U.S. Patent 10,818,078. Jie Ni and Mohammad Gharavi-Alkhansari.
  • Heterogeneous Data Fusion. U.S. Patent 11,068,756. James Browning.
  • Video Display Method and Device. CN110996087B. Feihu Luo.
  • Image Processing Module, Image Processing Method, Camera Assembly and Mobile Terminal. CN112291479A. Jingyang Chang.
  • Video Content Representation to Support the Hyper-Reality Experience in Virtual Reality. 2021 IEEE Virtual Reality and 3D User Interfaces (VR). Hyerim Park and Woontack Woo.
  • Multi-View Neural Human Rendering. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Minye Wu, Yuehao Wang, Qiang Hu, and Jingyi Yu.
  • Spatiotemporal Texture Reconstruction for Dynamic Objects Using a Single RGB-D Camera. Computer Graphics Forum. Hyomin Kim, Jungeon Kim, Hyeonseo Nam, Jaesik Park, and Seungyong Lee.
  • RealityCheck: Blending Virtual Environments With Situated Physical Reality. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. Jeremy Hartmann, Christian Holz, Eyal Ofek, and Andrew Wilson.
  • High-Precision 5DoF Tracking and Visualization of Catheter Placement in EVD of the Brain Using AR. ACM Transactions on Computing for Healthcare. Xuetong Sun, Sarah B. Murthi, Gary Schwartzbauer, and Amitabh Varshney.
  • Volumetric Capture of Humans With a Single RGBD Camera Via Semi-Parametric Learning. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Rohit Pandey, Cem Keskin, Shahram Izadi, Sean Fanello, Anastasia Tkach, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Ricardo Martin-Brualla, Andrea Tagliasacchi, George Papandreou, and Philip Davidson.
  • SIGNET: Efficient Neural Representation for Light Fields. 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Brandon Feng and Amitabh Varshney.
  • Pri3D: Can 3D Priors Help 2D Representation Learning? https://arxiv.org/abs/2104.11225.pdf. Ji Hou, Saining Xie, Benjamin Graham, Angela Dai, and M. Nießner.
  • Versatile Multi-Modal Pre-Training for Human-Centric Perception. arXiv.2203.13815. Fangzhou Hong, Liang Pan, Zhongang Cai, and Ziwei Liu.
  • Detection of Multicamera Pedestrian Trajectory Outliers in Geographic Scene. Wireless Communications and Mobile Computing. Wei Wang, Yujia Xie, and Xiaozhi Wang.
  • BodyMap: Learning Full-Body Dense Correspondence Map. arXiv.2205.09111. Anastasia Ianina, Nikolaos Sarafianos, Yuanlu Xu, Ignacio Rocco, and Tony Tung.
  • A Self-Occlusion Aware Lighting Model for Real-Time Dynamic Reconstruction. IEEE Transactions on Visualization and Computer Graphics. Chengwei Zheng, Wenbin Lin, and Feng Xu.
  • Scalable Neural Indoor Scene Rendering. ACM Transactions on Graphics. Xiuchao Wu, Jiamin Xu, Zihan Zhu, Hujun Bao, Qixing Huang, James Tompkin, and Weiwei Xu.
  • Progressive Multi-Scale Light Field Networks. arXiv.2208.06710. David Li and Amitabh Varshney.
  • LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling. arXiv.2208.08622. Boyan Jiang, Xinlin Ren, Mingsong Dou, Xiangyang Xue, Yanwei Fu, and Yinda Zhang.