Reality's Canvas, Language's Brush: Crafting 3D Avatars from Monocular Video

¹Huawei London Research Center, ²Technical University of Munich

ReCaLab is a fully-differentiable pipeline that enables the generation of high-fidelity, photorealistic 3D human avatars from a single RGB video, achieving superior novel pose rendering and intuitive text-based manipulation by decoupling albedo, shading, and pose-conditioned geometry.

Abstract

Recent advances in 3D avatar generation produce photorealistic models when multi-view supervision is available. Monocular counterparts, however, lag in quality despite their broader applicability. We propose ReCaLab to close this gap. ReCaLab is a fully-differentiable pipeline that learns high-fidelity 3D human avatars from just a single RGB video. A pose-conditioned deformable NeRF is optimized to volumetrically represent a human subject in canonical T-pose. The canonical representation is then leveraged to efficiently associate viewpoint-agnostic textures using 2D-3D correspondences. This makes it possible to generate albedo and shading separately, which jointly compose an RGB prediction. The design allows intermediate results for human pose, body shape, texture, and lighting to be controlled with text prompts. An image-conditioned diffusion model then helps to animate the appearance and pose of the 3D avatar, creating video sequences with previously unseen human motion. Extensive experiments show that ReCaLab outperforms previous monocular approaches in image quality on image synthesis tasks. ReCaLab even outperforms multi-view methods that leverage up to 19x more synchronized videos on the task of novel pose rendering. Moreover, natural language offers an intuitive user interface for creative manipulation of 3D human avatars.
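To illustrate what decoupling albedo and shading buys, here is a minimal sketch of how the two components could compose an RGB prediction. It assumes a per-pixel multiplicative composition (albedo times a scalar shading value), which is a common convention in intrinsic image decomposition and not a detail confirmed by the abstract. The function names `compose_pixel` and `compose_image` are hypothetical and do not come from ReCaLab.

```python
# Illustrative sketch only: assumes RGB = albedo * shading per pixel.
# The abstract does not state the exact composition ReCaLab uses.

def compose_pixel(albedo, shading):
    """Multiply an (r, g, b) albedo by a scalar shading value, clamped to [0, 1]."""
    return tuple(min(1.0, max(0.0, c * shading)) for c in albedo)


def compose_image(albedo_img, shading_img):
    """Compose an RGB image from same-sized albedo and shading maps (nested lists)."""
    return [
        [compose_pixel(a, s) for a, s in zip(a_row, s_row)]
        for a_row, s_row in zip(albedo_img, shading_img)
    ]


# A 1x2 toy image: albedo holds surface color, shading holds light intensity.
albedo_img = [[(0.8, 0.4, 0.2), (0.5, 0.5, 0.5)]]
shading_img = [[0.5, 1.0]]

rgb = compose_image(albedo_img, shading_img)

# Editing lighting touches only the shading map; the albedo stays untouched.
dimmed_shading = [[s * 0.5 for s in row] for row in shading_img]
relit = compose_image(albedo_img, dimmed_shading)
```

Under this assumption, a lighting edit changes only the shading map and a texture edit changes only the albedo map, which is the kind of independent control the abstract describes.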

BibTeX

@article{rao2023reality,
      title={Reality's Canvas, Language's Brush: Crafting 3D Avatars from Monocular Video},
      author={Rao, Yuchen and Pellitero, Eduardo Perez and Busam, Benjamin and Zhou, Yiren and Song, Jifei},
      journal={arXiv preprint arXiv:2312.04784},
      year={2023}
}