Learning Realistic Asset-Oriented Human Interaction and Animation

Authors

Wang, Rong

Abstract

Modelling diverse and complex human activities is a fundamental research topic in computer vision and graphics, as it enables a wide range of applications such as extended reality, digital humans, video games, and movie creation. In many real-world scenarios, human activities involve additional object assets: for example, interactions with handheld items such as tools or sports equipment, or animations of avatars wearing complex apparel such as coats or dresses. Generating such interaction and animation scenes realistically, conditioned on the given assets, is challenging because it requires an in-depth understanding of the spatial and temporal relationships between humans and assets. In this thesis, we tackle the challenge of generating asset-oriented human interaction and animation at a higher level of realism, with the aim of offering a more immersive user experience in downstream applications.

In the first part of the thesis, we focus on the task of interacting hand-object pose estimation. To ensure that the reconstructed hand and object poses conform to contact dynamics, we develop novel physics-inspired network architectures that explicitly incorporate contact heuristics as an inductive bias in the network design. In the first work, we reformulate hand-object interaction modelling, moving from previous approaches that encode sparse keypoint-level dependencies to a new representation based on actual surface-level contact points. This allows us to capture proximity relationships between hand and object vertices using dense graph attention, enabling the network to automatically infer plausible contact regions. In the second work, to further enforce real-world physical constraints, we incorporate a more complex contact heuristic, namely stable grasping against gravity. Since contact dynamics are well implemented in modern physics simulators, we propose to distill this knowledge from simulation and transfer it to the base pose estimation model, which is key to reconstructing simulation-aligned stable configurations.

In the second part of the thesis, we investigate effective approaches for generating realistic animations of clothed human avatars. Since human bodies and apparel exhibit distinct dynamical properties, our key insight is to decompose the animation of a holistic clothed human avatar into independent, interpretable components that can be modelled and supervised separately. In the first work of this part, we begin with stylised characters wearing complex accessories, animating their rigid body motion via skinned deformation and their non-rigid apparel motion via auto-regressive vertex displacement, which allows us to produce visually appealing dynamical effects for the apparel. Next, we extend this approach to real humans reconstructed from casually captured phone photos. Specifically, we develop a universal clothed human model that jointly decomposes personalised avatar shapes, skinning weights, and pose-dependent cloth deformations, achieving superior model robustness and generalisability. Finally, we apply the proposed decomposition principle to improve cloth animation generation with fine-grained dynamics such as wrinkles and folds. In this last work, we represent cloth motion as decomposed low-frequency posed shapes and high-frequency wrinkle details, which mitigates spectral bias in neural networks and facilitates the learning of high-fidelity cloth movements.

In summary, this thesis presents five works aimed at improving the reconstruction realism of asset-oriented human interaction and animation. We validate our methods across multiple applications, including interacting hand-object pose estimation, 3D motion transfer, and animatable clothed human reconstruction. Extensive experiments demonstrate that our approaches achieve superior fidelity and plausibility compared to existing methods, advancing the state of the art in understanding human behaviours involving diverse real-world assets.
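To make the decomposition of rigid body motion and non-rigid apparel motion more concrete, the following is a minimal sketch of standard linear blend skinning followed by an additive per-vertex displacement. It is an illustration only and not the thesis implementation: the function names, array shapes, and the choice to add the displacement in posed space are all assumptions, and the displacement is simply passed in rather than predicted auto-regressively by a network.

```python
import numpy as np


def linear_blend_skinning(rest_vertices, skin_weights, bone_transforms):
    """Pose a mesh with standard linear blend skinning.

    rest_vertices:   (V, 3) vertex positions in the rest pose.
    skin_weights:    (V, B) per-vertex bone weights, each row summing to 1.
    bone_transforms: (B, 4, 4) homogeneous rest-to-posed bone transforms.
    """
    num_vertices = rest_vertices.shape[0]
    # Append a 1 to every vertex so 4x4 transforms can be applied.
    homogeneous = np.concatenate(
        [rest_vertices, np.ones((num_vertices, 1))], axis=1
    )
    # Blend the bone transforms per vertex: result has shape (V, 4, 4).
    blended = np.einsum("vb,bij->vij", skin_weights, bone_transforms)
    # Apply each vertex's blended transform to that vertex.
    posed = np.einsum("vij,vj->vi", blended, homogeneous)
    return posed[:, :3]


def animate_frame(rest_vertices, skin_weights, bone_transforms, displacement):
    """Combine the skinned (rigid) component with a non-rigid offset.

    displacement: (V, 3) per-vertex offset. Hypothetical input here; adding
    it in posed space is an assumption made for this sketch.
    """
    posed = linear_blend_skinning(rest_vertices, skin_weights, bone_transforms)
    return posed + displacement
```

Keeping the two components as separate functions mirrors the idea that each part can be modelled and supervised on its own: the skinning term depends only on the skeleton pose, while the displacement term carries whatever residual apparel motion the skeleton cannot explain.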

Restricted until

2026-03-17