Towards Modelling Skeletal Human Motions via Diffusion Models
Skeleton-based human motion modelling has been a long-standing challenge and continues to attract strong academic and industrial interest. Human motions exhibit substantial diversity arising from inter-class variation (e.g., differences in action semantics and dynamics) and intra-class variation (e.g., differences in spatial extent and temporal pace). These complementary factors are deeply entangled, and conventional approaches struggle with limited capacity and mode coverage when trying to model human motions. This thesis aims to investigate human motion modelling problems within a unified generative framework through the lens of inter-class and intra-class variations. To examine this perspective in practice, the thesis considers three representative tasks: styled motion generation, adversarial motion generation, and multi-character interaction generation.
Styled motion generation aims to synthesize different motion contents under diverse styles, where contents are treated as inter-class features and styles as intra-class features to account for the variations of human motions. In this thesis, the term "style" is employed in a motion-specific context. Motion styles denote attributes that yield systematic variations in the execution of an action while the underlying action class remains the same. For instance, emotional states or age-related characteristics can introduce distinct spatial and temporal nuances into the performance of an otherwise identical action. This motion-centric definition differentiates our use of style from its broader interpretations in other disciplines, such as visual aesthetics or linguistic expression. Within this formulation, content represents inter-class semantics and dynamics, whereas style encapsulates fine-grained intra-class variations. Previous methods cannot generate styled motions in an end-to-end manner, as they require contents and/or styles to be specified in advance to reduce the demand of jointly modelling inter-class and intra-class variations. To enable end-to-end styled motion generation, a denoising diffusion probabilistic model is proposed in which action classes are recognised as contents and action executions as styles. Different contents and styles are modelled jointly in the same diffusion latent space, resulting in an integrated, end-to-end trained pipeline for stylized motion generation.
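To make the denoising diffusion probabilistic model concrete, the following is a minimal numpy sketch of the standard DDPM forward (noising) process applied to a toy skeletal motion clip. The schedule values, tensor shapes, and all names here are illustrative assumptions, not the configuration used in the thesis.

```python
import numpy as np

def linear_beta_schedule(T: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Standard linear variance schedule beta_1..beta_T."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0: np.ndarray, t: int, alpha_bars: np.ndarray,
             rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

T = 1000
betas = linear_beta_schedule(T)
alpha_bars = np.cumprod(1.0 - betas)  # cumulative signal-retention factors

# A toy "motion clip": 60 frames x 25 joints x 3 coordinates (hypothetical).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((60, 25, 3))
x_mid = q_sample(x0, t=500, alpha_bars=alpha_bars, rng=rng)
x_late = q_sample(x0, t=999, alpha_bars=alpha_bars, rng=rng)

# As t grows, the signal fraction sqrt(a_bar_t) shrinks toward zero,
# so x_T approaches pure Gaussian noise.
print(float(np.sqrt(alpha_bars[500])), float(np.sqrt(alpha_bars[999])))
```

A trained denoiser inverts this process step by step; conditioning that denoiser jointly on content (action class) and style is what allows both to live in the same diffusion latent space.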
Adversarial motion generation aims to synthesize motions that mislead a target system, treating high-level action semantics as inter-class features and low-level execution details as intra-class features. For human motions, a classical scenario is misleading action recognition systems, given their wide deployment. A diffusion model is proposed for generating adversarial human motions against human action recognition. The variations modelled by the diffusion model facilitate generating adversarial motions that reveal the adversarial robustness of human action recognition. The diffusion model produces the adversarial motions from its stochastic diffusion latent space and the distributional knowledge it captures.
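For contrast with the diffusion-based generator described above, here is a minimal numpy sketch of the classical gradient-sign (FGSM-style) baseline for adversarial motions, attacking a toy linear softmax classifier over flattened motion features. All shapes, names, and values are hypothetical; this is not the thesis's method.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_classes = 60 * 25 * 3, 10   # toy motion clip, 10 action classes
W = rng.standard_normal((n_classes, n_features)) * 0.01  # toy classifier weights
x = rng.standard_normal(n_features)        # flattened clean motion
y = 3                                      # its true action label

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()                        # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum()

def loss_grad_wrt_x(W: np.ndarray, x: np.ndarray, y: int) -> np.ndarray:
    """Gradient of cross-entropy loss w.r.t. the input of a linear softmax."""
    p = softmax(W @ x)
    p[y] -= 1.0                            # dL/dlogits = p - one_hot(y)
    return W.T @ p                         # chain rule through logits = W @ x

eps = 0.05                                 # L_inf perturbation budget
x_adv = x + eps * np.sign(loss_grad_wrt_x(W, x, y))

clean_pred = int(np.argmax(W @ x))
adv_pred = int(np.argmax(W @ x_adv))
print(clean_pred, adv_pred)
```

Such pixel-level (here, joint-level) perturbations are bounded but often implausible as motions; the appeal of a diffusion-based attack is that perturbations drawn from the learned motion distribution can remain natural-looking.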
Multi-character interaction generation aims to synthesize interactions among a large number of characters, treating high-level interaction semantics as inter-class variations and coordination as intra-class variations in the context of interactions. Different from previous interaction modelling approaches that mainly focus on two characters, the coordination between multiple characters is recognised as a unique intra-class variation in the context of multi-character interactions, allowing characters to change interaction partners. A conditional diffusion model combined with reinforcement learning is proposed as a framework for generating multi-character interactions without any multi-character dataset. The framework comprises a coordinatable multi-character interaction space for interaction synthesis and a transition planning network for coordination. These two components advance the modelling of inter-class and intra-class variations for multi-character interactions, facilitating the generation of realistic, dynamic interactions among multiple characters.
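To illustrate where a condition such as interaction semantics enters a conditional diffusion model, the following is a hedged numpy sketch of ancestral (reverse-process) sampling. The denoiser is a stub standing in for a trained network; the schedule, shapes, and condition encoding are assumptions for illustration only and do not reflect the thesis's actual framework.

```python
import numpy as np

def linear_beta_schedule(T: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    return np.linspace(beta_start, beta_end, T)

def stub_denoiser(x_t: np.ndarray, t: int, cond: np.ndarray) -> np.ndarray:
    """Stand-in for a trained network eps_theta(x_t, t, cond); a real model
    would condition on `cond` (this stub simply predicts zero noise)."""
    return np.zeros_like(x_t)

def conditional_sample(shape, cond, T: int = 50, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)                     # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        eps_hat = stub_denoiser(x, t, cond)
        # Posterior mean of p(x_{t-1} | x_t) under the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) \
               / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

cond = np.array([1.0, 0.0, 0.0])   # hypothetical one-hot interaction semantics
motion = conditional_sample((30, 25, 3), cond)
print(motion.shape)
```

In a full system, a separate component (here, the transition planning network described above) would choose which condition to issue at each moment, letting characters switch interaction partners over time.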
By integrating diffusion models for modelling the inter-class and intra-class variations of human skeletal motions through stylized, adversarial, and interactive motions respectively, this thesis demonstrates significant progress in unleashing the potential of generative diffusion models for human motion modelling. The demonstrated results hold promising potential for further applications in diverse human-centric, motion-based artificial intelligence such as behaviour diagnostics, physical rehabilitation, and responsible systems. Most of the works have been published in peer-reviewed conferences, underscoring their impact and contributions to the field.
| Item Type | Thesis (Doctoral) |
|---|---|
| Divisions | Faculty of Science > Computer Science, Department of |
| Date Deposited | 30 Mar 2026 14:31 |
| Last Modified | 31 Mar 2026 05:38 |
| File | Ziyi_Chang_000993794.pdf (Accepted Version) |