On Hierarchical Encoding and Reasoning in Deep Transformer-based Generative Models

SLACK, DEAN LEWIS (2025) On Hierarchical Encoding and Reasoning in Deep Transformer-based Generative Models. Doctoral thesis, Durham University.

Recent advances in generative Transformer-based foundation models have driven remarkable progress in artificial intelligence, yet their internal mechanisms for representing complex hierarchical structures remain largely unknown, posing significant challenges for interpretability, safety, and robust generalisation. This thesis aims to make progress on these issues by systematically investigating how such models internalise hierarchical structures, how this learning relates to behaviours such as generalisation versus memorisation, and how hierarchical principles can inform the development of safer, more accurate generative models. To this end, we first introduce novel probing techniques to map the layer-wise emergence of linguistic hierarchies in language models. We then extend this analysis to the visual domain by developing PSViT, a pixel-space Transformer with hierarchical decompositions of video image patches, which is shown to learn and generalise hierarchical physical dynamics from raw video data. We next investigate memorisation during fine-tuning, establishing an n-gram-based early warning signal for verbatim leakage and proposing scalable defences that promote structural generalisation over verbatim memorisation. Building on these insights, we further demonstrate that a unified next-frame prediction framework enables a single model to process text, images, audio, and video without modality-specific encoders, thereby learning shared hierarchical patterns across these diverse inputs. Collectively, our findings underscore that the capacity to learn and represent hierarchical structure is a fundamental characteristic of Transformer models, and that a focused analysis of these underpinnings is crucial for advancing more capable, interpretable, and safer artificial intelligence.


deanlewisslack_phd_thesis.pdf
Accepted Version
