Natural Language Processing for Early Detection and Mitigation of Public Health Threats

FARRELL, SEAN (2025) Natural Language Processing for Early Detection and Mitigation of Public Health Threats. Doctoral thesis, Durham University.

Veterinary electronic health records (vEHRs) represent a vast yet underutilised resource with the potential to advance animal welfare, strengthen public health, and drive innovations in healthcare informatics. This thesis presents a framework for utilising vEHRs through Natural Language Processing (NLP) techniques, contributing novel methodologies and insights across five key areas. First, I introduce PetBERT, a foundation model trained on 500 million tokens from first-opinion vEHRs. PetBERT forms the backbone of our syndromic disease surveillance system, which establishes a new state-of-the-art approach for monitoring disease outbreaks within the UK and is actively deployed as an early warning mechanism for emerging veterinary diseases. Second, I present a hierarchical language model applied to 1.4 million antimicrobial prescriptions, revealing significant species-specific discrepancies in antimicrobial use and in adherence to antimicrobial stewardship guidelines, and offering a scalable solution for stewardship monitoring. Third, I present a novel text-tabular explainability approach that focuses on premature mortality and identifies previously unrecognised risk factors, including the significant influence of socioeconomic status on health outcomes. Fourth, recognising the importance of responsible data sharing, I develop PetHarbor, the first data governance framework for vEHRs. Developed in collaboration with the international community, this framework standardises protocols for data sharing while maintaining privacy and ethical standards. Finally, I contribute PetEVAL, the first open evaluation benchmark for vEHRs, releasing 17,000 annotated records for anonymisation, disease extraction, and syndromic classification tasks. This resource makes the research in this thesis reproducible, establishes vEHRs as a transformative resource for healthcare informatics, and charts a path towards a standardised evaluation framework for future developments in veterinary NLP.
By embedding open science at its core, this thesis demonstrates that vEHRs are not merely a neglected data source but a powerful engine for advancing animal health, tracking diseases in real time, and informing global health policy.

Sean_Farrell_Thesis_final.pdf
Accepted Version
Restricted to Repository staff only until 1 October 2026

