Nonparametric methods for change point analysis in high dimensional data

Zhang, Lupeng (2026) Nonparametric methods for change point analysis in high dimensional data. Doctoral thesis, Durham University.

Change point detection is widely applied in fields such as finance, engineering, and genomics. The main objective is to detect significant changes in the distribution of a data sequence. Two types of problems are studied in this thesis: the offline change point problem, where detection is performed after all data have been collected, and the online change point problem, where tests are conducted sequentially as data arrive and timely detection is crucial. Both problems have been well studied in low dimensional contexts. However, classical methods often struggle with high dimensional data, where the number of variables is much larger than the number of observations. We develop nonparametric methods for both offline and online change point detection and address key challenges of change point analysis in high dimensions.

For the offline change point problem, we introduce distance-based CUSUM statistics for detecting change points in high dimensional observations. Unlike the standard CUSUM statistic, which primarily detects linear changes such as shifts in the mean of the observations, the distance-based CUSUM statistics are constructed from pairwise distances (dissimilarities) between observations. They are therefore capable of detecting more general types of change points in a data stream, both linear and non-linear, such as changes in the mean, variance, correlation, or other aspects of the shape of the distribution over time. Moreover, the distance-based CUSUM method is particularly useful for high-dimension, low-sample-size (HDLSS) data, in which the number of observations is very small but the dimension is very large. Detecting change points in such data is an understudied problem. We study the properties of the proposed distance-based CUSUM statistics and use them to develop a nonparametric test for the statistical significance of change point estimates. Our approach does not assume normality or any other specific distribution for the data. We provide theoretical guarantees for our method and compare its empirical performance with that of several recent methods through extensive simulation studies and two real data applications.
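To make the general idea concrete, the following is a minimal sketch of a distance-based scan for a single change point with a permutation test for significance. It is not the statistic or the test developed in the thesis, whose exact definitions are not given in this abstract. As an assumption, it swaps in an energy-distance-style contrast between the two segments, computed from Euclidean pairwise distances, and the names `pairwise_distances`, `scan_statistic`, `permutation_test`, `min_seg`, and `n_perm` are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (shape n x d)."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)   # guard against tiny negative round-off
    D = np.sqrt(d2)
    np.fill_diagonal(D, 0.0)
    return D


def scan_statistic(D, min_seg=3):
    """Scan all admissible split points t and return (best t, best value).

    Assumed contrast (not the thesis's statistic): a weighted
    energy-distance-style quantity
        t(n - t)/n * (2 * mean cross distance
                      - mean left distance - mean right distance),
    where the within-segment means include the zero diagonal.
    A split at t means rows 0..t-1 form the left segment.
    """
    n = D.shape[0]
    best_t, best_val = None, -np.inf
    for t in range(min_seg, n - min_seg + 1):
        cross = D[:t, t:].mean()
        left = D[:t, :t].mean()
        right = D[t:, t:].mean()
        val = (t * (n - t) / n) * (2.0 * cross - left - right)
        if val > best_val:
            best_t, best_val = t, val
    return best_t, best_val


def permutation_test(X, n_perm=199, min_seg=3, seed=0):
    """Estimate a change point and a permutation p-value for it.

    Under the null of no change the observations are exchangeable, so
    the time order is shuffled and the maximum statistic recomputed.
    """
    rng = np.random.default_rng(seed)
    D = pairwise_distances(X)
    t_hat, observed = scan_statistic(D, min_seg)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(D.shape[0])
        _, val = scan_statistic(D[np.ix_(p, p)], min_seg)
        exceed += int(val >= observed)
    p_value = (1 + exceed) / (1 + n_perm)
    return t_hat, observed, p_value
```

Calling `permutation_test(X)` on an array with far more columns than rows, for example 40 observations in 200 dimensions, returns the estimated split index, the maximized statistic, and the p-value. Because only the n x n distance matrix enters the scan, the cost after computing distances does not depend on the dimension, which is one reason distance-based approaches are attractive in the HDLSS setting.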


Zhang001035499_corrected.pdf
Accepted Version
