Date: 2026-01-16
Time: 15:30-16:30 (Montreal time)
Location: In person, Burnside 1104
Zoom: https://mcgill.zoom.us/j/89692052783
Meeting ID: 896 9205 2783
Passcode: None
Abstract:
Clustering is a fundamental tool for uncovering heterogeneity in data, yet a longstanding challenge lies in assessing whether detected clusters represent genuine structure or arise from sampling variability, and in determining which variables drive the clustering structure. Statistical significance clustering (SigClust; Liu et al., 2008) addresses the first challenge by testing the cluster index under a Gaussian null, estimating its distribution via Monte Carlo simulation in high dimensions. We propose SigClust-DE, which builds on recent advances in high-dimensional covariance estimation to improve the accuracy of SigClust and extends it to variable-level inference. In particular, SigClust-DE unifies clustering significance testing and differential expression (DE) analysis, a central task in RNA-seq studies. By leveraging the Monte Carlo framework, our method controls type I error while maintaining high power for variable selection. Through extensive simulations and an application to RNA-seq data, we show that SigClust-DE achieves more accurate covariance estimation, effectively controls false discoveries, and substantially improves power in detecting differentially expressed variables, providing a general framework for clustering significance and variable-level inference in high-dimensional data.
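Illustration (not part of the speaker's abstract): for readers unfamiliar with the Monte Carlo test that the abstract builds on, the following is a minimal Python sketch of a SigClust-style test. It computes the 2-means cluster index (within-cluster sum of squares divided by total sum of squares) and compares it with indices simulated from a single Gaussian whose covariance eigenvalues are estimated from the data. Everything here is an assumption made for illustration: the function names are hypothetical, the eigenvalue estimate is a simple floor at a MAD-based noise variance, and the p-value is purely empirical. It does not implement SigClust-DE's covariance estimator or its variable-level inference.

```python
import numpy as np


def two_means_labels(X, rng, n_iter=50):
    """One run of Lloyd's algorithm with k=2 from a random start."""
    centers = X[rng.choice(X.shape[0], size=2, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        if labels.min() == labels.max():
            break  # degenerate split: every point fell into one cluster
        new_centers = np.stack([X[labels == k].mean(axis=0) for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_index(X, labels):
    """Within-cluster sum of squares divided by total sum of squares."""
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()
    within_ss = 0.0
    for k in (0, 1):
        members = X[labels == k]
        if len(members):
            within_ss += ((members - members.mean(axis=0)) ** 2).sum()
    return within_ss / total_ss


def best_cluster_index(X, rng, n_init=3):
    """Smallest cluster index over a few random restarts of 2-means."""
    return min(cluster_index(X, two_means_labels(X, rng)) for _ in range(n_init))


def null_eigenvalues(X):
    """Sample covariance eigenvalues, floored at a MAD-based noise variance.

    Assumption: a simplified stand-in for the covariance estimation step.
    """
    n, d = X.shape
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    eig = np.zeros(d)
    eig[: len(s)] = s ** 2 / (n - 1)
    noise_sd = 1.4826 * np.median(np.abs(X - np.median(X)))
    return np.maximum(eig, noise_sd ** 2)


def sigclust_pvalue(X, n_sim=100, seed=0):
    """Empirical Monte Carlo p-value for the observed 2-means cluster index."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    observed = best_cluster_index(X, rng)
    # The cluster index is invariant to shifts and rotations, so the null
    # can be simulated from N(0, diag(eigenvalues)).
    sd = np.sqrt(null_eigenvalues(X))
    null = np.array(
        [best_cluster_index(rng.standard_normal((n, d)) * sd, rng)
         for _ in range(n_sim)]
    )
    p_value = (1 + np.sum(null <= observed)) / (1 + n_sim)
    return observed, p_value


if __name__ == "__main__":
    # Toy data: two groups of 30 samples separated along the first variable.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((60, 20))
    X[:30, 0] += 12.0
    ci, p = sigclust_pvalue(X)
    print(f"cluster index = {ci:.3f}, empirical p-value = {p:.3f}")
```

A small p-value indicates that the observed split is tighter than what a single Gaussian population with comparable covariance would typically produce. The add-one correction keeps the empirical p-value strictly positive.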
Speaker:
Hui Shen is a postdoctoral researcher in the Department of Mathematics and Statistics at McGill University. She received her PhD from the Department of Statistics and Operations Research at the University of North Carolina at Chapel Hill. Her research interests include high-dimensional data analysis, statistical network analysis, and differential privacy.