Past Seminar Series - McGill Statistics Seminars
  • Hierarchical Clustering With Confidence

    Date: 2026-02-20

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/82441217734

    Meeting ID: 824 4121 7734

    Passcode: None

    Abstract:

    Hierarchical clustering is one of the most widely used approaches for exploring data. However, its greedy nature makes it highly sensitive to small perturbations, blurring the lines between genuine structure and spurious patterns. In this work, we show how randomizing hierarchical clustering can be useful not just for assessing clustering stability but also for designing valid hypothesis testing procedures based on clustering results. In particular, we propose a method for constructing a valid p-value at each node of the hierarchical clustering dendrogram that quantifies evidence against performing the greedy merge. Furthermore, we show how our p-values can be used to estimate the number of clusters, with a probabilistic guarantee on overestimation of the number of clusters. This is joint work with Di Wu and Snigdha Panigrahi.

  • Asymptotic Behavior, Risk Measures, and Simulation of Distorted Copulas

    Date: 2026-02-13

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/81379129957

    Meeting ID: 813 7912 9957

    Passcode: None

    Abstract:

    Distorting multivariate distributions is a useful approach for introducing flexibility and capturing model uncertainty. In particular, applying distortions to the copulas representing the underlying dependence structure allows one to generate new, flexible dependence models from existing ones. In this presentation, we investigate the extremal domain of attraction problem for Morillas-type distorted copulas. We not only establish conditions under which such copula-to-copula transformations alter the asymptotic behavior, but also discuss conditions under which the distorted copulas remain in the same domain of attraction as the initial undistorted copula. Furthermore, we discuss the effect of these distortions on multivariate risk measures, such as the lower-orthant Value-at-Risk and Range-Value-at-Risk. Finally, we propose a simulation algorithm for Morillas-type distorted copulas, addressing a gap in the literature and providing the means to utilize these modified dependence structures in practice. We end the presentation with an application of distorted copula models to hail insurance.

  • Survival analysis of extreme events with missing observations

    Date: 2026-01-23

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/82195728045

    Meeting ID: 821 9572 8045

    Passcode: None

    Abstract:

    The analysis of extreme wave surge heights in Atlantic Canada is key to determining areas that are subject to flooding or at risk of severe damage from intense storms. One method for modelling extreme events is the block maxima approach, which divides a series of observations into equal-sized blocks and extracts the maximum of each block, after which inference is conducted on the generalized extreme value (GEV) distribution. When observations at the series level are missing, the observed block maxima may not correspond to the true block maxima. In this presentation, we introduce this missing data problem in the context of an extreme value analysis and explain how concepts from survival analysis can be used to improve inferences on the GEV distribution using the observed block maxima.
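
The missing-data issue described in this abstract can be illustrated with a minimal Python sketch. It is not the speaker's method: the simulated Gaussian series, the block size of 50, the 30% missingness rate, and the helper name `block_maxima` are all assumptions made for illustration only.

```python
import random

def block_maxima(series, block_size):
    """Return the maximum of each complete block, ignoring missing values (None).

    A block whose values are all missing yields None.
    """
    maxima = []
    for start in range(0, len(series) - block_size + 1, block_size):
        block = [x for x in series[start:start + block_size] if x is not None]
        maxima.append(max(block) if block else None)
    return maxima

rng = random.Random(0)
# Hypothetical complete series: 20 blocks of 50 observations each.
full = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# Hypothetical missingness: each observation is dropped independently
# with probability 0.3.
observed = [x if rng.random() > 0.3 else None for x in full]

true_max = block_maxima(full, 50)
obs_max = block_maxima(observed, 50)

# An observed block maximum can never exceed the true one; it is smaller
# whenever the true maximum falls on a missing observation.
n_understated = sum(1 for t, o in zip(true_max, obs_max) if o is None or o < t)
print(f"{n_understated} of {len(true_max)} observed block maxima differ from the true maxima")
```

Because each observed maximum is at most the true maximum, treating the observed maxima as if they were complete would tend to understate the extremes, which is the kind of bias the abstract sets out to address.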

  • A General Framework for Testing Clustering Significance and Variable-Level Inference in High-Dimensional Data

    Date: 2026-01-16

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/89692052783

    Meeting ID: 896 9205 2783

    Passcode: None

    Abstract:

    Clustering is a fundamental tool for uncovering heterogeneity in data, yet a longstanding challenge lies in assessing whether detected clusters represent genuine structure or arise from sampling variability, and in determining which variables drive the clustering structure. Statistical significance clustering (SigClust; Liu et al., 2008) addresses the first challenge by testing the cluster index under a Gaussian null, estimating its distribution via Monte Carlo simulation in high dimensions. We propose SigClust-DE, which builds on recent advances in high-dimensional covariance estimation to improve the accuracy of SigClust and extends it to variable-level inference. In particular, SigClust-DE unifies clustering significance testing and differential expression (DE) analysis, a central task in RNA-seq studies. By leveraging the Monte Carlo framework, our method controls type I error while maintaining high power for variable selection. Through extensive simulations and an application to RNA-seq data, we show that SigClust-DE achieves more accurate covariance estimation, effectively controls false discoveries, and substantially improves power in detecting differentially expressed variables, providing a general framework for clustering significance and variable-level inference in high-dimensional data.

  • Unfolding Generalized Shannon’s Entropy

    Date: 2025-12-05

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/83026954715

    Meeting ID: 830 2695 4715

    Passcode: None

    Abstract:

    Shannon’s entropy is a cornerstone of information theory, quantifying uncertainty within a probability distribution. However, the classical definition may fail for distributions with heavy tails or infinite alphabets, leaving gaps in its theoretical foundation. This talk introduces a framework called Generalized Shannon’s Entropy (GSE), which extends the original concept to ensure well-definedness and robustness under broader conditions.
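
As a standard textbook illustration of how the classical definition can break down on an infinite alphabet (this example is not taken from the talk, and the distribution below is chosen only for illustration), consider:

```latex
% Classical Shannon entropy of a distribution p = (p_k) on a countable alphabet
H(p) = -\sum_{k} p_k \log p_k .

% Illustrative heavy-tailed distribution on k = 2, 3, \dots
p_k = \frac{c}{k (\log k)^2}, \qquad
c^{-1} = \sum_{k \ge 2} \frac{1}{k (\log k)^2} < \infty .

% Each summand of the entropy expands as
-p_k \log p_k = \frac{c}{k (\log k)^2}\bigl(\log k + 2 \log\log k - \log c\bigr),

% whose dominant term is c / (k \log k). Since
\sum_{k \ge 2} \frac{1}{k \log k} = \infty ,
% the entropy is infinite: H(p) = \infty .
```

Here $p$ is a proper probability distribution, yet its classical entropy is not finite, which is the kind of gap that a generalized definition would need to handle.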

  • Deep P-Spline: Theory, Fast Tuning, and Application

    Date: 2025-11-28

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/86339405056

    Meeting ID: 863 3940 5056

    Passcode: None

    Abstract:

    Deep neural networks (DNNs) have become a standard tool for tackling complex regression problems, yet identifying an optimal network architecture remains a fundamental challenge. In this work, we connect neuron selection in DNNs with knot placement in basis expansion methods. Building on this connection, we propose a difference-penalty approach that automates knot selection and, in turn, simplifies the process of choosing neurons. We call this method Deep P-Spline (DPS). This approach extends the class of models considered in conventional DNN modeling and forms the basis for a latent-variable modeling framework using the Expectation–Conditional Maximization (ECM) algorithm for efficient network structure tuning with theoretical guarantees. From the perspective of nonparametric regression, DPS alleviates the curse of dimensionality, allowing effective analysis of high-dimensional data where conventional methods often fail. These properties make DPS particularly well suited for applications such as computer experiments and image data analysis, where regression tasks routinely involve a large number of inputs. Numerical studies demonstrate the strong performance of DPS, underscoring its potential as a powerful tool for advanced nonlinear regression problems.

  • Can uncertainty be quantified? On confident hallucinations in deep learning-based methods for inverse problems

    Date: 2025-11-14

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/82687773039

    Meeting ID: 826 8777 3039

    Passcode: None

    Abstract:

    Deep learning is currently transforming how inverse problems arising in image reconstruction are solved. However, it is increasingly well known that such deep learning-based methods are susceptible to hallucinations. In this talk, I will present a series of theoretical explanations for why hallucinations occur, in both deterministic and statistical estimators. I will conclude by observing that hallucinations can only be avoided by careful design of the forward operator in tandem with the recovery algorithm, and then provide a theoretical framework for how this can be achieved when solving inverse problems using generative models.

  • Towards Efficient and Reliable Generative and Sampling Models

    Date: 2025-11-07

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/87181846336

    Meeting ID: 871 8184 6336

    Passcode: None

    Abstract:

    This talk presents a unified framework for enhancing the reliability and geometric fidelity of generative models. We first develop a diffusion mechanism defined intrinsically on the SE(3) manifold, enabling efficient sampling. To address the critical issue of mode collapse in energy-based samplers, we introduce a novel Importance Weighted Score Matching method that provably improves coverage of complex, multi-modal distributions. Finally, we extend these principles to infer underlying dynamical systems directly from incomplete and scattered training data. Collectively, this work bridges geometric consistency, statistical reliability, and learning from partial observations to advance the frontiers of generative and sampling models.

  • Regularized Fine-Tuning for Representation Multi-Task Learning: Adaptivity, Minimaxity, and Robustness

    Date: 2025-10-24

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/81872329544

    Meeting ID: 818 7232 9544

    Passcode: None

    Abstract:

    We study multi-task linear regression for a collection of tasks that share a latent, low-dimensional structure. Each task’s regression vector belongs to a subspace whose dimension, called the intrinsic dimension, is much smaller than the ambient dimension. Unlike classical analyses that assume an identical subspace for every task, we allow each task’s subspace to drift from a single reference subspace by a controllable similarity radius, and we permit an unknown fraction of tasks to be outliers that violate the shared-structure assumption altogether. Our contributions are threefold. First, adaptivity: we design a penalized empirical-risk algorithm and a spectral method, both of which automatically adjust to the unknown similarity radius and to the proportion of outliers. Second, minimaxity: we prove information-theoretic lower bounds on the best achievable prediction risk over this problem class and show that both algorithms attain these bounds up to constant factors; when no outliers are present, the spectral method is exactly minimax-optimal. Third, robustness: for every choice of similarity radius and outlier proportion, the proposed estimators never incur larger expected prediction error than independent single-task regression, while delivering strict improvements whenever tasks are even moderately similar and outliers are sparse. Additionally, we introduce a thresholding algorithm to adapt to an unknown intrinsic dimension. We conduct extensive numerical experiments to validate our theoretical findings.

  • K-contact Distance for Noisy Nonhomogeneous Spatial Point Data and Application to Repeating Fast Radio Burst Sources

    Date: 2025-10-10

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/81986712072

    Meeting ID: 819 8671 2072

    Passcode: None

    Abstract:

    In this talk, I’ll introduce an approach to analyzing nonhomogeneous Poisson processes (NHPP) observed with noise, which focuses on previously unstudied second-order characteristics of the noisy process. Utilizing a hierarchical Bayesian model with noisy data, we first estimate hyperparameters governing a physically motivated NHPP intensity. Leveraging the posterior distribution, we then infer the probability of detecting a certain number of events within a given radius, the $k$-contact distance. This methodology is demonstrated by its motivating application: observations of fast radio bursts (FRBs) detected by the Canadian Hydrogen Intensity Mapping Experiment’s FRB Project (CHIME/FRB). The approach allows us to identify repeating FRB sources by computing the probability of observing $k$ physically independent sources within some radius in the detection domain, or the probability of coincidence ($P_C$). When applied to the largest sample of previously classified observations, the new methodology improves the repeater detection $P_C$ in 86% of cases, with a median improvement factor (existing metric over $P_C$ from our methodology) of about 3000. Throughout the talk, I will provide the necessary astrophysical context to motivate the application and highlight some of the other active statistical problems in FRB science.
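
The basic quantity behind a coincidence probability of this kind can be sketched in Python under strong simplifying assumptions. The sketch below is not the speaker's model: it assumes a homogeneous, noise-free Poisson process with a fixed, hypothetical intensity, whereas the talk treats a nonhomogeneous intensity inferred from noisy data through a Bayesian posterior. The function name `prob_at_least_k` and all numeric values are illustrative.

```python
import math

def prob_at_least_k(mu, k):
    """P(N >= k) for N ~ Poisson(mu), computed as 1 - P(N <= k - 1)."""
    term = math.exp(-mu)  # P(N = 0)
    cdf = 0.0
    for j in range(k):
        cdf += term           # add P(N = j)
        term *= mu / (j + 1)  # update to P(N = j + 1)
    return 1.0 - cdf

# Hypothetical values: a constant intensity (events per unit area) and a
# disc of the given radius, so the expected count is intensity * area.
intensity = 0.5
radius = 1.0
mu = intensity * math.pi * radius ** 2

for k in (1, 2, 3):
    print(k, prob_at_least_k(mu, k))
```

In a nonhomogeneous setting the expected count `mu` would instead be the integral of the intensity over the region, and uncertainty in the intensity would be propagated by averaging this probability over posterior draws.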