Past Seminar Series - McGill Statistics Seminars
  • Analytical and experimental design frameworks for single-cell CRISPR screens

    Date: 2026-04-17

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/86146204241

    Meeting ID: 861 4620 4241

    Passcode: None

    Abstract:

    This talk presents two complementary methodological frameworks for improving single-cell CRISPR screens, from both the analysis and study-design perspectives. The first, spaCRT, addresses a key inferential challenge in single-cell data: gene expression measurements are often sparse and noisy, so standard asymptotic tests can miscalibrate significance while resampling methods, though more reliable, are often too slow at scale. spaCRT overcomes this by using saddlepoint approximations to provide a closed-form approximation to the resampling p-value, yielding accurate error control, competitive power, and substantial computational savings. The second, PerturbPlan, addresses the design side of these experiments: because CRISPR screens are expensive, experimental choices with similar budgets can differ greatly in statistical power. PerturbPlan uses an analytic power formula, validated through simulations and real datasets, to provide near-instant power estimates and generate cost-aware, power-optimized designs across a broad range of common study settings. Together, these frameworks aim to make single-cell CRISPR studies both more statistically reliable and more efficiently designed.
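
    To make the notion of a resampling p-value concrete, the following is a minimal sketch of a generic Monte Carlo permutation test on sparse counts. It is not spaCRT and not the speaker's code: the function name `permutation_p_value`, the difference-in-means statistic, and the toy data are all assumptions chosen for illustration. The loop over resamples is the cost that grows when many perturbation-gene pairs are tested, which is the cost the abstract says a closed-form approximation avoids.

```python
import random


def permutation_p_value(treated, control, n_resamples=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means.

    Hypothetical helper for illustration only; it is a plain permutation
    test, not the conditional randomization test discussed in the talk.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)

    exceed = 0
    for _ in range(n_resamples):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if stat >= observed:
            exceed += 1

    # The +1 terms count the observed statistic itself, so p is never 0.
    return (1 + exceed) / (1 + n_resamples)


# Toy sparse expression counts (made-up numbers, mostly zeros).
perturbed_cells = [0, 0, 1, 0, 3, 0, 2, 0]
control_cells = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
p_value = permutation_p_value(perturbed_cells, control_cells)
print(f"permutation p-value: {p_value:.4f}")
```

    With `n_resamples` resamples the smallest attainable p-value is 1 / (n_resamples + 1), so resolving very small p-values requires many resamples per test.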

  • Tales from the Tails: Extreme Value Inference for Systemic Risk

    Date: 2026-04-10

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/84830806641

    Meeting ID: 848 3080 6641

    Passcode: None

    Abstract:

    From a macroprudential view, systemic risk arises from the gradual buildup of financial imbalances across the system. In probabilistic terms, these vulnerabilities manifest in the extremal dependence structure of the system. In this talk, I present an extreme value framework for characterizing extremal dependence in multivariate distributions based on tail expansions of copulas. This framework yields a new approach to Conditional Value-at-Risk (CoVaR), one of the most widely used measures of systemic risk. Our work characterizes the possible tail regimes of CoVaR through the limiting behavior of the copula conditional distribution and proves that these regimes can be determined by the joint tail expansions of the copula. Building on this characterization, we also propose a minimum-distance estimation approach for CoVaR and establish its asymptotic properties. The talk also features an empirical study of systemic risk in the U.S. market from 2000 to 2025. It shows how the proposed methodology can reveal changes in systemic risk and help distinguish the systemic roles of different assets and institutions. The findings have useful implications for macroprudential surveillance and risk management.

  • On copula-based regression models: from classical regression approaches to mixed models using factor copulas

    Date: 2026-03-13

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/82187303482

    Meeting ID: 821 8730 3482

    Passcode: None

    Abstract:

    In this talk, we propose a copula-based framework for regression modelling in which the conditional distribution of the response variable, given covariates, is specified through a parametric family of continuous or discrete distributions. For mixed models, we incorporate cluster-level dependence by introducing a common latent factor modeled via a factor copula. We discuss the estimation of both the copula parameters and the marginal parameters, and we derive the asymptotic behavior of the resulting estimators. Numerical experiments are performed to assess the precision of the estimators in finite samples. An application is given using data on COVID-19 vaccination hesitancy from several countries. This is joint work with Pavel Krupskii and Bruno Remillard.

  • Hierarchical Clustering With Confidence

    Date: 2026-02-20

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/82441217734

    Meeting ID: 824 4121 7734

    Passcode: None

    Abstract:

    Hierarchical clustering is one of the most widely used approaches for exploring data. However, its greedy nature makes it highly sensitive to small perturbations, blurring the lines between genuine structure and spurious patterns. In this work, we show how randomizing hierarchical clustering can be useful not just for assessing clustering stability but also for designing valid hypothesis testing procedures based on clustering results. In particular, we propose a method for constructing a valid p-value at each node of the hierarchical clustering dendrogram that quantifies evidence against performing the greedy merge. Furthermore, we show how our p-values can be used to estimate the number of clusters, with a probabilistic guarantee on overestimation of the number of clusters. This is joint work with Di Wu and Snigdha Panigrahi.

  • Asymptotic Behavior, Risk Measures, and Simulation of Distorted Copulas

    Date: 2026-02-13

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/81379129957

    Meeting ID: 813 7912 9957

    Passcode: None

    Abstract:

    Distorting multivariate distributions is a useful approach for introducing flexibility and capturing model uncertainty. In particular, applying distortions to the copulas representing the underlying dependence structure allows one to generate new, flexible dependence models from existing ones. In this presentation, we investigate the extremal domain of attraction problem for Morillas-type distorted copulas. We not only establish conditions under which such copula-to-copula transformations alter the asymptotic behavior, but also discuss conditions under which the distorted copula remains in the same domain of attraction as the initial undistorted copula. Furthermore, we discuss the effect of these distortions on multivariate risk measures, such as the lower-orthant Value-at-Risk and Range-Value-at-Risk. Finally, we propose a simulation algorithm for Morillas-type distorted copulas, addressing a gap in the literature and providing the means to utilize these modified dependence structures in practice. We end the presentation with an application of distorted copula models to hail insurance.

  • Survival analysis of extreme events with missing observations

    Date: 2026-01-23

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/82195728045

    Meeting ID: 821 9572 8045

    Passcode: None

    Abstract:

    The analysis of extreme wave surge heights in Atlantic Canada is key to determining areas that are subject to flooding or at risk of severe damage from intense storms. One method for modelling extreme events is the block maxima approach, which divides a series of observations into equal-sized blocks and extracts the maximum of each block, after which inference is conducted on the generalized extreme value (GEV) distribution. When observations at the series level are missing, the observed block maxima may not correspond to the true block maxima. In this presentation, we introduce this missing data problem in the context of an extreme value analysis and explain how concepts from survival analysis can be used to improve inferences on the GEV distribution using the observed block maxima.
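
    The block maxima step and the effect of missing observations can be sketched as follows. This is an illustrative sketch only: the function name `block_maxima`, the use of `None` to mark a missing observation, and the toy numbers are assumptions, not the speaker's data or method. It shows that the maximum of the observed values in a block can never exceed the maximum of the complete block, so an observed block maximum is in general only a lower bound on the true one.

```python
from typing import List, Optional, Sequence


def block_maxima(series: Sequence[Optional[float]],
                 block_size: int) -> List[Optional[float]]:
    """Return the maximum of the observed values in each consecutive block.

    None marks a missing observation. A block with no observed values
    yields None. A trailing incomplete block is dropped.
    """
    n_blocks = len(series) // block_size
    maxima: List[Optional[float]] = []
    for b in range(n_blocks):
        block = series[b * block_size:(b + 1) * block_size]
        observed = [x for x in block if x is not None]
        maxima.append(max(observed) if observed else None)
    return maxima


# Toy series (made-up numbers): complete data, then the same data with gaps.
true_series = [1.2, 3.4, 2.1, 0.9, 5.6, 4.3, 2.2, 2.8, 1.7]
observed_series = [1.2, None, 2.1, 0.9, 5.6, 4.3, None, 2.8, None]

true_max = block_maxima(true_series, block_size=3)
obs_max = block_maxima(observed_series, block_size=3)

print("true block maxima:    ", true_max)  # [3.4, 5.6, 2.8]
print("observed block maxima:", obs_max)   # [2.1, 5.6, 2.8]
```

    In the first block the true maximum (3.4) is missing, so the observed maximum (2.1) understates it. In the third block the observed and true maxima coincide even though two values are missing.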

  • A General Framework for Testing Clustering Significance and Variable-Level Inference in High-Dimensional Data

    Date: 2026-01-16

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/89692052783

    Meeting ID: 896 9205 2783

    Passcode: None

    Abstract:

    Clustering is a fundamental tool for uncovering heterogeneity in data, yet a longstanding challenge lies in assessing whether detected clusters represent genuine structure or arise from sampling variability, and in determining which variables drive the clustering structure. Statistical significance clustering (SigClust; Liu et al. (2008)) addresses the first challenge by testing the cluster index under a Gaussian null, estimating its distribution via Monte Carlo simulation in high dimensions. We propose SigClust-DE, which builds on recent advances in high-dimensional covariance estimation to improve the accuracy of SigClust and extends it to variable-level inference. In particular, SigClust-DE unifies clustering significance testing and differential expression (DE) analysis, a central task in RNA-seq studies. By leveraging the Monte Carlo framework, our method controls type I error while maintaining high power for variable selection. Through extensive simulations and an application to RNA-seq data, we show that SigClust-DE achieves more accurate covariance estimation, effectively controls false discoveries, and substantially improves power in detecting differentially expressed variables, providing a general framework for clustering significance and variable-level inference in high-dimensional data.

  • Unfolding Generalized Shannon’s Entropy

    Date: 2025-12-05

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/83026954715

    Meeting ID: 830 2695 4715

    Passcode: None

    Abstract:

    Shannon’s entropy is a cornerstone of information theory, quantifying uncertainty within a probability distribution. However, the classical definition may fail for distributions with heavy tails or infinite alphabets, leaving gaps in its theoretical foundation. This talk introduces a framework called Generalized Shannon’s Entropy (GSE), which extends the original concept to ensure well-definedness and robustness under broader conditions.
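
    As a reminder of why the classical definition can fail on a countably infinite alphabet, the standard textbook example below may help. It is not taken from the talk, and the definition of GSE itself is not reproduced here; the normalizing constant c is notation introduced only for this example.

```latex
% Classical Shannon entropy of a distribution P = (p_k) on a countable alphabet
H(P) = -\sum_{k} p_k \log p_k .

% A heavy-tailed example on k = 2, 3, \dots with normalizing constant c > 0:
p_k = \frac{c}{k (\log k)^2}, \qquad
\sum_{k \ge 2} \frac{1}{k (\log k)^2} < \infty .

% Since -\log p_k = \log k + 2 \log\log k - \log c, the leading term of
% -p_k \log p_k is c / (k \log k), and
\sum_{k \ge 2} \frac{1}{k \log k} = \infty
\quad\Longrightarrow\quad H(P) = \infty .
```

    The probabilities sum to one, yet the entropy series diverges, so the classical entropy is not finite for this distribution.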

  • Deep P-Spline: Theory, Fast Tuning, and Application

    Date: 2025-11-28

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/86339405056

    Meeting ID: 863 3940 5056

    Passcode: None

    Abstract:

    Deep neural networks (DNNs) have become a standard tool for tackling complex regression problems, yet identifying an optimal network architecture remains a fundamental challenge. In this work, we connect neuron selection in DNNs with knot placement in basis expansion methods. Building on this connection, we propose a difference-penalty approach that automates knot selection and, in turn, simplifies the process of choosing neurons. We call this method Deep P-Spline (DPS). This approach extends the class of models considered in conventional DNN modeling and forms the basis for a latent-variable modeling framework using the Expectation–Conditional Maximization (ECM) algorithm for efficient network structure tuning with theoretical guarantees. From the perspective of nonparametric regression, DPS alleviates the curse of dimensionality, allowing effective analysis of high-dimensional data where conventional methods often fail. These properties make DPS particularly well suited for applications such as computer experiments and image data analysis, where regression tasks routinely involve a large number of inputs. Numerical studies demonstrate the strong performance of DPS, underscoring its potential as a powerful tool for advanced nonlinear regression problems.

  • Can uncertainty be quantified? On confident hallucinations in deep learning-based methods for inverse problems

    Date: 2025-11-14

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/82687773039

    Meeting ID: 826 8777 3039

    Passcode: None

    Abstract:

    Deep learning is currently transforming how inverse problems arising in imaging reconstruction are solved. However, it is increasingly well known that such deep learning-based methods are susceptible to hallucinations. In this talk, I will present a series of theoretical explanations for why hallucinations occur, in both deterministic and statistical estimators. I will conclude by observing that hallucinations can only be avoided by careful design of the forward operator in tandem with the recovery algorithm, and then provide a theoretical framework for how this can be achieved when solving inverse problems using generative models.