2023 Winter - McGill Statistics Seminars
  • Confidence sets for Causal Discovery

    Date: 2023-03-24 Time: 15:30-16:30 (Montreal time) On Zoom only https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09 Meeting ID: 834 3668 6293 Passcode: 12345 Abstract: Causal discovery procedures are popular methods for discovering causal structure across the physical, biological, and social sciences. However, most procedures for causal discovery only output a single estimated causal model or single equivalence class of models. We propose a procedure for quantifying uncertainty in causal discovery. Specifically, we consider linear structural equation models with non-Gaussian errors and propose a procedure that returns a confidence set of causal orderings that are not ruled out by the data.
  • Excursions in Statistical History: Highlights

    Date: 2023-03-17 Time: 15:30-16:30 (Montreal time) Hybrid: In person / Zoom Location: Burnside Hall 1104 https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09 Meeting ID: 834 3668 6293 Passcode: 12345 Abstract: Over the last 20 years, the speaker has delved into the origins of ‘regression’; the development of the ‘t’ and ‘Poisson’ distributions; forerunners of the ‘hazard’ function; and the statistical design and conduct of US Selective Service lotteries from 1917 onwards. This talk will recount the stories, data and simulations behind some of these, and provide some modern-day re-enactments.
  • Heteroskedastic Sparse PCA in High Dimensions

    Date: 2023-03-10 Time: 15:30-16:30 (Montreal time) Hybrid: In person / Zoom Location: Burnside Hall 1104 https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09 Meeting ID: 834 3668 6293 Passcode: 12345 Abstract: Principal component analysis (PCA) is one of the most commonly used techniques for dimension reduction and feature extraction. Though high-dimensional sparse PCA has been well studied, little is known about the case where the noise is heteroskedastic, which turns out to be ubiquitous in many scenarios, such as biological sequencing data and information network data.
  • High Dimensional Logistic Regression Under Network Dependence

    Date: 2023-03-10 Time: 14:15-15:15 (Montreal time) Hybrid: In person / Zoom Location: Burnside Hall 1104 https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09 Meeting ID: 834 3668 6293 Passcode: 12345 Abstract: The classical formulation of logistic regression relies on the independent sampling assumption, which is often violated when the outcomes interact through an underlying network structure, such as over a temporal/spatial domain or on a social network. This necessitates the development of models that can simultaneously handle both the network peer-effect (arising from neighborhood interactions) and the effect of (possibly) high-dimensional covariates.
  • Epidemic Forecasting using Delayed Time Embedding

    Date: 2023-02-17 Time: 15:30-16:30 (Montreal time) https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09 Meeting ID: 834 3668 6293 Passcode: 12345 Abstract: Forecasting the future trajectory of an outbreak plays a crucial role in the mission of managing emerging infectious disease epidemics. Compartmental models, such as the Susceptible-Exposed-Infectious-Recovered (SEIR), are the most popular tools for this task. They have been used extensively to combat many infectious disease outbreaks including the current COVID-19 pandemic. One downside of these models is that they assume that the dynamics of an epidemic follow a pre-defined dynamical system which may not capture the true trajectories of an outbreak.
  • Efficient Label Shift Adaptation through the Lens of Semiparametric Models

    Date: 2023-02-10 Time: 15:00-16:00 (Montreal time) Hybrid: In person / Zoom Location: Burnside Hall 1205 https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09 Meeting ID: 834 3668 6293 Passcode: 12345 Abstract: We study the domain adaptation problem with label shift in this work. Under the label shift context, the marginal distribution of the label varies across the training and testing datasets, while the conditional distribution of features given the label is the same. Traditional label shift adaptation methods either suffer from large estimation errors or require cumbersome post-prediction calibrations.
  • Learning from a Biased Sample

    Date: 2023-02-03 Time: 15:30-16:30 (Montreal time) https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09 Meeting ID: 834 3668 6293 Passcode: 12345 Abstract: The empirical risk minimization approach to data-driven decision making assumes that we can learn a decision rule from training data drawn under the same conditions as the ones we want to deploy it under. However, in a number of settings, we may be concerned that our training sample is biased, and that some groups (characterized by either observable or unobservable attributes) may be under- or over-represented relative to the general population; and in this setting empirical risk minimization over the training set may fail to yield rules that perform well at deployment.
  • What is TWAS and how do we use it in integrating gene expression data

    Date: 2023-01-20 Time: 15:30-16:30 (Montreal time) https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09 Meeting ID: 834 3668 6293 Passcode: 12345 Abstract: The transcriptome-wide association study (TWAS) is a pioneering approach utilizing gene expression data to identify the genetic basis of complex diseases. Its core component is called “genetically regulated expression (GReX)”. GReX links gene expression information with phenotype by serving as both the outcome of genotype-based expression models and the predictor for downstream association testing. Although it is popular and has been used in many high-profile projects, its mathematical nature and interpretation haven’t been rigorously verified.
  • To split or not to split that is the question: From cross validation to debiased machine learning

    Date: 2023-01-13 Time: 15:30-16:30 (Montreal time) https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09 Meeting ID: 834 3668 6293 Passcode: 12345 Abstract: Data splitting is a ubiquitous method in statistics with examples ranging from cross validation to cross-fitting. However, despite its prevalence, theoretical guidance regarding its use is still lacking. In this talk we will explore two examples and establish an asymptotic theory for it. In the first part of this talk, we study the cross-validation method, a ubiquitous method for risk estimation, and establish its asymptotic properties for a large class of models and with an arbitrary number of folds.