Past Seminar Series - McGill Statistics Seminars
  • Structure learning for extremal graphical models

    Date: 2022-02-18

    Time: 15:30-16:30 (Montreal time)

    https://umontreal.zoom.us/j/85105423917?pwd=enM3MGpFNkZKU2daMjRITmo0N0JUUT09

    Meeting ID: 851 0542 3917

    Passcode: 403790

    Abstract:

    Extremal graphical models are sparse statistical models for multivariate extreme events. The underlying graph encodes conditional independencies and enables a visual interpretation of the complex extremal dependence structure. For the important case of tree models, we provide a data-driven methodology for learning the graphical structure. We show that sample versions of the extremal correlation and a new summary statistic, which we call the extremal variogram, can be used as weights for a minimum spanning tree to consistently recover the true underlying tree. Remarkably, this implies that extremal tree models can be learned in a completely non-parametric fashion by using simple summary statistics and without the need to assume discrete distributions, existence of densities, or parametric models for marginal or bivariate distributions. Extensions to more general graphs are also discussed.
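
    The minimum spanning tree step mentioned in the abstract can be sketched as follows. This is only an illustration, not the speaker's implementation: the weight matrix is made up (a chain 0-1-2-3 whose pairwise weights add up along paths), the function name is hypothetical, and estimating the extremal variogram from data is not shown. Smaller weights are taken to mean stronger extremal dependence.

```python
def minimum_spanning_tree(weights):
    """Kruskal's algorithm on a symmetric matrix of pairwise weights.

    Returns the tree as a list of (i, j) edges with i < j.
    """
    d = len(weights)
    # All candidate edges, sorted from smallest to largest weight.
    edges = sorted(
        (weights[i][j], i, j) for i in range(d) for j in range(i + 1, d)
    )
    parent = list(range(d))  # union-find forest, one root per component

    def find(v):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components: keep it
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Hypothetical variogram-like weights for a chain 0 - 1 - 2 - 3 (illustration only).
gamma = [
    [0.0, 1.0, 3.0, 4.5],
    [1.0, 0.0, 2.0, 3.5],
    [3.0, 2.0, 0.0, 1.5],
    [4.5, 3.5, 1.5, 0.0],
]
recovered_edges = sorted(minimum_spanning_tree(gamma))
print(recovered_edges)
```

    With these made-up weights the tree found by the algorithm is the chain used to build the matrix.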

  • Integration of multi-omics data for the discovery of novel regulators that modulate biological processes

    Date: 2022-02-11

    Time: 15:30-16:30 (Montreal time)

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    The cellular states in various biological processes such as cell differentiation, disease progression, and treatment response are often enormously complex and thus hard to profile with a single modality (e.g., the transcriptome). Although unimodal measurements have brought success in a wide variety of studies, incomplete (and often misleading) unimodal cellular profiling can lead to biased and inaccurate conclusions. With the development of biotechnologies, the availability of multi-omics data (bulk or single-cell) is ever-increasing. The rapidly accumulating multi-omics data offer unprecedented opportunities to accurately decode the cellular states in biological processes and thus to gain a deep understanding of how those states change, which is crucial for finding biomarkers and therapeutic intervention strategies. In this talk, we will discuss a few multimodal methods that we developed to integrate multi-omics data for the discovery of novel regulators of multiple biological processes. Many of the novel predictions from the multimodal methods were experimentally validated and have brought new understanding of the underlying mechanisms of several diseases. I will also discuss how a potential novel COVID-19 drug was discovered from such a multi-omics data integration analysis.

  • Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process

    Date: 2022-02-04

    Time: 15:30-16:30 (Montreal time)

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    In this talk, we consider constructing a confidence interval for a target policy’s value offline based on pre-collected observational data in infinite horizon settings. Most of the existing works assume no unmeasured variables exist that confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and technological industries. We show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy’s value is identifiable in a confounded Markov decision process. Based on this result, we develop an efficient off-policy value estimator that is robust to potential model misspecification and provides rigorous uncertainty quantification.

  • Risk assessment, heavy tails, and asymmetric least squares techniques

    Date: 2022-01-28

    Time: 15:30-16:30 (Montreal time)

    https://umontreal.zoom.us/j/93983313215?pwd=clB6cUNsSjAvRmFMME1PblhkTUtsQT09

    Meeting ID: 939 8331 3215

    Passcode: 096952

    Abstract:

    Statistical risk assessment, in particular in finance and insurance, requires estimating simple indicators to summarize the risk incurred in a given situation. Of most interest is to infer extreme levels of risk so as to be able to manage high-impact rare events such as extreme climate episodes or stock market crashes. A standard procedure in this context, whether in the academic, industrial or regulatory circles, is to estimate a well-chosen single quantile (or Value-at-Risk). One drawback of quantiles is that they only take into account the frequency of an extreme event, and in particular do not give an idea of what the typical magnitude of such an event would be. Another issue is that they do not induce a coherent risk measure, which is a serious concern in actuarial and financial applications. In this talk, after giving a leisurely tour of extreme quantile estimation, I will explain how, starting from the formulation of a quantile as the solution of an optimization problem, one may come up with two alternative families of risk measures, called expectiles and extremiles, in order to address these two drawbacks. I will give a broad overview of their properties, as well as of their estimation at extreme levels in heavy-tailed models, and explain why they constitute sensible alternatives for risk assessment using real data applications. This is based on joint work with Abdelaati Daouia, Irène Gijbels, Stéphane Girard, Simone Padoan and Antoine Usseglio-Carleve.
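
    The optimization view mentioned in the abstract can be illustrated for expectiles: a quantile minimizes an asymmetrically weighted absolute loss, and an expectile minimizes the same asymmetric weighting applied to a squared loss (asymmetric least squares). The sketch below computes a sample expectile by iterating the first-order condition. It is a minimal illustration under these assumptions, with a hypothetical function name and toy data, and it does not cover estimation at extreme levels in heavy-tailed models.

```python
def expectile(sample, tau, tol=1e-12, max_iter=1000):
    """Sample expectile at level tau via asymmetric least squares.

    Minimizes sum_i |tau - 1{x_i <= e}| * (x_i - e)**2 over e by repeatedly
    solving the first-order condition, a weighted mean with weights that
    depend on the current value of e.
    """
    if not sample:
        raise ValueError("sample must not be empty")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    e = sum(sample) / len(sample)  # start from the mean (the tau = 0.5 expectile)
    for _ in range(max_iter):
        # Observations above e get weight tau, the others get weight 1 - tau.
        weights = [tau if x > e else 1.0 - tau for x in sample]
        new_e = sum(w * x for w, x in zip(weights, sample)) / sum(weights)
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e


losses = [1.0, 2.0, 3.0, 10.0]  # toy data, illustration only
print(expectile(losses, 0.5), expectile(losses, 0.9))
```

    At tau = 0.5 the weights are all equal, so the expectile is the sample mean. Raising tau puts more weight on large observations, so the value depends on their magnitude and not only on how often they occur.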

  • Change-point analysis for complex data structures

    Date: 2022-01-21

    Time: 15:30-16:30 (Montreal time)

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    Change-point analysis is more than sixty years old. Over this long period, it has been an important subject of interest in many scientific disciplines such as finance and econometrics, bioinformatics and genomics, climatology, engineering, and technology.

    In this talk, I will provide a general overview of the topic alongside some historical notes. I will then review the most recent and transformative advancements on the subject. Finally, I will discuss the change-point methodologies that my research team has developed over the past several years, covering various complex data structures.

  • Adventures with Partial Identifications in Studies of Marked Individuals

    Date: 2021-11-26

    Time: 15:30-16:30 (Montreal time)

    Zoom Link

    Meeting ID: 939 8331 3215

    Passcode: 096952

    Abstract:

    Monitoring marked individuals is a common strategy in studies of wild animals (referred to as mark-recapture or capture-recapture experiments) and hard-to-track human populations (referred to as multi-list methods or multiple-systems estimation). A standard assumption of these techniques is that individuals can be identified uniquely and without error, but this can be violated in many ways. In some cases, it may not be possible to identify individuals uniquely because of the study design or the choice of marks. Other times, errors may occur so that individuals are incorrectly identified. I will discuss work with my collaborators over the past 10 years developing methods to account for problems that arise when individuals are only partially identified. I will present theoretical aspects of this research, including an introduction to the latent multinomial model and algebraic statistics, and also describe applications to studies of species ranging from the golden mantella (an endangered frog endemic to Madagascar measuring only 20 mm) to the whale shark (the largest known species of fish, measuring up to 19 m).

  • Prediction of Bundled Insurance Risks with Dependence-aware Prediction using Pair Copula Construction

    Date: 2021-11-19

    Time: 15:30-16:30 (Montreal time)

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    We propose a dependence-aware predictive modeling framework for multivariate risks stemming from an insurance contract with bundling features – an important type of policy increasingly offered by major insurance companies. The bundling feature naturally leads to longitudinal measurements of multiple insurance risks. We build a novel predictive model that actively exploits the dependence among the evolution of multivariate repeated risk measurements. Specifically, the longitudinal measurement of each individual risk is first modeled using pair copula construction with a D-vine structure, and the multiple D-vines are then integrated by a flexible copula. While our analysis mainly focuses on the claim count as the measurement of insurance risk, the proposed model provides a unified modeling framework that can accommodate different scales of measurements, including continuous, discrete, and mixed observations. A computationally efficient sequential method is proposed for model estimation and inference, and its performance is investigated both theoretically and via simulation studies. In the application, we examine multivariate bundled risks in multi-peril property insurance using proprietary data obtained from a commercial property insurance provider. The proposed predictive model is found to improve decision making for several key insurance operations, including risk segmentation and risk management. In the underwriting operation, we show that the experience rate priced by the proposed model leads to a 9% lift in the insurer’s profit. In the reinsurance operation, we show that the insurer underestimates the risk of the retained insurance portfolio by 10% when ignoring the dependence among bundled insurance risks.

  • Variational Bayes for high-dimensional linear regression with sparse priors

    Date: 2021-11-12

    Time: 15:30-16:30 (Montreal time)

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    A core problem in Bayesian statistics is approximating difficult-to-compute posterior distributions. In variational Bayes (VB), a method from machine learning, one approximates the posterior through optimization, which is typically faster than Markov chain Monte Carlo. We study a mean-field (i.e. factorizable) VB approximation to Bayesian model selection priors, including the popular spike-and-slab prior, in sparse high-dimensional linear regression. We establish convergence rates for this VB approach, studying conditions under which it provides good estimation. We also discuss some computational issues and study the empirical performance of the algorithm.

  • Opinionated practices for teaching reproducibility: motivation, guided instruction and practice

    Date: 2021-10-29

    Time: 15:30-16:30 (Montreal time)

    Zoom Link

    Meeting ID: 939 8331 3215

    Passcode: 096952

    Abstract:

    In the data science courses at the University of British Columbia, we define data science as the study, development and practice of reproducible and auditable processes to obtain insight from data. While reproducibility is core to our definition, most data science learners enter the field with other aspects of data science in mind, for example predictive modelling, which is often one of the most interesting topics to novices. This fact, along with the highly technical nature of the industry-standard reproducibility tools currently employed in data science, presents out-of-the-gate challenges in teaching reproducibility in the data science classroom. Put simply, students are not as intrinsically motivated to learn this topic, and it is not an easy one for them to learn. What can a data science educator do? Over several iterations of teaching courses focused on reproducible data science tools and workflows, we have found that providing extra motivation, guided instruction and lots of practice are key to effectively teaching this challenging, yet important subject. Here we present examples of how we deeply motivate, effectively guide and provide ample practice opportunities to data science students to effectively engage them in learning about this topic.

  • Model-assisted analyses of cluster-randomized experiments

    Date: 2021-10-22

    Time: 15:30-16:30 (Montreal time)

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    Cluster-randomized experiments are widely used due to their logistical convenience and policy relevance. To analyze them properly, we must address the fact that the treatment is assigned at the cluster level instead of the individual level. Standard analytic strategies are regressions based on individual data, cluster averages, and cluster totals, which differ when the cluster sizes vary. These methods are often motivated by models with strong and unverifiable assumptions, and the choice among them can be subjective. Without any outcome modeling assumption, we evaluate these regression estimators and the associated robust standard errors from a design-based perspective where only the treatment assignment itself is random and controlled by the experimenter. We demonstrate that regression based on cluster averages targets a weighted average treatment effect, regression based on individual data is suboptimal in terms of efficiency, and regression based on cluster totals is consistent and more efficient with a large number of clusters. We highlight the critical role of covariates in improving estimation efficiency, and illustrate the efficiency gain via both simulation studies and data analysis. Moreover, we show that the robust standard errors are convenient approximations to the true asymptotic standard errors under the design-based perspective. Our theory holds even when the outcome models are misspecified, so it is model-assisted rather than model-based. We also extend the theory to a wider class of weighted average treatment effects.
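
    The statement that the three regression strategies differ when cluster sizes vary can be checked on toy data. The sketch below is an illustration and not the authors' code. It uses no covariates, in which case least squares on a treatment indicator reduces to a difference in means. The scaling of cluster totals by (number of clusters) / (number of individuals) is an assumed convention chosen so that the three estimates are on the same scale; the function names and data are hypothetical.

```python
from statistics import mean


def diff_in_means(values, treated):
    """Treated-minus-control difference in means.

    Equals the OLS coefficient on a binary treatment indicator.
    """
    treated_values = [v for v, z in zip(values, treated) if z]
    control_values = [v for v, z in zip(values, treated) if not z]
    return mean(treated_values) - mean(control_values)


def cluster_estimators(clusters, treated):
    """clusters: list of lists of individual outcomes; treated: 0/1 per cluster."""
    n_clusters = len(clusters)
    n_individuals = sum(len(c) for c in clusters)
    # Individual-level data: every unit inherits its cluster's treatment.
    individual = diff_in_means(
        [y for c in clusters for y in c],
        [z for c, z in zip(clusters, treated) for _ in c],
    )
    # Cluster averages: one observation per cluster.
    averages = diff_in_means([mean(c) for c in clusters], treated)
    # Cluster totals, rescaled (assumed convention, see the note above).
    scale = n_clusters / n_individuals
    totals = diff_in_means([sum(c) * scale for c in clusters], treated)
    return {"individual": individual, "averages": averages, "totals": totals}


equal_sizes = cluster_estimators([[1, 2], [3, 5], [0, 2], [4, 4]], [1, 0, 1, 0])
unequal_sizes = cluster_estimators([[1], [2, 2, 2], [3, 3], [0]], [1, 1, 0, 0])
print(equal_sizes)
print(unequal_sizes)
```

    With equal cluster sizes the three numbers coincide; with unequal sizes they separate, which is the situation the abstract addresses.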