McGill Statistics Seminar - McGill Statistics Seminars
  • Reduced-Rank Envelope Vector Autoregressive Models

    Date: 2023-11-03

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/2571023554

    Meeting ID: 257 102 3554

    Passcode: None

    Abstract:

    Classical vector autoregressive (VAR) models have long been a popular choice for modeling multivariate time series data due to their flexibility and ease of use. However, the VAR model suffers from overparameterization, which is a serious issue for high-dimensional time series data because it restricts the number of variables and lags that can be incorporated into the model. Several statistical methods have been proposed to achieve dimension reduction in the parameter space of VAR models. Yet these methods are inefficient at extracting relevant information from complex datasets: they fail to distinguish information relevant to the scientific objectives from irrelevant information, and they are also inefficient at addressing rank deficiency problems. Envelope methods, founded on novel parameterizations that use reduced subspaces to connect the mean function and the covariance matrix, offer a solution by efficiently identifying and eliminating irrelevant information. In this presentation, we introduce a new, parsimonious VAR model that incorporates envelope models into the reduced-rank VAR framework and can achieve substantial dimension reduction and efficient parameter estimation. We will present the results of simulation studies and real data analysis comparing the performance of our proposed model with that of existing models in the literature.
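
    As a rough illustration of the reduced-rank idea (not of the envelope model itself), the sketch below fits a VAR(1) by least squares and then projects the coefficient matrix onto a rank-$r$ subspace using classical reduced-rank regression with identity weighting. The function name `reduced_rank_var1`, the single lag, and the simulated rank-2 data are assumptions made for this example; the envelope estimator discussed in the talk imposes additional structure beyond this.

```python
import numpy as np

def reduced_rank_var1(Y: np.ndarray, rank: int) -> np.ndarray:
    """Rank-constrained least-squares estimate of A in y_t = A y_{t-1} + e_t.

    Y has shape (T, p), one observation per row. Uses reduced-rank regression
    with identity weighting; this is NOT the envelope estimator from the talk.
    """
    X, Z = Y[:-1], Y[1:]                            # lagged predictors, responses
    B_ols, *_ = np.linalg.lstsq(X, Z, rcond=None)   # Z ~ X @ B with B = A^T
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T                               # top-r right singular vectors of fitted values
    return (B_ols @ V_r @ V_r.T).T                  # project, then transpose back to A

# Hypothetical example: p = 6 series driven by a rank-2 coefficient matrix.
rng = np.random.default_rng(0)
p, r, T = 6, 2, 500
Q1, _ = np.linalg.qr(rng.normal(size=(p, r)))
Q2, _ = np.linalg.qr(rng.normal(size=(p, r)))
A = 0.5 * Q1 @ Q2.T                                 # spectral norm 0.5, so the VAR is stable
Y = np.zeros((T, p))
for t in range(1, T):
    Y[t] = A @ Y[t - 1] + rng.normal(size=p)
A_hat = reduced_rank_var1(Y, r)
print(np.linalg.matrix_rank(A_hat), np.linalg.norm(A_hat - A))
print("free parameters, full vs rank-2:", p * p, r * (2 * p - r))
```

    With $p = 6$ series, a full VAR(1) has $p^2 = 36$ coefficients, whereas a rank-2 coefficient matrix has $r(2p - r) = 20$ free parameters; the gap widens quickly as $p$ and the number of lags grow.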

  • Doubly Robust Estimation under Covariate-induced Dependent Left Truncation

    Date: 2023-10-27

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104

    https://mcgill.zoom.us/j/84195498572

    Meeting ID: 841 9549 8572

    Passcode: None

    Abstract:

    In prevalent cohort studies with follow-up, the time-to-event outcome is subject to left truncation, which leads to selection bias. For estimating the distribution of the time to event, conventional methods that adjust for left truncation tend to rely on the (quasi-)independence assumption that the truncation time and the event time are “independent” on the observed region. This assumption is violated when there is dependence between the truncation time and the event time, possibly induced by measured covariates. Inverse probability of truncation weighting that leverages covariate information can be used in this case, but it is sensitive to misspecification of the truncation model. In this work, we apply semiparametric theory to find the efficient influence curve of an expected (arbitrarily transformed) survival time in the presence of covariate-induced dependent left truncation. We then use it to construct estimators that are shown to enjoy double-robustness properties. Our work represents the first attempt to construct doubly robust estimators in the presence of left truncation, which does not fall under the established framework of coarsened data in which doubly robust approaches have been developed. We provide technical conditions for the asymptotic properties that appear not to have been carefully examined in the literature for time-to-event data, and study the estimators via extensive simulation. We apply the estimators to two real data sets with different right-censoring patterns.
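
    To make the weighting idea concrete, here is a minimal sketch of inverse probability of truncation weighting (IPTW) for $E[T]$, assuming the truncation model is known exactly. The data-generating process (binary covariate $Z$, shifted exponential event times, uniform truncation times whose range depends on $Z$) is hypothetical, right-censoring is ignored, and the doubly robust estimator from the talk is not reproduced. The point is only that reweighting each observed subject by $1/P(Q \le T \mid Z)$ removes the selection bias that affects the naive mean of the observed times.

```python
import numpy as np

def iptw_mean(t_obs, g_obs, nu=lambda t: t):
    """Hajek-type IPTW estimate of E[nu(T)] from left-truncated data, where
    g_obs[i] = P(Q <= t_obs[i] | Z_i) is assumed known (in practice it is estimated)."""
    w = 1.0 / g_obs                          # inverse probability of being observed
    return np.sum(w * nu(t_obs)) / np.sum(w)

rng = np.random.default_rng(3)
N = 100_000                                  # hypothetical population before truncation
z = rng.integers(0, 2, N)                    # binary covariate
t = 0.5 + rng.exponential(1.0 / (1 + z))     # event time; E[T] = 0.5 + (1 + 0.5) / 2 = 1.25
q = rng.uniform(0.0, 1.0 + 2.0 * z)          # truncation time, range depends on Z
keep = q < t                                 # only subjects with Q < T enter the study
g = np.minimum(t[keep] / (1.0 + 2.0 * z[keep]), 1.0)  # P(Q <= t | Z) for Uniform(0, 1 + 2Z)
naive = t[keep].mean()
est = iptw_mean(t[keep], g)
print(f"naive mean {naive:.3f}, IPTW {est:.3f}, truth 1.25")
```

    If the truncation model were misspecified, the weights `g` would be wrong and the IPTW estimate would generally be biased, which is the sensitivity that motivates a doubly robust construction.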

  • Neural network architectures for functional data analysis

    Date: 2023-10-20

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/89761165882

    Meeting ID: 897 6116 5882

    Passcode: None

    Abstract:

    Functional data are random functions defined over a continuous domain, such as time or space. In applications, these data are usually observed discretely at regularly or irregularly spaced points over the domain. In this talk, we discuss ways to adapt modern neural network architectures for the analysis of functional data. To do so, we design new neural network layers that process functional data as input, output, or both. First, we propose the functional output layer, which can be used to solve a multitude of function-on-scalar regression problems nonlinearly. The proposed layer provides a smooth representation of the output, and we demonstrate how to regularize such a layer during network training. Second, we propose a concept of functional weights that project functional data to a scalar representation, leading to a novel formulation of a functional input layer. We demonstrate how to combine both proposed functional layers to create a functional autoencoder. This model takes as input the data in the form in which they are usually collected, as discrete points over the domain, and can be used for feature extraction and functional data smoothing. We demonstrate the benefits of the proposed architectures with various experiments on simulated data and real data applications. We conclude with a brief discussion of ongoing work on the design of a functional convolution layer that bridges the gap between the discrete convolution operation and its continuous counterpart.
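
    A functional input layer of the kind described above can be sketched under assumptions of our own: each neuron's functional weight $\beta_h(s)$ is expanded in a small Fourier basis on $[0, 1]$, and the integral $\int \beta_h(s)\,x(s)\,ds$ is approximated by the trapezoidal rule over the observed points. The basis, the ReLU activation, and the names `fourier_basis`, `trapezoid_weights`, and `functional_input_layer` are hypothetical, and only the forward pass is shown (no training).

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Fourier basis on [0, 1]: 1, sqrt(2) sin(2 pi k t), sqrt(2) cos(2 pi k t), ..."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.stack(cols)                             # shape (n_basis, J)

def trapezoid_weights(t):
    """Weights w such that f(t) @ w approximates the integral of f over [t[0], t[-1]]."""
    dt = np.diff(t)
    w = np.zeros_like(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

def functional_input_layer(x, t, coefs, bias):
    """x: (N, J) curves observed at points t (J,); coefs: (H, K) basis coefficients.
    Returns ReLU(bias_h + integral of beta_h(s) x_n(s) ds), shape (N, H)."""
    beta = coefs @ fourier_basis(t, coefs.shape[1])   # (H, J): weight functions on the grid
    integrals = x @ (beta * trapezoid_weights(t)).T   # (N, H): trapezoidal approximation
    return np.maximum(integrals + bias, 0.0)

t = np.linspace(0.0, 1.0, 101)
x = np.vstack([np.ones_like(t), t])                   # curves x_1(s) = 1 and x_2(s) = s
coefs = np.array([[1.0, 0.0, 0.0],                    # beta_1(s) = 1
                  [0.0, 1.0, 0.0]])                   # beta_2(s) = sqrt(2) sin(2 pi s)
out = functional_input_layer(x, t, coefs, bias=np.zeros(2))
print(out)                                            # approximately [[1.0, 0.0], [0.5, 0.0]]
```

    Because the quadrature weights are computed from the observation points themselves, the same layer accepts irregularly spaced grids, and the learnable parameters are the $H \times K$ basis coefficients rather than one weight per observation point.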

  • Distances on and between complex networks

    Date: 2023-10-13

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104

    https://mcgill.zoom.us/j/83477865796

    Meeting ID: 834 7786 5796

    Passcode: None

    Abstract:

    Distance plays a pivotal role in statistics. Meanwhile, recent technologies and social networks have yielded large, complex network data sets, which require customized statistical tools. From a mathematical viewpoint, these complex networks are graphs with non-trivial structure (in contrast to, for example, Erdős-Rényi graphs). Such networks model systemic phenomena and cases where individual-level analyses are insufficient. They are used not only in the study of social networks but also in neurology, biology, telecommunications, and finance, among many other areas of application. Unfortunately, distances on graphs are not clearly defined.
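
    Shortest-path (geodesic) distance is one common candidate for a distance on a graph. The sketch below computes hop-count distances from a source node by breadth-first search on a small hypothetical graph; it illustrates only one of many possible notions, and the talk may consider others.

```python
from collections import deque

def geodesic_distances(adj, source):
    """Hop-count shortest-path distance from `source` to every reachable node of an
    unweighted graph given as an adjacency dict; unreachable nodes are omitted."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:              # first visit in BFS gives the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical graph: a 4-cycle 0-1-2-3-0, a pendant node 4 attached to 2, isolated node 5.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3, 4], 3: [2, 0], 4: [2], 5: []}
print(geodesic_distances(adj, 0))          # {0: 0, 1: 1, 3: 1, 2: 2, 4: 3}
```

    Even this simple notion forces choices: node 5 is unreachable and has no finite distance, and adding edge weights would change the answer.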

  • Doubly robust inference under possibly misspecified marginal structural Cox model

    Date: 2023-09-29

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104

    https://mcgill.zoom.us/j/82440807026

    Meeting ID: 824 4080 7026

    Passcode: None

    Abstract:

    Doubly robust estimation under the marginal structural Cox model was a challenge until recently because of the non-collapsibility of the Cox regression model. The causal hazard ratio estimand assumes that the marginal structural Cox model holds, while the doubly robust estimating function requires specifying an additional model for the conditional distribution of the time to event given treatment and covariates, and the two models are unlikely to hold simultaneously. This issue has recently become resolvable through the concept of rate double robustness combined with machine learning or nonparametric approaches, although technical details remain to be spelt out to ensure root-n inference for the estimand. We describe our work covering both the observational study setting and the presence of covariate-induced informative censoring. An added benefit of our approach is that the estimand retains an interpretation, as a time-averaged treatment effect, when the assumed marginal structural Cox model does not hold. This allows meaningful estimation of treatment effects for a general two-group comparison without the Cox model, or under alternative models such as the semiparametric proportional odds or transformation models for the potential time-to-event outcomes.
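
    The non-collapsibility mentioned above can be seen in a small deterministic example with hypothetical numbers (not taken from the talk): suppose treatment is assigned independently of a binary covariate $Z$ and doubles an exponential hazard within each level of $Z$, so the conditional hazard ratio is exactly 2. Averaging over $Z$ gives a marginal hazard ratio that changes over time, so in this example a marginal proportional hazards model cannot hold alongside the conditional one.

```python
import math

def mixture_hazard(t, rates, probs):
    """Hazard at time t of a mixture of exponential survival times."""
    surv = sum(p * math.exp(-lam * t) for lam, p in zip(rates, probs))
    dens = sum(p * lam * math.exp(-lam * t) for lam, p in zip(rates, probs))
    return dens / surv

probs = [0.5, 0.5]                         # P(Z = 0), P(Z = 1), hypothetical
control = [1.0, 4.0]                       # hazard under control in each stratum of Z
treated = [2.0 * lam for lam in control]   # conditional hazard ratio is exactly 2 in every stratum

def marginal_hr(t):
    """Ratio of the treated to the control hazard after averaging over Z."""
    return mixture_hazard(t, treated, probs) / mixture_hazard(t, control, probs)

for t in (0.0, 0.5, 1.0, 2.0):
    print(f"t = {t}: marginal hazard ratio = {marginal_hr(t):.3f}")
```

    The marginal hazard ratio equals 2 at $t = 0$, dips below 2 (to about 1.48 at $t = 0.5$), and drifts back toward 2 as the high-risk stratum is depleted.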

  • Detection of Multiple Influential Observations on Variable Selection for High-dimensional Data: New Perspective with an Application to Neurologic Signature of Physical Pain

    Date: 2023-09-22

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/89374813252

    Meeting ID: 893 7481 3252

    Passcode: None

    Abstract:

    Influence diagnosis is an integral part of data analysis, and most existing methodological frameworks for it presume a deterministic submodel and are designed for low-dimensional data (i.e., the number of predictors $p$ is smaller than the sample size $n$). However, the stochastic selection of a submodel from high-dimensional data, where $p$ exceeds $n$, has become ubiquitous. Methods for identifying observations that could exert undue influence on the choice of a submodel can therefore play an important role in this setting. To date, discussion of this topic has been limited and falls short in two respects: (1) a constrained ability to detect multiple influential points, and (2) applicability only in restrictive settings. In this talk, building on a recently proposed measure, we introduce a generalized version that accommodates different model selectors and examine its asymptotic properties for large $p$. $K$-means clustering is incorporated into our scheme to detect multiple influential points. Simulations are then conducted to assess the performance of various diagnostic approaches. The proposed procedure further demonstrates its value by improving predictive power when analyzing thermally stimulated pain based on fMRI data. In addition, the latest developments around this newly proposed measure are presented. This work was conducted under the joint supervision of Professors Masoud Asgharian and Martin Lindquist.
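
    The talk's influence measure is not reproduced here. Assuming one influence score per observation has already been computed, the step in which $K$-means separates a group of influential points can be sketched with Lloyd's algorithm for $K = 2$ on the scores. The function name `flag_influential`, the min/max initialization, and the planted scores are hypothetical.

```python
import numpy as np

def flag_influential(scores, n_iter=100):
    """Split 1-D influence scores into two clusters with Lloyd's k-means (k = 2);
    return a boolean mask for the high-score cluster."""
    scores = np.asarray(scores, dtype=float)
    if scores.max() == scores.min():
        return np.zeros(scores.shape, dtype=bool)       # nothing stands out
    centres = np.array([scores.min(), scores.max()])     # low and high starting centres
    for _ in range(n_iter):
        labels = np.abs(scores[:, None] - centres[None, :]).argmin(axis=1)
        new = np.array([scores[labels == k].mean() for k in (0, 1)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels == 1

# Hypothetical scores: 95 ordinary observations plus 5 planted influential ones.
rng = np.random.default_rng(4)
scores = np.concatenate([rng.uniform(0.0, 2.0, 95), np.full(5, 20.0)])
print(np.flatnonzero(flag_influential(scores)))         # [95 96 97 98 99]
```

    Clustering the scores flags a whole group of points at once instead of testing observations one at a time, which relates to limitation (1) noted above.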

  • Three Myths About Causal Mediation

    Date: 2023-09-15

    Time: 15:30-16:30 (Montreal time)

    Location: Burnside 1104

    https://mcgill.zoom.us/j/86404798712

    Meeting ID: 864 0479 8712

    Passcode: None

    Abstract:

    Causal mediation techniques are a means of identifying the degree to which a cause influences its effect along particular causal paths. For example, in a model where a cause influences its effect both indirectly via a mediator and directly via factors not included in the model, mediation techniques enable one to measure both direct and indirect effects. Although mediation techniques are widely employed, they are often misunderstood. This is partly due to the long-term influence of Baron and Kenny’s (1986) treatment of mediation, which applies only to linear models without interaction and which leads one to develop intuitions about direct and indirect effects that do not generalize to non-parametric causal models. In my talk, I identify and reject three persistent myths about mediation. I argue that such methods (1) should not be understood as decomposing the total effect into additive components corresponding to the contributions of the paths; (2) are not a means of eliminating latent heterogeneity; and (3) do not require one to appeal to causal concepts other than the counterfactual causal ones built into structural causal models. These points are crucial for understanding mediation effects in any context in which they are studied, and they have particular applications to studies of fairness and discrimination, in which such effects play an increasingly central role (Plečko and Bareinboim, 2022).
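
    The first point can be illustrated with a toy structural model that includes an exposure-mediator interaction (our own example, not the speaker's): $M = dX$ and $Y = aX + bM + cXM$ with binary $X$. Natural direct and indirect effects can be defined with the mediator held at either its control or its treated level; both pairs sum to the total effect, but the interaction contribution $c\,d$ moves from one component to the other, so neither pair is a unique contribution of each path.

```python
def y(x, m, a=1.0, b=2.0, c=3.0):
    """Hypothetical outcome model with exposure-mediator interaction: Y = aX + bM + cXM."""
    return a * x + b * m + c * x * m

def m_of(x, d=0.5):
    """Hypothetical mediator model: M = dX."""
    return d * x

total = y(1, m_of(1)) - y(0, m_of(0))
# Decomposition 1: direct effect with M at its control level, indirect effect under treatment.
nde0, nie1 = y(1, m_of(0)) - y(0, m_of(0)), y(1, m_of(1)) - y(1, m_of(0))
# Decomposition 2: direct effect with M at its treated level, indirect effect under control.
nde1, nie0 = y(1, m_of(1)) - y(0, m_of(1)), y(0, m_of(1)) - y(0, m_of(0))
print(total, (nde0, nie1), (nde1, nie0))   # 3.5 (1.0, 2.5) (2.5, 1.0)
```

    With $a = 1$, $b = 2$, $c = 3$, and $d = 0.5$, the total effect is 3.5 under both decompositions, while the direct effect is 1.0 or 2.5 depending on the reference level chosen for the mediator.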

  • Empirical Bayes Control of the False Discovery Exceedance

    Date: 2023-08-17

    Time: 15:30-16:30 (Montreal time)

    Hybrid: In person / Zoom

    Location: Burnside Hall 1104

    https://mcgill.zoom.us/j/89623344755?pwd=S1E0QWVjSm8wRHdIYU5IZzllSXNjUT09

    Meeting ID: 896 2334 4755

    Passcode: 287381

    Abstract:

    In sparse large-scale testing problems where the false discovery proportion (FDP) is highly variable, the false discovery exceedance (FDX) provides a valuable alternative to the widely used false discovery rate (FDR). We develop an empirical Bayes approach to controlling the FDX. We show that for independent hypotheses from a two-group model and dependent hypotheses from a Gaussian model fulfilling the exchangeability condition, an oracle decision rule based on ranking and thresholding the local false discovery rate (lfdr) is optimal in the sense that the power is maximized subject to the FDX constraint. We propose a data-driven FDX procedure that emulates the oracle via carefully designed computational shortcuts. We investigate the empirical performance of the proposed method using simulations and illustrate the merits of FDX control through an application to identifying abnormal stock trading strategies.
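
    Below is a minimal sketch of the oracle rule described above for independent hypotheses, assuming a fully known two-group model ($N(0, 1)$ nulls and $N(\mu_1, 1)$ non-nulls with known $\pi_0$): hypotheses are ranked by lfdr, and the rejection set is the largest top-$k$ set whose posterior probability $P(\mathrm{FDP} > \gamma)$ is at most $\alpha$. Given the data, the number of false discoveries among the top $k$ is a sum of independent Bernoulli($\mathrm{lfdr}_i$) variables, so its distribution is computed exactly by convolution. The parameter values and function names are hypothetical, and this is neither the data-driven procedure nor the computational shortcuts from the talk.

```python
import numpy as np

def lfdr_two_group(z, pi0, mu1):
    """Local fdr under a known two-group model: N(0, 1) nulls, N(mu1, 1) non-nulls."""
    f0 = np.exp(-0.5 * z**2)                      # shared 1/sqrt(2 pi) factors cancel
    f1 = np.exp(-0.5 * (z - mu1) ** 2)
    return pi0 * f0 / (pi0 * f0 + (1 - pi0) * f1)

def fdx_oracle(lfdr, gamma=0.1, alpha=0.05):
    """Reject the k hypotheses with the smallest lfdr, where k is the largest size
    whose posterior P(FDP > gamma) is at most alpha. Returns (indices, posterior FDX)."""
    order = np.argsort(lfdr)
    pmf = np.array([1.0])                         # posterior pmf of the false-discovery count V
    best_k, best_fdx = 0, 0.0
    for k, i in enumerate(order, start=1):
        p = lfdr[i]                               # posterior probability that H_i is null
        pmf = np.append(pmf * (1 - p), 0.0) + np.insert(pmf * p, 0, 0.0)
        fdx = pmf[np.arange(k + 1) > gamma * k].sum()   # P(V / k > gamma | data)
        if fdx <= alpha:
            best_k, best_fdx = k, fdx
    return order[:best_k], best_fdx

rng = np.random.default_rng(1)
m, pi0, mu1 = 1000, 0.9, 3.0
is_null = rng.random(m) < pi0
z = rng.normal(size=m) + np.where(is_null, 0.0, mu1)
lf = lfdr_two_group(z, pi0, mu1)
rejected, fdx = fdx_oracle(lf)
print(len(rejected), round(float(fdx), 4))
```

    The exact convolution costs $O(m^2)$ operations for $m$ hypotheses, which suggests why a procedure for large-scale problems benefits from computational shortcuts.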

  • Residual-based estimation of parametric copulas under regression

    Date: 2023-08-14

    Time: 15:30-16:30 (Montreal time)

    Hybrid: In person / Zoom

    Location: Burnside Hall 1104

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    We study a multivariate response regression model where each coordinate is described by a location-scale regression, and where the dependence structure of the “noise” terms in the regression is described by a parametric copula. Our goal is to estimate the associated Euclidean copula parameter given a sample of the response and the covariate. Because the copula sample is not observed, the oracle ranks used in the usual pseudo-likelihood estimation procedure cannot be computed. Instead, we base our estimation on the residual ranks calculated from preliminary estimators of the regression functions. We show that the residual-based estimators are asymptotically equivalent to their oracle counterparts, even when the dimension of the covariate in the regression diverges moderately. In part to serve this objective, we also study the weighted convergence of the residual empirical processes.
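
    The residual-rank idea can be sketched for a bivariate Gaussian copula under simplifying assumptions of our own: each response follows a linear location model with constant scale (so ranks of OLS residuals stand in for the unobservable noise ranks), and the copula correlation is estimated by maximizing the Gaussian-copula pseudo-log-likelihood over a grid. The model, the grid search, and the function names are hypothetical; the talk's setting allows general location-scale regressions and a moderately diverging covariate dimension.

```python
import numpy as np
from statistics import NormalDist

def pseudo_obs(v):
    """Ranks rescaled to (0, 1): rank / (n + 1)."""
    return (np.argsort(np.argsort(v)) + 1) / (len(v) + 1)

def residual_pseudo_obs(y, X):
    """Pseudo-observations from the ranks of OLS residuals (preliminary location fit)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pseudo_obs(y - X @ beta)

def gaussian_copula_pl(u, v, grid_size=397):
    """Maximize the bivariate Gaussian-copula pseudo-log-likelihood over a grid of rho."""
    inv = NormalDist().inv_cdf
    a = np.array([inv(p) for p in u])            # normal scores of the pseudo-observations
    b = np.array([inv(p) for p in v])
    def loglik(r):
        return np.sum(-0.5 * np.log(1 - r**2)
                      - (r**2 * (a**2 + b**2) - 2 * r * a * b) / (2 * (1 - r**2)))
    return float(max(np.linspace(-0.99, 0.99, grid_size), key=loglik))

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x])             # intercept + one covariate
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)  # true rho = 0.6
y1 = 1.0 + 2.0 * x + eps[:, 0]
y2 = -1.0 + 0.5 * x + eps[:, 1]
rho_hat = gaussian_copula_pl(residual_pseudo_obs(y1, X), residual_pseudo_obs(y2, X))
rho_oracle = gaussian_copula_pl(pseudo_obs(eps[:, 0]), pseudo_obs(eps[:, 1]))
print(rho_hat, rho_oracle)
```

    Comparing `rho_hat` with `rho_oracle`, which uses the true but normally unobservable noise ranks, gives a quick empirical check of the asymptotic equivalence stated above.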

  • Confidence sets for Causal Discovery

    Date: 2023-03-24

    Time: 15:30-16:30 (Montreal time)

    On Zoom only

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    Causal discovery procedures are popular methods for discovering causal structure across the physical, biological, and social sciences. However, most procedures for causal discovery output only a single estimated causal model or a single equivalence class of models. We propose a procedure for quantifying uncertainty in causal discovery. Specifically, we consider linear structural equation models with non-Gaussian errors and propose a procedure that returns a confidence set of causal orderings that are not ruled out by the data. We show that, asymptotically, the true causal ordering will be contained in the returned set with a user-specified probability.
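
    A crude two-variable illustration of "orderings not ruled out by the data" is sketched below; it is not the speakers' procedure. For each candidate ordering, the later variable is regressed on the earlier one, and a permutation test checks whether the residual looks independent of the regressor; with non-Gaussian (here uniform) errors, the residual is dependent on the regressor only in the wrong direction. The dependence statistic $|\mathrm{corr}(a^2, b^2)|$, the simulated data, and the function names are assumptions chosen for brevity, and this statistic detects only some forms of dependence.

```python
import numpy as np

def dependence_stat(a, b):
    """|corr(a^2, b^2)|: a crude dependence measure that is near 0 under independence."""
    return abs(np.corrcoef(a**2, b**2)[0, 1])

def ordering_pvalue(cause, effect, rng, n_perm=999):
    """Permutation p-value for 'cause -> effect': regress effect on cause and test
    whether the residual looks independent of the cause."""
    cause = cause - cause.mean()
    effect = effect - effect.mean()
    slope = cause @ effect / (cause @ cause)
    resid = effect - slope * cause
    observed = dependence_stat(cause, resid)
    perms = [dependence_stat(rng.permutation(cause), resid) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in perms)) / (n_perm + 1)

rng = np.random.default_rng(5)
n = 1000
x = rng.uniform(-1.0, 1.0, n)                  # non-Gaussian (uniform) errors
y = x + rng.uniform(-1.0, 1.0, n)              # true ordering: x before y
alpha = 0.05
pvals = {"x -> y": ordering_pvalue(x, y, rng), "y -> x": ordering_pvalue(y, x, rng)}
confidence_set = [order for order, p in pvals.items() if p > alpha]
print(pvals, confidence_set)
```

    Only orderings whose p-value exceeds $\alpha$ are kept; with an asymptotically valid test, the true ordering is retained with probability approaching $1 - \alpha$, which is the coverage property in its simplest form.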