2023 Fall - McGill Statistics Seminars
  • Robust and Tuning-Free Sparse Linear Regression via Square-Root Slope

    Date: 2023-11-17

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104

    https://mcgill.zoom.us/j/81865630475

    Meeting ID: 818 6563 0475

    Passcode: None

    Abstract:

    We consider the high-dimensional linear regression model and assume that a fraction of the responses are contaminated by an adversary with complete knowledge of the data and the underlying distribution. We are interested in the situation where the dense additive noise can be heavy-tailed but the predictors have a sub-Gaussian distribution. We establish minimax lower bounds that depend on the fraction of contaminated data and the tails of the additive noise. Moreover, we design a modification of the square-root Slope estimator with several desirable features: (a) it is provably robust to adversarial contamination, with performance guarantees that take the form of sub-Gaussian deviation inequalities and match the lower error bounds up to logarithmic factors; (b) it is fully adaptive with respect to the unknown sparsity level and the variance of the noise; and (c) it is computationally tractable as the solution of a convex optimization problem. To analyze the performance of the proposed estimator, we prove several properties of matrices with sub-Gaussian rows that could be of independent interest. This is joint work with Stanislav Minsker and Lang Wang.
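
    For orientation, here is a minimal sketch of the standard (unmodified) square-root Slope objective, which the talk's robust estimator modifies. The modification itself is not described in the abstract and is not reproduced here. The data-fit term is the unsquared residual norm. That choice is what allows the tuning sequence to be set without knowing the noise variance. The helper `slope_weights` and its constant `A` are a hypothetical illustrative choice, not the speakers' tuning.

```python
import numpy as np


def slope_weights(p, n, A=1.0):
    """Hypothetical non-increasing tuning sequence lam_j = A * sqrt(log(2p / j) / n)."""
    j = np.arange(1, p + 1)
    return A * np.sqrt(np.log(2 * p / j) / n)


def sorted_l1_norm(beta, lam):
    """Slope penalty sum_j lam[j] * |beta|_(j): the largest |beta_j| is paired with lam[0]."""
    abs_desc = np.sort(np.abs(beta))[::-1]
    return float(np.dot(lam, abs_desc))


def sqrt_slope_objective(beta, X, y, lam):
    """Square-root Slope objective: ||y - X beta||_2 / sqrt(n) + sorted-L1 penalty.

    Both terms are convex in beta, so minimizing this is a convex problem.
    Only the objective is evaluated here; no solver is included."""
    n = X.shape[0]
    residual_term = float(np.linalg.norm(y - X @ beta) / np.sqrt(n))
    return residual_term + sorted_l1_norm(beta, lam)


# Usage: evaluate the objective at the true coefficients on simulated data.
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -1.0]
y = X @ beta_true + rng.standard_normal(n)
print(sqrt_slope_objective(beta_true, X, y, slope_weights(p, n)))
```

    In practice the estimator is the minimizer of this objective over `beta`, computed with a convex solver. The sketch only makes the two ingredients concrete: the pivotal data-fit term and the sorted-L1 penalty.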

  • Copula-based estimation of health inequality measures

    Date: 2023-11-10

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/89337793218

    Meeting ID: 893 3779 3218

    Passcode: None

    Abstract:

    This paper aims to use copulas to derive estimators of the health concentration curve and the Gini coefficient for health distributions. We highlight the importance of expressing health inequality measures in terms of a copula, which we in turn use to build copula-based semiparametric and nonparametric estimators of these measures. We then study the asymptotic properties of these estimators; in particular, we establish their consistency and asymptotic normality. We provide expressions for their variances, which can be used to construct confidence intervals and build tests for the health concentration curve and the Gini health coefficient. A Monte Carlo simulation exercise shows that the semiparametric estimator outperforms the smoothed nonparametric estimator, and that the latter does better than the empirical estimator in terms of mean squared error. We also run an extensive empirical study in which we apply our estimators to show that inequalities across U.S. states' socioeconomic variables, such as income/poverty and race/ethnicity, explain the observed inequalities in COVID-19 infections and deaths in the U.S.
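
    As a point of reference, here is a standard empirical Gini coefficient, based on the mean absolute difference, for a positive health variable. This is not the copula-based estimator from the talk. It only shows the quantity being estimated, using the identity sum_{i,j} |x_i - x_j| = 2 * sum_i (2i - n - 1) x_(i) over the sorted sample (1-based i).

```python
def gini(values):
    """Empirical Gini coefficient: sum_{i,j} |x_i - x_j| / (2 n^2 mean).

    Uses the sorted-sample identity sum_{i,j} |x_i - x_j| = 2 * sum_i (2i - n - 1) x_(i),
    which avoids the O(n^2) double loop. Assumes a non-empty sample with positive mean."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(x) / n
    if mean <= 0:
        raise ValueError("Gini coefficient requires a positive mean")
    total_abs_diff = 2 * sum((2 * (i + 1) - n - 1) * xi for i, xi in enumerate(x))
    return total_abs_diff / (2 * n * n * mean)


print(gini([1, 2, 3, 4]))  # -> 0.25
```

    A value of 0 means perfect equality, and larger values mean more inequality. The talk's health concentration curve additionally ranks individuals by a socioeconomic variable, which this sketch does not do.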

  • Reduced-Rank Envelope Vector Autoregressive Models

    Date: 2023-11-03

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/2571023554

    Meeting ID: 257 102 3554

    Passcode: None

    Abstract:

    Classical vector autoregressive (VAR) models have long been a popular choice for modeling multivariate time series data because of their flexibility and ease of use. However, the VAR model suffers from overparameterization, which is a serious issue for high-dimensional time series data because it restricts the number of variables and lags that can be incorporated into the model. Several statistical methods have been proposed to achieve dimension reduction in the parameter space of VAR models. Yet these methods are inefficient at extracting relevant information from complex datasets: they fail to separate information relevant to the scientific objectives from irrelevant information, and they do not handle rank deficiency efficiently. Envelope methods, founded on novel parameterizations that use reduced subspaces to link the mean function and the covariance matrix, offer a solution by efficiently identifying and eliminating irrelevant information. In this presentation, we introduce a new, parsimonious VAR model that incorporates envelope models into the reduced-rank VAR framework and can achieve substantial dimension reduction and efficient parameter estimation. We will present simulation studies and a real data analysis comparing the performance of the proposed model with that of existing models in the literature.
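
    To make the overparameterization concrete, the sketch below counts autoregressive coefficients. An unrestricted VAR(p) on k series has p * k^2 of them. If the stacked k x (kp) coefficient matrix [A_1, ..., A_p] is constrained to rank r, the count drops to r(k + kp - r), the dimension of the set of rank-r matrices of that size. Intercepts and the error covariance are not counted, and the further reduction from the envelope structure in the talk is not modeled.

```python
def var_param_count(k, p):
    """Autoregressive coefficients in an unrestricted VAR(p) on k series: p * k^2."""
    return p * k * k


def reduced_rank_var_param_count(k, p, r):
    """Free parameters when the stacked k x (k p) matrix [A_1, ..., A_p] has rank r.

    A rank-r m x n matrix has r * (m + n - r) free parameters; here m = k, n = k p."""
    if not 1 <= r <= min(k, k * p):
        raise ValueError("rank r must lie between 1 and min(k, k * p)")
    return r * (k + k * p - r)


# Usage: growth of the full count versus a rank-2 restriction, with p = 4 lags.
for k in (5, 20, 100):
    print(k, var_param_count(k, 4), reduced_rank_var_param_count(k, 4, 2))
```

    The full count grows quadratically in k. The rank-r count grows only linearly in k for fixed r, which is the motivation for reduced-rank VAR models.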

  • Doubly Robust Estimation under Covariate-induced Dependent Left Truncation

    Date: 2023-10-27

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104

    https://mcgill.zoom.us/j/84195498572

    Meeting ID: 841 9549 8572

    Passcode: None

    Abstract:

    In prevalent cohort studies with follow-up, the time-to-event outcome is subject to left truncation, which leads to selection bias. To estimate the distribution of the time to event, conventional methods that adjust for left truncation tend to rely on the (quasi-)independence assumption that the truncation time and the event time are “independent” on the observed region. This assumption is violated when the truncation time and the event time are dependent, possibly through measured covariates. Inverse probability of truncation weighting that leverages covariate information can be used in this case, but it is sensitive to misspecification of the truncation model. In this work, we apply semiparametric theory to find the efficient influence curve of an expected (arbitrarily transformed) survival time in the presence of covariate-induced dependent left truncation. We then use it to construct estimators that are shown to enjoy double-robustness properties. Our work represents the first attempt to construct doubly robust estimators in the presence of left truncation, which does not fall under the established framework of coarsened data where doubly robust approaches have been developed. We provide technical conditions for the asymptotic properties that appear not to have been carefully examined in the literature for time-to-event data, and we study the estimators via extensive simulation. We apply the estimators to two real-world data sets with different right-censoring patterns.
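
    A small simulation sketch of the selection bias and of plain inverse probability of truncation weighting. This is the baseline the abstract describes as sensitive to model misspecification, not the doubly robust estimator from the talk. The data-generating model is entirely made up for illustration:
    - a binary covariate Z;
    - an event time T = 1 + Exponential(scale 1 + Z);
    - a truncation time Q ~ Uniform(0, 2 + Z).

    Q and T are independent given Z, and a subject is sampled only if Q < T. Weighting each sampled subject by 1 / G(T | Z), where G(t | z) = P(Q < t | Z = z) is treated as known here, removes the bias in the mean. The true mean is E[T] = 2.5.

```python
import numpy as np


def simulate_left_truncated(n, rng):
    """Hypothetical model: Q and T depend on Z, so they are marginally dependent."""
    z = rng.integers(0, 2, size=n)            # binary covariate Z
    t = 1.0 + rng.exponential(scale=1.0 + z)  # event time T, shifted so T >= 1
    q = rng.uniform(0.0, 2.0 + z)             # truncation time Q, independent of T given Z
    keep = q < t                              # left truncation: only Q < T is observed
    return z[keep], t[keep]


def truncation_prob(t, z):
    """G(t | z) = P(Q < t | Z = z) for Q | Z ~ Uniform(0, 2 + z); assumed known here."""
    return np.minimum(t / (2.0 + z), 1.0)


rng = np.random.default_rng(0)
z_obs, t_obs = simulate_left_truncated(200_000, rng)

naive_mean = t_obs.mean()            # ignores truncation: biased upward
weights = 1.0 / truncation_prob(t_obs, z_obs)
ipw_mean = np.sum(weights * t_obs) / np.sum(weights)  # normalized (Hajek) IPW mean
print(naive_mean, ipw_mean)
```

    The weighted mean is only consistent because `truncation_prob` is the correct truncation model. If G were misspecified, the weighting would be biased too. That sensitivity is what the doubly robust construction in the talk addresses.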

  • Neural network architectures for functional data analysis

    Date: 2023-10-20

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/89761165882

    Meeting ID: 897 6116 5882

    Passcode: None

    Abstract:

    Functional data are random variables that take values over an infinite-precision domain, such as time or space. In applications, such data are usually observed discretely at regularly or irregularly spaced points over the domain. In this talk, we discuss ways to adapt modern neural network architectures for the analysis of functional data. To do so, we design new neural network layers that process functional data as input, output, or both. First, we propose the functional output layer, which can be used to solve a multitude of function-on-scalar regression problems in a non-linear way. The proposed layer provides a smooth representation of the output, and we demonstrate how to regularize such a layer during network training. Second, we propose a concept of functional weights that project functional data to a scalar representation, leading to a novel formulation of a functional input layer. We demonstrate how to combine both proposed functional layers to create a functional autoencoder. This model takes as input the data in the form in which they are usually collected, as discrete points over the domain, and can be used for feature extraction and functional data smoothing. We demonstrate the benefits of the proposed architectures with various experiments on simulated data and real data applications. We conclude with a brief discussion of ongoing work on a functional convolution layer that bridges the gap between the discrete convolution operation and its continuous counterpart.
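
    A forward-pass-only NumPy sketch of the two ideas, under assumptions that are ours, not the speakers'. Weight functions and outputs are expanded in a Fourier basis on [0, 1], and the maps are plain linear ones with no training or regularization. The functional input layer computes z_j = ∫ beta_j(t) x(t) dt, with the integral approximated by the trapezoid rule over the observed (possibly irregular) points. The functional output layer turns a hidden vector into basis coefficients and evaluates the resulting curve.

```python
import numpy as np


def fourier_basis(t, n_basis):
    """Fourier basis on [0, 1] evaluated at points t; returns shape (len(t), n_basis)."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.stack(cols, axis=1)


def functional_input_layer(t, x, coef):
    """Map one discretely observed curve x(t) to scalars z_j = ∫ beta_j(t) x(t) dt.

    beta_j(t) = sum_k coef[j, k] * phi_k(t). The integral uses the trapezoid
    rule on the observation points, so t may be irregularly spaced."""
    phi = fourier_basis(t, coef.shape[1])    # (m, K) basis on the grid
    integrand = (phi @ coef.T) * x[:, None]  # (m, J) beta_j(t) * x(t)
    dt = np.diff(t)[:, None]                 # (m - 1, 1) spacings
    return 0.5 * np.sum((integrand[1:] + integrand[:-1]) * dt, axis=0)


def functional_output_layer(h, W, b, t_eval):
    """Map a hidden vector h to a curve: c = W h + b are basis coefficients,
    and the output is f(t) = sum_k c_k phi_k(t) evaluated at t_eval."""
    c = W @ h + b
    return fourier_basis(t_eval, c.shape[0]) @ c


# Usage: irregularly observed curve x(t) = t, constant weight function beta(t) = 1.
t_obs = np.array([0.0, 0.1, 0.35, 0.6, 1.0])
print(functional_input_layer(t_obs, t_obs.copy(), np.array([[1.0, 0.0, 0.0]])))
```

    Chaining `functional_input_layer` into a hidden layer and then `functional_output_layer` gives the shape of a functional autoencoder. In a real network, `coef`, `W` and `b` would be trainable parameters.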

  • Distances on and between complex networks

    Date: 2023-10-13

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104

    https://mcgill.zoom.us/j/83477865796

    Meeting ID: 834 7786 5796

    Passcode: None

    Abstract:

    Distance plays a pivotal role in statistics. Meanwhile, recent technologies and social networks have yielded large, complex network data sets that require customized statistical tools. From a mathematical viewpoint, these complex networks are graphs with non-trivial structure (in contrast to, for example, Erdős-Rényi graphs). Such networks model systemic phenomena and situations in which individual-level analyses are insufficient. They are used not only in the study of social networks but also in neurology, biology, telecommunications and finance, among many other areas of application. However, distances on graphs are not clearly defined.
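
    To illustrate why the choice matters, here are two common textbook notions, neither of which is claimed to be the speaker's proposal. The first is a distance on a graph: the geodesic (shortest-path, hop-count) distance between nodes, computed by breadth-first search. The second is a distance between graphs on the same node set: the number of node pairs that are an edge in exactly one of the two graphs (a Hamming distance on edge sets).

```python
from collections import deque


def geodesic_distances(adj, source):
    """Hop-count distance from source to every reachable node of an unweighted,
    undirected graph given as an adjacency dict, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def edge_hamming_distance(edges_a, edges_b):
    """Number of node pairs that are an edge in exactly one of two undirected graphs."""
    def as_pairs(edges):
        return {frozenset(e) for e in edges}

    return len(as_pairs(edges_a) ^ as_pairs(edges_b))


# Usage: a path graph 0-1-2-3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(geodesic_distances(path, 0))
```

    The two notions answer different questions. Unreachable nodes simply have no entry in the geodesic dictionary. The edge-based distance requires the two graphs to share a node labeling, which many real comparisons lack.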

  • Doubly robust inference under possibly misspecified marginal structural Cox model

    Date: 2023-09-29

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104

    https://mcgill.zoom.us/j/82440807026

    Meeting ID: 824 4080 7026

    Passcode: None

    Abstract:

    Doubly robust estimation under the marginal structural Cox model was a challenge until recently because of the non-collapsibility of the Cox regression model. The causal hazard ratio estimand assumes that the marginal structural Cox model holds, while the doubly robust estimating function requires specifying an additional model for the conditional distribution of the time to event given treatment and covariates, and the two models are unlikely to hold simultaneously. This issue has recently become resolvable through the understanding of rate double robustness and the use of machine learning or nonparametric approaches, although technical details still need to be spelt out to ensure root-n inference for the estimand. We describe our work, which covers both the observational study setting and the presence of covariate-induced informative censoring. An added benefit of our approach is that, when the assumed marginal structural Cox model does not hold, the estimand can still be interpreted as a time-averaged treatment effect. This allows meaningful estimation of treatment effects for general two-group comparisons without the Cox model, or under alternative models such as the semiparametric proportional odds or transformation models for the potential time-to-event outcomes.
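
    A numerical sketch of the non-collapsibility mentioned above, under made-up assumptions:
    - treatment is randomized;
    - a binary covariate has two equally prevalent strata;
    - event times are exponential within strata;
    - the conditional hazard ratio is exactly 2 in each stratum.

    The marginal hazard of each arm is the hazard of a mixture of exponentials. The marginal hazard ratio equals 2 at t = 0 but falls below 2 later. So a proportional-hazards model given the covariate and a marginal structural Cox model generally cannot both hold.

```python
import math


def mixture_hazard(t, rates, probs):
    """Hazard of a mixture of exponentials: sum p*lam*exp(-lam t) / sum p*exp(-lam t)."""
    num = sum(p * lam * math.exp(-lam * t) for lam, p in zip(rates, probs))
    den = sum(p * math.exp(-lam * t) for lam, p in zip(rates, probs))
    return num / den


probs = [0.5, 0.5]          # hypothetical prevalence of the two covariate strata
control_rates = [0.5, 2.0]  # hypothetical stratum-specific hazards under control
hr = 2.0                    # conditional hazard ratio, identical in both strata
treated_rates = [hr * lam for lam in control_rates]


def marginal_hr(t):
    """Ratio of the marginal (population-averaged) hazards, treated vs control."""
    return mixture_hazard(t, treated_rates, probs) / mixture_hazard(t, control_rates, probs)


# Usage: the marginal hazard ratio is not constant over time.
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, marginal_hr(t))
```

    The drift arises because high-risk individuals are depleted faster in the treated arm. This is one concrete reason the marginal and conditional Cox models are incompatible.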

  • Detection of Multiple Influential Observations on Variable Selection for High-dimensional Data: New Perspective with an Application to Neurologic Signature of Physical Pain

    Date: 2023-09-22

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/89374813252

    Meeting ID: 893 7481 3252

    Passcode: None

    Abstract:

    Influence diagnosis is an integral part of data analysis. Most existing methodological frameworks, however, presume a deterministic submodel and are designed for low-dimensional data (i.e., the number of predictors $p$ is smaller than the sample size $n$). Yet the stochastic selection of a submodel from high-dimensional data, where $p$ exceeds $n$, has become ubiquitous. Methods for identifying observations that could exert undue influence on the choice of a submodel can therefore play an important role in this setting. To date, discussion of this topic has been limited and falls short in two respects: (1) a constrained ability to detect multiple influential points, and (2) applicability only in restrictive settings. In this talk, building on a recently proposed measure, we introduce a generalized version that accommodates different model selectors, and we then examine its asymptotic properties for large $p$. $K$-means clustering is incorporated into our scheme to detect multiple influential points. Simulations are then conducted to assess the performance of various diagnostic approaches. The proposed procedure further demonstrates its value in improving predictive power when analyzing thermally stimulated pain based on fMRI data. In addition, the latest developments related to this newly proposed measure are presented. This work was conducted under the joint supervision of Professors Masoud Asgharian and Martin Lindquist.
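
    The abstract does not give the generalized influence measure, so the scores below are hypothetical placeholders. The sketch only shows one plausible way K-means (here K = 2, via Lloyd's algorithm in one dimension) can separate a group of unusually large influence scores from the bulk. Every observation in the high-centre cluster is flagged at once, rather than one point at a time.

```python
def two_means_flag(scores, max_iter=100):
    """Split 1-D influence scores into two clusters with Lloyd's algorithm and return
    the indices assigned to the cluster with the larger centre (candidate influential points)."""
    lo, hi = min(scores), max(scores)  # initialize centres at the extremes
    if lo == hi:
        return []                      # all scores equal: nothing stands out
    in_hi = []
    for _ in range(max_iter):
        # Assign each score to the nearer centre (ties go to the low cluster).
        in_hi = [abs(s - hi) < abs(s - lo) for s in scores]
        hi_pts = [s for s, flag in zip(scores, in_hi) if flag]
        lo_pts = [s for s, flag in zip(scores, in_hi) if not flag]
        new_lo = sum(lo_pts) / len(lo_pts)
        new_hi = sum(hi_pts) / len(hi_pts)
        if (new_lo, new_hi) == (lo, hi):
            break                      # centres stopped moving: converged
        lo, hi = new_lo, new_hi
    return [i for i, flag in enumerate(in_hi) if flag]


# Usage with hypothetical scores: two observations stand out together.
scores = [0.10, 0.20, 0.15, 0.12, 5.0, 4.8]
print(two_means_flag(scores))  # -> [4, 5]
```

    Initializing at the minimum and maximum keeps both clusters non-empty in one dimension. Real use would also need a rule for deciding whether the high cluster is genuinely anomalous.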

  • Three Myths About Causal Mediation

    Date: 2023-09-15

    Time: 15:30-16:30 (Montreal time)

    Location: Burnside 1104

    https://mcgill.zoom.us/j/86404798712

    Meeting ID: 864 0479 8712

    Passcode: None

    Abstract:

    Causal mediation techniques are a means of identifying the degree to which a cause influences its effect along particular causal paths. For example, in a model where a cause influences its effect both indirectly via a mediator and directly via factors not included in the model, mediation techniques enable one to measure both direct and indirect effects. Although mediation techniques are widely employed, they are often misunderstood. This is in part due to the long-term influence of Baron and Kenny’s (1986) treatment of mediation, which applies only to linear models without interaction and which leads one to develop intuitions about direct and indirect effects that do not generalize to non-parametric causal models. In my talk, I identify and reject three persistent myths about mediation. I argue that such methods: 1. Should not be understood as decomposing the total effect into additive components corresponding to the contributions of the paths; 2. Are not a means for eliminating latent heterogeneity; and 3. Do not require one to appeal to causal concepts other than the counterfactual causal ones built into structural causal models. These points are crucial for understanding mediation effects in any context in which they are studied, and they have particular applications to studies of fairness and discrimination, in which such effects play an increasingly central role (Plečko and Bareinboim, 2022).
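
    One textbook illustration of why the linear, no-interaction intuition does not carry over is sketched below. This is not the speaker's argument. It assumes linear structural mean models with independent, mean-zero errors and an exposure-mediator interaction, with made-up coefficients b, c, d, f. The total effect always equals a natural direct effect plus a natural indirect effect. However, the split depends on the level at which the mediator or exposure is held, so neither piece is uniquely "the contribution of a path". With no interaction (d = 0), both splits coincide with the Baron-Kenny product-of-coefficients answer c * f.

```python
# Hypothetical mean models with a binary exposure X and an X-M interaction:
#   E[M | X = x]        = f * x
#   E[Y | X = x, M = m] = b * x + c * m + d * x * m
def mean_mediator(x, f):
    return f * x


def mean_outcome(x, m, b, c, d):
    return b * x + c * m + d * x * m


def mediation_effects(b, c, d, f):
    """Natural direct (nde) and indirect (nie) effects of X: 0 -> 1.

    Because the outcome mean is linear in m, E[Y(x, M(x'))] equals
    mean_outcome(x, mean_mediator(x', f)) under the stated assumptions."""
    def y(x, x_med):
        return mean_outcome(x, mean_mediator(x_med, f), b, c, d)

    return {
        "total": y(1, 1) - y(0, 0),
        "nde_0": y(1, 0) - y(0, 0),  # direct effect, mediator held at its X=0 level
        "nie_1": y(1, 1) - y(1, 0),  # indirect effect, exposure held at X=1
        "nde_1": y(1, 1) - y(0, 1),  # direct effect, mediator held at its X=1 level
        "nie_0": y(0, 1) - y(0, 0),  # indirect effect, exposure held at X=0
    }


# Usage: with interaction (d != 0), the two decompositions of the total differ.
print(mediation_effects(b=1, c=2, d=3, f=2))
print(mediation_effects(b=1, c=2, d=0, f=2))
```

    In both calls, nde_0 + nie_1 and nde_1 + nie_0 each equal the total effect. Only in the no-interaction case are the direct and indirect pieces the same under both splits.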