McGill Statistics Seminar - McGill Statistics Seminars
  • Distances on and between complex networks

    Date: 2023-10-13

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104

    https://mcgill.zoom.us/j/83477865796

    Meeting ID: 834 7786 5796

    Passcode: None

    Abstract:

    Distance plays a pivotal role in statistics. Meanwhile, recent technologies and social networks have yielded large complex network data sets, which require customized statistical tools. From a mathematical viewpoint, these complex networks are graphs with non-trivial structures (in contrast to Erdős-Rényi graphs, for example). These networks are models of systemic phenomena and of cases where individual-level analyses are insufficient. Such models are not only used in the study of social networks, but are also widely employed in neurology, biology, telecommunication and finance, among many other areas of application. Unfortunately, however, distances on graphs are not clearly defined.

  • Doubly robust inference under possibly misspecified marginal structural Cox model

    Date: 2023-09-29

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104

    https://mcgill.zoom.us/j/82440807026

    Meeting ID: 824 4080 7026

    Passcode: None

    Abstract:

    Doubly robust estimation under the marginal structural Cox model has been a challenge until recently due to the non-collapsibility of the Cox regression model. This is because the causal hazard ratio estimand assumes that the marginal structural Cox model holds, while the doubly robust estimating function requires the specification of an additional model for the conditional distribution of the time-to-event given treatment and covariates, and the two models are unlikely to hold simultaneously. It recently became possible to resolve this issue with the understanding of rate double robustness and machine learning or nonparametric approaches, although technical details are still to be spelt out to ensure root-n inference for the estimand. We describe our work both in the observational study setting and in the presence of covariate-induced informative censoring. An added benefit of our approach is the interpretation of the estimand as a time-averaged treatment effect when the assumed marginal structural Cox model does not hold. This allows meaningful estimation of treatment effects for general two-group comparison without the Cox model, or under alternative models such as the semiparametric proportional odds or transformation models for the potential time-to-event outcomes.

  • Detection of Multiple Influential Observations on Variable Selection for High-dimensional Data: New Perspective with an Application to Neurologic Signature of Physical Pain

    Date: 2023-09-22

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104

    https://mcgill.zoom.us/j/89374813252

    Meeting ID: 893 7481 3252

    Passcode: None

    Abstract:

    Influential diagnosis is an integral part of data analysis, yet most existing methodological frameworks presume a deterministic submodel and are designed for low-dimensional data (i.e., the number of predictors $p$ is smaller than the sample size $n$). However, the stochastic selection of a submodel from high-dimensional data where $p$ exceeds $n$ has become ubiquitous. Thus, methods for identifying observations that could exert undue influence on the choice of a submodel can play an important role in this setting. To date, discussion of this topic has been limited, falling short in two domains: (1) constrained ability to detect multiple influential points, and (2) applicability only in restrictive settings. In this talk, building on a recently proposed measure, we introduce a generalized version accommodating different model selectors, the asymptotic property of which is subsequently examined for large $p$. $K$-means clustering is incorporated into our scheme to detect multiple influential points. Simulation is then conducted to assess the performance of various diagnostic approaches. The proposed procedure further demonstrates its value in improving predictive power when analyzing thermal-stimulated pain based on fMRI data. In addition, the latest development revolving around this newly proposed measure is also presented. This work is conducted under the joint supervision of Professors Masoud Asgharian and Martin Lindquist.

  • Three Myths About Causal Mediation

    Date: 2023-09-15

    Time: 15:30-16:30 (Montreal time)

    Location: Burnside 1104

    https://mcgill.zoom.us/j/86404798712

    Meeting ID: 864 0479 8712

    Passcode: None

    Abstract:

    Causal mediation techniques are a means for identifying the degree to which a cause influences its effect along particular causal paths. For example, in a model where a cause influences its effect both indirectly via a mediator and directly via factors not included in the model, mediation techniques enable one to measure both direct and indirect effects. Although mediation techniques are widely employed, they are often misunderstood. This is in part due to the long-term influence of Baron and Kenny’s (1986) treatment of mediation, which applies only to linear models without interaction, and which leads one to develop intuitions about direct and indirect effects that do not generalize to non-parametric causal models. In my talk, I identify and reject three persistent myths about mediation. I argue that such methods: 1. Should not be understood as decomposing the total effect into additive components corresponding to the contributions of the paths; 2. Are not a means for eliminating latent heterogeneity; and 3. Do not require one to appeal to causal concepts other than the counterfactual causal ones built into structural causal models. These points are crucial for understanding mediation effects in any context in which they are studied, and have particular applications for studies of fairness and discrimination, in which such effects play an increasingly central role (Plečko and Bareinboim, 2022).

  • Empirical Bayes Control of the False Discovery Exceedance

    Date: 2023-08-17

    Time: 15:30-16:30 (Montreal time)

    Hybrid: In person / Zoom

    Location: Burnside Hall 1104

    https://mcgill.zoom.us/j/89623344755?pwd=S1E0QWVjSm8wRHdIYU5IZzllSXNjUT09

    Meeting ID: 896 2334 4755

    Passcode: 287381

    Abstract:

    In sparse large-scale testing problems where the false discovery proportion (FDP) is highly variable, the false discovery exceedance (FDX) provides a valuable alternative to the widely used false discovery rate (FDR). We develop an empirical Bayes approach to controlling the FDX. We show that for independent hypotheses from a two-group model and dependent hypotheses from a Gaussian model fulfilling the exchangeability condition, an oracle decision rule based on ranking and thresholding the local false discovery rate (lfdr) is optimal in the sense that the power is maximized subject to an FDX constraint. We propose a data-driven FDX procedure that emulates the oracle via carefully designed computational shortcuts. We investigate the empirical performance of the proposed method using simulations and illustrate the merits of FDX control through an application to identifying abnormal stock trading strategies.
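
    As a rough illustration of ranking and thresholding the lfdr under an FDX constraint, the sketch below assumes independent hypotheses and lfdr values that are already known. Under those assumptions the number of false discoveries among the rejected hypotheses is a sum of independent Bernoulli variables with success probabilities equal to the lfdr values, so P(FDP > gamma) can be computed exactly. The function names `prob_more_than` and `fdx_reject`, the levels `gamma` and `alpha`, and the choice of the largest admissible rejection set are assumptions made here; this is not the data-driven procedure or the computational shortcuts described in the talk.

```python
import math


def prob_more_than(probs, max_false):
    """P(V > max_false), where V is a sum of independent Bernoulli(probs[i])."""
    pmf = [1.0]  # pmf[v] = P(V = v) over the hypotheses processed so far
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for v, mass in enumerate(pmf):
            nxt[v] += mass * (1.0 - p)  # this rejection is a true discovery
            nxt[v + 1] += mass * p      # this rejection is a false discovery
        pmf = nxt
    return sum(pmf[max_false + 1:])


def fdx_reject(lfdr, gamma=0.1, alpha=0.05):
    """Reject the largest set of smallest-lfdr hypotheses with P(FDP > gamma) <= alpha.

    Assumes (hypothetically) that the lfdr values are known and the
    hypotheses are independent. Returns the indices of rejected hypotheses.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])  # rank by lfdr
    k_best = 0
    for k in range(1, len(order) + 1):
        top = [lfdr[i] for i in order[:k]]
        max_false = math.floor(gamma * k)  # FDP > gamma  <=>  V > gamma * k
        if prob_more_than(top, max_false) <= alpha:
            k_best = k
    return order[:k_best]


if __name__ == "__main__":
    # Two clear signals and two likely nulls (made-up lfdr values).
    print(fdx_reject([0.9, 0.001, 0.95, 0.002]))
```

    The exact recomputation for every candidate set size is cubic in the number of hypotheses, so it is only practical for small examples.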

  • Residual-based estimation of parametric copulas under regression

    Date: 2023-08-14

    Time: 15:30-16:30 (Montreal time)

    Hybrid: In person / Zoom

    Location: Burnside Hall 1104

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    We study a multivariate response regression model where each coordinate is described by a location-scale regression, and where the dependence structure of the “noise” terms in the regression is described by a parametric copula. Our goal is to estimate the associated Euclidean copula parameter given a sample of the response and the covariate. In the absence of the copula sample, the oracle ranks in the usual pseudo-likelihood estimation procedure are no longer computable. Instead, we base our estimation on the residual ranks calculated from some preliminary estimators of the regression functions. We show that the residual-based estimators are asymptotically equivalent to their oracle counterparts, even when the dimension of the covariate in the regression is moderately diverging. Partially to serve this objective, we also study the weighted convergence of the residual empirical processes.
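
    The residual-rank idea can be sketched in a toy setting. The example below assumes a bivariate response, a single covariate, location-only linear regressions fitted by least squares, and a Gaussian copula whose correlation parameter is estimated by maximizing the pseudo-likelihood over a grid. All function names, the simulated data and the grid search are assumptions made for illustration; the talk covers general location-scale regressions and general parametric copulas.

```python
import math
import random
from statistics import NormalDist


def ols_residuals(x, y):
    """Residuals from a simple least-squares fit y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def normal_scores(values):
    """Normal quantiles of the rescaled ranks rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    inv = NormalDist().inv_cdf
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = inv(rank / (n + 1))
    return z


def gaussian_copula_rho(res1, res2, grid=999):
    """Grid maximizer of the Gaussian-copula pseudo-log-likelihood in rho."""
    z1, z2 = normal_scores(res1), normal_scores(res2)
    n = len(z1)
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))

    def loglik(rho):
        q = 1.0 - rho * rho
        return -0.5 * n * math.log(q) - (rho * rho * (s11 + s22) - 2.0 * rho * s12) / (2.0 * q)

    candidates = [-1.0 + 2.0 * (j + 1) / (grid + 1) for j in range(grid)]
    return max(candidates, key=loglik)


def simulate_regression(n, rho, seed=0):
    """Toy data: two responses, linear in one covariate, Gaussian-copula noise."""
    rng = random.Random(seed)
    x, y1, y2 = [], [], []
    for _ in range(n):
        xi = rng.uniform(0.0, 1.0)
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        x.append(xi)
        y1.append(1.0 + 2.0 * xi + e1)
        y2.append(-1.0 + 0.5 * xi + e2)
    return x, y1, y2


if __name__ == "__main__":
    x, y1, y2 = simulate_regression(500, 0.6, seed=1)
    print(gaussian_copula_rho(ols_residuals(x, y1), ols_residuals(x, y2)))
```

    Only the ranks of the residuals enter the estimator, which is why the unobservable noise terms can be replaced by residuals from preliminary regression fits.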

  • Confidence sets for Causal Discovery

    Date: 2023-03-24

    Time: 15:30-16:30 (Montreal time)

    On Zoom only

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    Causal discovery procedures are popular methods for discovering causal structure across the physical, biological, and social sciences. However, most procedures for causal discovery only output a single estimated causal model or single equivalence class of models. We propose a procedure for quantifying uncertainty in causal discovery. Specifically, we consider linear structural equation models with non-Gaussian errors and propose a procedure which returns a confidence set of causal orderings which are not ruled out by the data. We show that asymptotically, the true causal ordering will be contained in the returned set with some user-specified probability.

  • Excursions in Statistical History: Highlights

    Date: 2023-03-17

    Time: 15:30-16:30 (Montreal time)

    Hybrid: In person / Zoom

    Location: Burnside Hall 1104

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    Over the last 20 years, the speaker has delved into the origins of ‘regression’; the development of the ‘t’ and ‘Poisson’ distributions; forerunners of the ‘hazard’ function; and the statistical design and conduct of US Selective Service lotteries from 1917 onwards. This talk will recount the stories, data and simulations behind some of these, and provide some modern-day re-enactments.

  • Heteroskedastic Sparse PCA in High Dimensions

    Date: 2023-03-10

    Time: 15:30-16:30 (Montreal time)

    Hybrid: In person / Zoom

    Location: Burnside Hall 1104

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    Principal component analysis (PCA) is one of the most commonly used techniques for dimension reduction and feature extraction. Though high-dimensional sparse PCA has been well studied, little is known about the case where the noise is heteroskedastic, which turns out to be ubiquitous in many scenarios, like biological sequencing data and information network data. We propose an iterative algorithm for sparse PCA in the presence of heteroskedastic noise, which alternately updates the estimates of the sparse eigenvectors using the power method with adaptive thresholding in one step, and imputes the diagonal values of the sample covariance matrix to reduce the estimation bias due to heteroskedasticity in the other step. Our procedure is computationally fast and provably optimal under the generalized spiked covariance model, assuming the leading eigenvectors are sparse. A comprehensive simulation study demonstrates its robustness and effectiveness in various settings.
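
    The alternation between a thresholded power step and a diagonal imputation step can be sketched for a single component. The version below is a simplification under stated assumptions: it keeps the `k` largest entries at each power step (a fixed top-k truncation rather than the adaptive thresholding of the talk), and it replaces the diagonal of the covariance matrix by the diagonal of the current rank-one fit. The function names, the starting vector and the simulated spiked model are hypothetical choices, not the speaker's algorithm.

```python
import math
import random


def sample_covariance(rows):
    """Sample covariance matrix (divisor n) of a list of observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        c = [r[j] - means[j] for j in range(p)]
        for i in range(p):
            ci, row = c[i], cov[i]
            for j in range(p):
                row[j] += ci * c[j] / n
    return cov


def truncate_and_normalize(v, k):
    """Keep the k largest-magnitude entries, zero the rest, rescale to unit length."""
    keep = set(sorted(range(len(v)), key=lambda i: -abs(v[i]))[:k])
    w = [v[i] if i in keep else 0.0 for i in range(len(v))]
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]


def hetero_sparse_pc(cov, k, n_outer=10, n_power=20):
    """Leading sparse eigenvector via truncated power steps and diagonal imputation."""
    p = len(cov)
    m = [row[:] for row in cov]
    for i in range(p):
        m[i][i] = 0.0  # the diagonal carries the unequal noise variances
    # Start from the column with the largest off-diagonal norm (an assumption).
    j0 = max(range(p), key=lambda j: sum(m[i][j] ** 2 for i in range(p)))
    v = truncate_and_normalize([m[i][j0] for i in range(p)], k)
    lam = 0.0
    for _ in range(n_outer):
        for _ in range(n_power):  # power step with top-k truncation
            mv = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            v = truncate_and_normalize(mv, k)
        lam = sum(v[i] * m[i][j] * v[j] for i in range(p) for j in range(p))
        for i in range(p):  # imputation step: diagonal of the rank-one fit
            m[i][i] = lam * v[i] ** 2
    return v, lam


def simulate_spiked(n, p, support, lam, seed=0):
    """Toy spiked model: one sparse component plus noise with unequal variances."""
    rng = random.Random(seed)
    u = [1.0 / math.sqrt(len(support)) if j in support else 0.0 for j in range(p)]
    sds = [0.5 + 1.5 * j / (p - 1) for j in range(p)]
    rows = []
    for _ in range(n):
        z = math.sqrt(lam) * rng.gauss(0.0, 1.0)
        rows.append([z * u[j] + sds[j] * rng.gauss(0.0, 1.0) for j in range(p)])
    return rows, u


if __name__ == "__main__":
    rows, u = simulate_spiked(400, 30, {0, 7, 14, 21, 28}, 20.0, seed=1)
    v, lam = hetero_sparse_pc(sample_covariance(rows), k=5)
    print(abs(sum(a * b for a, b in zip(u, v))), lam)
```

    Zeroing and then re-imputing the diagonal is what keeps coordinates with large noise variance from being mistaken for signal, since the off-diagonal entries of the covariance matrix are unaffected by heteroskedastic noise.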

  • High Dimensional Logistic Regression Under Network Dependence

    Date: 2023-03-10

    Time: 14:15-15:15 (Montreal time)

    Hybrid: In person / Zoom

    Location: Burnside Hall 1104

    https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

    Meeting ID: 834 3668 6293

    Passcode: 12345

    Abstract:

    The classical formulation of logistic regression relies on the independent sampling assumption, which is often violated when the outcomes interact through an underlying network structure, such as over a temporal/spatial domain or on a social network. This necessitates the development of models that can simultaneously handle both the network peer-effect (arising from neighborhood interactions) and the effect of (possibly) high-dimensional covariates. In this talk, I will describe a framework for incorporating such dependencies in a high-dimensional logistic regression model by introducing a quadratic interaction term, as in the Ising model, designed to capture the pairwise interactions from the underlying network. The resulting model can also be viewed as an Ising model, where the node-dependent external fields linearly encode the high-dimensional covariates. We use a penalized maximum pseudo-likelihood method for estimating the network peer-effect and the effect of the covariates (the regression coefficients), which, in addition to handling the high-dimensionality of the parameters, conveniently avoids the computational intractability of the maximum likelihood approach. Our results imply that even under network dependence it is possible to consistently estimate the model parameters at the same rate as in classical (independent) logistic regression, when the true parameter is sparse and the underlying network is not too dense. Towards the end, I will talk about the rates of consistency of our proposed estimator for various natural graph ensembles, such as bounded degree graphs, sparse Erdős-Rényi random graphs, and stochastic block models, which follow as a consequence of our general results. This is joint work with Ziang Niu, Sagnik Halder, Bhaswar Bhattacharya and George Michailidis.
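
    For concreteness, one way to write a model of this kind is given below. The notation is an assumption made for illustration and does not come from the abstract: outcomes $y_i \in \{-1,+1\}$, a symmetric adjacency matrix $A$ with zero diagonal, a peer-effect parameter $\beta$, regression coefficients $\theta$, covariates $x_i$, and a penalty level $\lambda$. The $\ell_1$ penalty is likewise an assumed choice, since the abstract only says "penalized".

    $$
    P_{\beta,\theta}(y \mid x) \;\propto\; \exp\Big( \beta\, y^\top A\, y + \sum_{i=1}^{N} y_i\, \theta^\top x_i \Big), \qquad y \in \{-1,+1\}^N .
    $$

    Writing $m_i(y) = \sum_{j} A_{ij} y_j$, the conditional distribution of one outcome given the others has a closed form that does not involve the intractable normalizing constant:

    $$
    P_{\beta,\theta}(y_i \mid y_{-i}, x) \;=\; \frac{\exp\{\, y_i \,( 2\beta\, m_i(y) + \theta^\top x_i )\,\}}{2\cosh\big( 2\beta\, m_i(y) + \theta^\top x_i \big)} .
    $$

    A penalized maximum pseudo-likelihood estimator then takes the form

    $$
    (\hat\beta, \hat\theta) \;\in\; \arg\min_{\beta,\theta} \Big\{ -\frac{1}{N} \sum_{i=1}^{N} \log P_{\beta,\theta}(y_i \mid y_{-i}, x) + \lambda \|\theta\|_1 \Big\} .
    $$

    When $\beta = 0$ the conditional probability reduces to $P(y_i = 1 \mid x_i) = 1 / (1 + e^{-2\theta^\top x_i})$, which is ordinary logistic regression in the $\pm 1$ coding up to a factor of 2 in the coefficients.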