McGill Statistics Seminar - McGill Statistics Seminars
  • Imaging and Clinical Biomarker Estimation in Alzheimer’s Disease

    Date: 2024-01-19

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104


    Meeting ID: 854 2294 6487

    Passcode: None


    Estimation of biomarkers related to disease classification and modeling of its progression is essential for treatment development for Alzheimer’s Disease (AD). The task is more daunting for characterizing relatively rare AD subtypes such as early-onset AD (EOAD). In this talk, I will describe the Longitudinal Early-Onset Alzheimer’s Disease Study (LEADS), which aims to collect and publicly distribute clinical, imaging, genetic, and other types of data from people with EOAD, as well as cognitively normal (CN) controls and people with early-onset non-amyloid positive (EOnonAD) dementias. I will discuss manifold estimation methods for estimating the surfaces of shapes in the brain from data clouds, and longitudinal manifold learning methods for modeling trajectories of shape changes in the brain over time. Finally, I will discuss our work in leveraging magnetic resonance imaging and positron emission tomography data to characterize distributions of white matter hyperintensities in people with EOAD and to obtain imaging-based biomarkers of disease trajectories of AD subtypes.

  • New Advances in High-Dimensional DNA Methylation Analysis in Cancer Epigenetic Using Trans-dimensional Hidden Markov Models

    Date: 2024-01-12

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104


    Meeting ID: 830 0817 4313

    Passcode: None


    Epigenetic alterations are key drivers in the development and progression of cancer. Identifying differentially methylated cytosines (DMCs) in cancer samples is a crucial step toward understanding these changes. In this talk, we propose DMCTHM, a trans-dimensional Markov chain Monte Carlo (TMCMC) approach that uses hidden Markov models (HMMs) with binomial emissions and bisulfite sequencing (BS-Seq) data to identify DMCs in cancer epigenetic studies. We introduce the Expander-Collider penalty to tackle under- and over-estimation in TMCMC-HMMs. We address all known challenges inherent in BS-Seq data by introducing novel approaches for capturing functional patterns and autocorrelation structure of the data, as well as for handling missing values, multiple covariates, multiple comparisons, and family-wise errors. We demonstrate the effectiveness of DMCTHM through comprehensive simulation studies. The results show that our proposed method outperforms competing methods in identifying DMCs. Notably, with DMCTHM, we uncovered new DMCs and genes in colorectal cancer that were significantly enriched in the Tp53 pathway.

  • Robust and Tuning-Free Sparse Linear Regression via Square-Root Slope

    Date: 2023-11-17

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104


    Meeting ID: 818 6563 0475

    Passcode: None


    We consider the high-dimensional linear regression model and assume that a fraction of the responses are contaminated by an adversary with complete knowledge of the data and the underlying distribution. We are interested in the situation where the dense additive noise can be heavy-tailed but the predictors have a sub-Gaussian distribution. We establish minimax lower bounds that depend on the fraction of contaminated data and the tails of the additive noise. Moreover, we design a modification of the square-root Slope estimator with several desirable features: (a) it is provably robust to adversarial contamination, with performance guarantees that take the form of sub-Gaussian deviation inequalities and match the lower error bounds up to log-factors; (b) it is fully adaptive with respect to the unknown sparsity level and the variance of the noise; and (c) it is computationally tractable as a solution of a convex optimization problem. To analyze the performance of the proposed estimator, we prove several properties of matrices with sub-Gaussian rows that could be of independent interest. This is joint work with Stanislav Minsker and Lang Wang.

  • Copula-based estimation of health inequality measures

    Date: 2023-11-10

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104


    Meeting ID: 893 3779 3218

    Passcode: None


    This paper aims to use copulas to derive estimators of the health concentration curve and Gini coefficient for health distribution. We highlight the importance of expressing health inequality measures in terms of a copula, which we in turn use to build copula-based semi- and nonparametric estimators of the above measures. Thereafter, we study the asymptotic properties of these estimators. In particular, we establish their consistency and asymptotic normality. We provide expressions for their variances, which can be used to construct confidence intervals and build tests for the health concentration curve and Gini health coefficient. A Monte Carlo simulation exercise shows that the semiparametric estimator outperforms the smoothed nonparametric estimator, and the latter does better than the empirical estimator in terms of mean squared error. We also run an extensive empirical study where we apply our estimators to show that inequalities across U.S. states in socioeconomic variables such as income/poverty and race/ethnicity explain the observed inequalities in COVID-19 infections and deaths in the U.S.

  • Reduced-Rank Envelope Vector Autoregressive Models

    Date: 2023-11-03

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104


    Meeting ID: 257 102 3554

    Passcode: None


    Classical vector autoregressive (VAR) models have long been a popular choice for modeling multivariate time series data due to their flexibility and ease of use. However, the VAR model suffers from overparameterization, which is a serious issue for high-dimensional time series data as it restricts the number of variables and lags that can be incorporated into the model. Several statistical methods have been proposed to achieve dimension reduction in the parameter space of VAR models. Yet these methods prove inefficient in extracting relevant information from complex datasets, as they fail to distinguish information that is aligned with scientific objectives from information that is not, and they are also inefficient in addressing rank deficiency problems. Envelope methods, founded on novel parameterizations that employ reduced subspaces to establish connections between the mean function and covariance matrix, offer a solution by efficiently identifying and eliminating irrelevant information. In this presentation, we introduce a new, parsimonious VAR model that incorporates the concept of envelope models into the reduced-rank VAR framework and can achieve substantial dimension reduction and efficient parameter estimation. We will present the results of simulation studies and real data analysis comparing the performance of our proposed model with that of existing models in the literature.

  • Doubly Robust Estimation under Covariate-induced Dependent Left Truncation

    Date: 2023-10-27

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104


    Meeting ID: 841 9549 8572

    Passcode: None


    In prevalent cohort studies with follow-up, the time-to-event outcome is subject to left truncation, leading to selection bias. For estimation of the distribution of time-to-event, conventional methods adjusting for left truncation tend to rely on the (quasi-)independence assumption that the truncation time and the event time are “independent” on the observed region. This assumption is violated when there is dependence between the truncation time and the event time, possibly induced by measured covariates. Inverse probability of truncation weighting leveraging covariate information can be used in this case, but it is sensitive to misspecification of the truncation model. In this work, we apply semiparametric theory to find the efficient influence curve of an expected (arbitrarily transformed) survival time in the presence of covariate-induced dependent left truncation. We then use it to construct estimators that are shown to enjoy double-robustness properties. Our work represents the first attempt to construct doubly robust estimators in the presence of left truncation, which does not fall under the established framework of coarsened data where doubly robust approaches are developed. We provide technical conditions for the asymptotic properties that appear not to have been carefully examined in the literature for time-to-event data, and study the estimators via extensive simulation. We apply the estimators to two data sets from practice, with different right-censoring patterns.

  • Neural network architectures for functional data analysis

    Date: 2023-10-20

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104


    Meeting ID: 897 6116 5882

    Passcode: None


    Functional data are defined as random variables that assume values in an infinite precision domain, such as time or space. In applications, such data are usually observed discretely at regularly or irregularly spaced points over the domain. In this talk, we discuss ways to adapt modern neural network architectures for the analysis of functional data. To do so, we design new neural network layers to process functional data as input, output, or both. First, we propose the functional output layer, which can be used to solve a multitude of function-on-scalar regression problems in a non-linear way. The proposed layer provides a smooth representation of the output, and we demonstrate how to regularize such a layer during the network training phase. Second, we propose a concept for functional weights that project functional data to a scalar representation, leading to a novel formulation for a functional input layer. We demonstrate how to combine both of these proposed functional layers to create a functional autoencoder. This model takes as input the data in the form it is usually collected, as discrete points over the domain, and can be used for feature extraction and functional data smoothing. We demonstrate the benefits of the proposed architectures with various experiments on simulated data and real data applications. We conclude with a brief discussion of ongoing work in the design of a functional convolution layer that bridges the gap between the discrete convolution operation and its continuous counterpart.

  • Distances on and between complex networks

    Date: 2023-10-13

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104


    Meeting ID: 834 7786 5796

    Passcode: None


    Distance plays a pivotal role in statistics. Meanwhile, recent technologies and social networks have yielded large complex network data sets, which require customized statistical tools. From a mathematical viewpoint, these complex networks are graphs with non-trivial structures (in contrast to Erdős-Rényi graphs, for example). These networks are models of systemic phenomena and of cases where individual-level analyses are insufficient. Such models are not only used in the study of social networks, but are also widely employed in neurology, biology, telecommunication and finance, among many areas of application. Unfortunately, however, distances on graphs are not clearly defined.

  • Doubly robust inference under possibly misspecified marginal structural Cox model

    Date: 2023-09-29

    Time: 15:30-16:30 (Montreal time)

    Location: Online, retransmitted in Burnside 1104


    Meeting ID: 824 4080 7026

    Passcode: None


    Doubly robust estimation under the marginal structural Cox model has been a challenge until recently due to the non-collapsibility of the Cox regression model. This is because the estimand of the causal hazard ratio assumes that the marginal structural Cox model holds, while the doubly robust estimating function requires the specification of an additional model for the conditional distribution of the time-to-event given treatment and covariates, and the two models are unlikely to hold simultaneously. It recently became possible to resolve this issue with the understanding of rate double robustness and machine learning or nonparametric approaches, although technical details are still to be spelt out to ensure root-n inference for the estimand. We describe our work both in the observational study setting and in the presence of covariate-induced informative censoring. An added benefit of our approach is the interpretation of the estimand, when the assumed marginal structural Cox model does not hold, as a time-averaged treatment effect. This allows meaningful estimation of treatment effects for general two-group comparison without the Cox model, or under alternative models such as the semiparametric proportional odds or transformation models for the potential time-to-event outcomes.

  • Detection of Multiple Influential Observations on Variable Selection for High-dimensional Data: New Perspective with an Application to Neurologic Signature of Physical Pain

    Date: 2023-09-22

    Time: 15:30-16:30 (Montreal time)

    Location: In person, Burnside 1104


    Meeting ID: 893 7481 3252

    Passcode: None


    Influential diagnosis is an integral part of data analysis, yet most existing methodological frameworks presume a deterministic submodel and are designed for low-dimensional data (i.e., the number of predictors $p$ is smaller than the sample size $n$). However, the stochastic selection of a submodel from high-dimensional data where $p$ exceeds $n$ has become ubiquitous. Thus, methods for identifying observations that could exert undue influence on the choice of a submodel can play an important role in this setting. To date, discussion of this topic has been limited, falling short in two domains: (1) constrained ability to detect multiple influential points, and (2) applicability only in restrictive settings. In this talk, building on a recently proposed measure, we introduce a generalized version accommodating different model selectors, the asymptotic property of which is subsequently examined for large $p$. $K$-means clustering is incorporated into our scheme to detect multiple influential points. Simulation is then conducted to assess the performance of various diagnostic approaches. The proposed procedure further demonstrates its value in improving predictive power when analyzing thermal-stimulated pain based on fMRI data. In addition, the latest development revolving around this newly proposed measure is also presented. This work is conducted under the joint supervision of Professors Masoud Asgharian and Martin Lindquist.