2024 Winter - McGill Statistics Seminars

- Apr 12, 2024
- post
Free energy fluctuations of spherical spin glasses near the critical temperature threshold

Elizabeth Collins-Woodfin · Apr 12, 2024
Date: 2024-04-12

Time: 15:30-16:30 (Montreal time)

Location: In person, Burnside 1104

https://mcgill.zoom.us/j/86957985232

Meeting ID: 869 5798 5232

Passcode: None

Abstract:

One of the fascinating phenomena of spin glasses is the dramatic change in behavior that occurs between the high and low temperature regimes. In addition to its physical meaning, this phase transition corresponds to a detection threshold with respect to the signal-to-noise ratio in a spiked matrix model. The free energy of the spherical Sherrington-Kirkpatrick (SSK) model has Gaussian fluctuations at high temperature, but Tracy-Widom fluctuations at low temperature. A similar phenomenon holds for the bipartite SSK model, and we show that, when the temperature is within a small window around the critical temperature, the free energy fluctuations converge to an independent sum of Gaussian and Tracy-Widom random variables (joint work with Han Le). Our work follows two recent papers that proved similar results for the SSK model (by Landon and by Johnstone, Klochkov, Onatski, Pavlyshyn). From a statistical perspective, the free energies of SSK and bipartite SSK correspond to log-likelihood ratios for spiked Wigner and spiked Wishart matrices, respectively. Analyzing bipartite SSK at the critical temperature requires a variety of tools including classical random matrix results, contour integral techniques, and a CLT for the log-characteristic polynomial of Wishart random matrices evaluated near the spectral edge.

- Mar 22, 2024
- post
Minimum Covariance Determinant: Spectral Embedding and Subset Size Determination

Qiang Heng · Mar 22, 2024
Date: 2024-03-22

Time: 15:30-16:30 (Montreal time)

Location: Online, retransmitted in Burnside 1104

https://mcgill.zoom.us/j/81895414756

Meeting ID: 818 9541 4756

Passcode: None

Abstract:

This paper introduces several enhancements to the minimum covariance determinant method of outlier detection and robust estimation of means and covariances. We leverage the principal component transform to achieve dimension reduction and ultimately better analyses. Our best subset selection algorithm strategically combines statistical depth and concentration steps. To ascertain the appropriate subset size and number of principal components, we introduce a bootstrap procedure that estimates the instability of the best subset algorithm. The parameter combination exhibiting minimal instability proves ideal for the purposes of outlier detection and robust estimation. Rigorous benchmarking against prominent MCD variants showcases our approach’s superior statistical performance and computational speed in high dimensions. Application to a fruit spectra data set and a cancer genomics data set illustrates our claims.

- Mar 1, 2024
- post
Recent advances in causal inference under irregular and informative observation times for the outcome

Janie Coulombe · Mar 1, 2024
Date: 2024-03-01

Time: 15:30-16:30 (Montreal time)

Location: In person, Burnside 1104

https://mcgill.zoom.us/j/89811237909

Meeting ID: 898 1123 7909

Passcode: None

Abstract:

Electronic health records (EHR) data contain rich information about patients’ health condition, comorbidities, clinical outcomes, and drug prescriptions. They are often used to draw causal inferences and compare different treatments’ effectiveness. However, these data are not experimental. They present with special features that should be addressed or that may affect the inference. One of these features is the irregular observation of the longitudinal processes used in the inference. In longitudinal studies in which we seek the causal effect of a treatment on a repeated outcome, for instance, covariate-dependent observation of the outcome has been shown to bias standard causal estimators. In this presentation, I will review recent work and present some of the most interesting findings in this area of research. Themes will include identifiability, efficiency, and alternatives to weighting methods to address irregular observation times.

- Feb 16, 2024
- post
Matrix completion in genetic methylation studies: LMCC, a Linear Model of Coregionalization with informative Covariates

Karim Oualkacha · Feb 16, 2024
Date: 2024-02-16

Time: 15:30-16:30 (Montreal time)

Location: In person, Burnside 1104

https://mcgill.zoom.us/j/82678428848

Meeting ID: 826 7842 8848

Passcode: None

Abstract:

DNA methylation is an important epigenetic mark that modulates gene expression through the inhibition of transcriptional proteins binding to DNA. As in many other omics experiments, missing values are an issue, and appropriate imputation techniques are important to avoid an unnecessary sample size reduction as well as to optimally leverage the information collected. We consider the case where a relatively small number of samples are processed via an expensive high-density Whole Genome Bisulfite Sequencing (WGBS) strategy and a larger number of samples are processed using more affordable low-density array-based technologies. In such cases, one can impute/complete the data matrix of the low coverage (array-based) methylation data using the high-density information provided by the WGBS samples. In this work, we propose an efficient Linear Model of Coregionalization with informative Covariates (LMCC) to predict missing values based on observed values and informative covariates. Our model assumes that at each genomic position, the methylation vector of all samples is linked to a set of fixed factors (covariates) and a set of latent factors. Furthermore, we exploit the functional nature of the data and the spatial correlation across positions/sites by assuming Gaussian processes on the fixed and latent coefficient vectors, respectively. Our simulations show that the use of covariates can significantly improve the accuracy of imputed values, especially in cases where missing data contain some relevant information about the explanatory variable. We also show that the proposed model is efficient when the number of columns is much greater than the number of rows in the data matrix, which is usually the case in methylation data analysis. Finally, we apply and compare the proposed method with alternative approaches to complete a matrix of DNA methylation containing 15 rows (methylation samples) and 1 million columns (sites).
Joint work with Melina Ribaud and Aurelie Labbe (HEC, Montreal).

- Feb 9, 2024
- post
Mesoscale two-sample testing for networks

Peter William MacDonald · Feb 9, 2024
Date: 2024-02-09

Time: 15:30-16:30 (Montreal time)

Location: In person, Burnside 1104

https://mcgill.zoom.us/j/87465663442

Meeting ID: 874 6566 3442

Passcode: None

Abstract:

Networks arise naturally in many scientific fields as a representation of pairwise connections. Statistical network analysis has most often considered a single large network, but it is common in a number of applications, for example, neuroimaging, to observe multiple networks on a shared node set. When these networks are grouped by case-control status or another categorical covariate, the classical statistical question of two-sample comparison arises. In this work, we address the problem of testing for statistically significant differences in a prespecified subset of the connections. This general framework allows an analyst to focus on a single node or a specific region of interest, or to compare whole networks. In this “mesoscale” setting, we develop statistically sound projection-based tests for two-sample comparison in both weighted and binary edge networks. Our approach can leverage all available network information and learn informative projections that improve testing power when low-dimensional network structure is present.

- Feb 2, 2024
- post
Fast calibration of FARIMA models with dependent errors

Youssef Esstafa · Feb 2, 2024
Date: 2024-02-02

Time: 15:30-16:30 (Montreal time)

Location: Online, retransmitted in Burnside 1104

https://mcgill.zoom.us/j/89669635642

Meeting ID: 896 6963 5642

Passcode: None

Abstract:

In this work, we investigate the asymptotic properties of Le Cam’s one-step estimator for weak Fractionally AutoRegressive Integrated Moving-Average (FARIMA) models. For these models, the noise terms are uncorrelated but are not necessarily independent, nor necessarily martingale differences. We show under some regularity assumptions that the one-step estimator is strongly consistent and asymptotically normal with the same asymptotic variance as the least squares estimator. We show through simulations that the proposed estimator reduces computational time compared with the least squares estimator.

- Jan 19, 2024
- post
Imaging and Clinical Biomarker Estimation in Alzheimer’s Disease

Ani Eloyan · Jan 19, 2024
Date: 2024-01-19

Time: 15:30-16:30 (Montreal time)

Location: Online, retransmitted in Burnside 1104

https://mcgill.zoom.us/j/85422946487

Meeting ID: 854 2294 6487

Passcode: None

Abstract:

Estimation of biomarkers related to disease classification and modeling of its progression is essential for treatment development for Alzheimer’s Disease (AD). The task is more daunting for characterizing relatively rare AD subtypes such as early-onset AD (EOAD). In this talk, I will describe the Longitudinal Early-onset Alzheimer’s Disease Study (LEADS), which aims to collect and publicly distribute clinical, imaging, genetic, and other types of data from people with EOAD, as well as cognitively normal (CN) controls and people with early-onset non-amyloid positive (EOnonAD) dementias. I will discuss manifold estimation methods for estimating surfaces of shapes in the brain using data clouds, and longitudinal manifold learning methods for modeling trajectories of shape changes in the brain over time. Finally, I will discuss our work in leveraging magnetic resonance imaging and positron emission tomography data to characterize distributions of white matter hyperintensities in people with EOAD and to obtain imaging-based biomarkers of disease trajectories of AD subtypes.

- Jan 12, 2024
- post
New Advances in High-Dimensional DNA Methylation Analysis in Cancer Epigenetic Using Trans-dimensional Hidden Markov Models

Farhad Shokoohi · Jan 12, 2024
Date: 2024-01-12

Time: 15:30-16:30 (Montreal time)

Location: In person, Burnside 1104

https://mcgill.zoom.us/j/83008174313

Meeting ID: 830 0817 4313

Passcode: None

Abstract:

Epigenetic alterations are key drivers in the development and progression of cancer. Identifying differentially methylated cytosines (DMCs) in cancer samples is a crucial step toward understanding these changes. In this talk, we propose a trans-dimensional Markov chain Monte Carlo (TMCMC) approach, called DMCTHM, that uses hidden Markov models (HMMs) with binomial emission and bisulfite sequencing (BS-Seq) data to identify DMCs in cancer epigenetic studies. We introduce the Expander-Collider penalty to tackle under- and over-estimation in TMCMC-HMMs. We address all known challenges inherent in BS-Seq data by introducing novel approaches for capturing functional patterns and autocorrelation structure of the data, as well as for handling missing values, multiple covariates, multiple comparisons, and family-wise errors. We demonstrate the effectiveness of DMCTHM through comprehensive simulation studies. The results show that our proposed method outperforms other competing methods in identifying DMCs. Notably, with DMCTHM, we uncovered new DMCs and genes in colorectal cancer that were significantly enriched in the Tp53 pathway.

