Past Seminar Series - McGill Statistics Seminars

- Oct 7, 2022
Some steps towards causal representation learning

Jason Hartford · Oct 7, 2022
Date: 2022-10-07

Time: 15:30-16:30 (Montreal time)

https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

Meeting ID: 834 3668 6293

Passcode: 12345

Abstract:

High-dimensional unstructured data such as images or sensor data can often be collected cheaply in experiments, but they are challenging to use in a causal inference pipeline without extensive engineering and domain knowledge to extract the underlying latent factors. The long-term goal of causal representation learning is to find appropriate assumptions and methods to disentangle latent variables and learn the causal mechanisms that explain a system’s behaviour. In this talk, I’ll present results from a series of recent papers that describe how we can leverage assumptions about a system’s causal mechanisms to provably disentangle latent factors. I will also talk about the limitations of a commonly used injectivity assumption, and discuss a hierarchy of settings that relax this assumption.

- Sep 30, 2022
Full likelihood inference for abundance from capture-recapture data: semiparametric efficiency and EM-algorithm

Pengfei Li · Sep 30, 2022
Date: 2022-09-30

Time: 15:30-16:30 (Montreal time)

HTTPS://US06WEB.ZOOM.US/J/84226701306?PWD=UEZ5NVPZAULLDW5QNU8VZZIVBEJXQT09

Meeting ID: 842 2670 1306

Passcode: 692788

Abstract:

Capture-recapture experiments are widely used to collect data needed to estimate the abundance of a closed population. To account for heterogeneity in the capture probabilities, Huggins (1989) and Alho (1990) proposed a semiparametric model in which the capture probabilities are modelled parametrically and the distribution of individual characteristics is left unspecified. A conditional likelihood method was then proposed to obtain point estimates and Wald-type confidence intervals for the abundance. Empirical studies show that the small-sample distribution of the maximum conditional likelihood estimator is strongly skewed to the right, which may produce Wald-type confidence intervals with lower limits that are less than the number of captured individuals or even negative.
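A minimal sketch of the conditional-likelihood idea, under simplifying assumptions: every individual shares one capture probability p across K occasions (the no-covariate special case of the Huggins and Alho model), p is fitted by maximizing the zero-truncated binomial conditional likelihood, and abundance is estimated by the Horvitz-Thompson-type formula N̂ = n / (1 − (1 − p̂)^K). With covariates, the estimator becomes a sum of 1 / (1 − (1 − p_i)^K) over captured individuals. The function names and the bisection solver are illustrative choices; this is not the full-likelihood EM method presented in the talk.

```python
import math
from typing import List, Tuple


def conditional_score(p: float, counts: List[int], k: int) -> float:
    """Derivative in p of the conditional log-likelihood
    sum_i [y_i log p + (k - y_i) log(1 - p)] - n log(1 - (1 - p)^k),
    where counts[i] = y_i >= 1 is how often captured individual i was seen."""
    n = len(counts)
    y = sum(counts)
    detect = 1.0 - (1.0 - p) ** k  # P(seen at least once in k occasions)
    return y / p - (n * k - y) / (1.0 - p) - n * k * (1.0 - p) ** (k - 1) / detect


def fit_capture_prob(counts: List[int], k: int, tol: float = 1e-12) -> float:
    """Root of the score by bisection; requires n < sum(counts) < n * k."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if conditional_score(mid, counts, k) > 0:
            lo = mid  # log-likelihood still increasing: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def abundance_estimate(counts: List[int], k: int) -> Tuple[float, float]:
    """Horvitz-Thompson-type estimate N_hat = n / (1 - (1 - p_hat)^k)."""
    p_hat = fit_capture_prob(counts, k)
    return len(counts) / (1.0 - (1.0 - p_hat) ** k), p_hat
```

With the hypothetical capture counts `[1, 1, 2, 3, 1, 2]` over K = 5 occasions, `abundance_estimate` gives p̂ ≈ 0.26 and N̂ ≈ 7.7, above the six individuals actually seen.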

- Sep 16, 2022
Statistical Inference for Functional Linear Quantile Regression

Peijun Sang · Sep 16, 2022
Date: 2022-09-16

Time: 15:20-16:20 (Montreal time)

https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

Meeting ID: 834 3668 6293

Passcode: 12345

Abstract:

We propose inferential tools for functional linear quantile regression where the conditional quantile of a scalar response is assumed to be a linear functional of a functional covariate. In contrast to conventional approaches, we employ kernel convolution to smooth the original loss function. The coefficient function is estimated under a reproducing kernel Hilbert space framework. A gradient descent algorithm is designed to minimize the smoothed loss function with a roughness penalty. With the aid of the Banach fixed-point theorem, we show the existence and uniqueness of our proposed estimator as the minimizer of the regularized loss function in an appropriate Hilbert space. Furthermore, we establish the convergence rate as well as the weak convergence of our estimator. As far as we know, this is the first weak convergence result for a functional quantile regression model. Pointwise confidence intervals and a simultaneous confidence band for the true coefficient function are then developed based on these theoretical properties. Numerical studies including both simulations and a data application are conducted to investigate the performance of our estimator and inference tools in finite samples.
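The kernel-convolution smoothing mentioned above can be sketched for the simplest case. The abstract does not name the kernel; a Gaussian kernel with bandwidth h is assumed here because it gives the closed form ℓ_h(u) = u(τ − Φ(−u/h)) + hφ(u/h), whose derivative τ − Φ(−u/h) is smooth, so plain gradient descent applies. The sketch fits a scalar location (an intercept-only quantile), not the functional coefficient in an RKHS with a roughness penalty; all names are hypothetical.

```python
import math
from typing import Sequence


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def check_loss(u: float, tau: float) -> float:
    """Unsmoothed quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def smoothed_check_loss(u: float, tau: float, h: float) -> float:
    """Check loss convolved with a Gaussian kernel of bandwidth h (assumed kernel)."""
    return u * (tau - norm_cdf(-u / h)) + h * norm_pdf(u / h)


def smoothed_quantile(y: Sequence[float], tau: float, h: float,
                      lr: float = 0.1, iters: int = 2000) -> float:
    """Gradient descent on sum_i l_h(y_i - theta) over a scalar theta."""
    theta = 0.0
    for _ in range(iters):
        # d/dtheta l_h(y_i - theta) = Phi((theta - y_i) / h) - tau
        grad = sum(norm_cdf((theta - yi) / h) - tau for yi in y)
        theta -= lr * grad
    return theta
```

For example, `smoothed_quantile([1, 2, 3, 4, 100], tau=0.5, h=0.1)` converges to 3, the sample median, and `smoothed_check_loss(u, tau, h)` approaches `check_loss(u, tau)` as h shrinks.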

- Sep 9, 2022
Markov-Switching State Space Models For Uncovering Musical Interpretation

Daniel McDonald · Sep 9, 2022
Date: 2022-09-09

Time: 15:30-16:30 (Montreal time)

https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

Meeting ID: 834 3668 6293

Passcode: 12345

Abstract:

For concertgoers, musical interpretation is the most important factor in determining whether or not we enjoy a classical performance. Every performance includes mistakes—intonation issues, a lost note, an unpleasant sound—but these are all easily forgotten (or unnoticed) when a performer engages her audience, imbuing a piece with novel emotional content beyond the vague instructions inscribed on the printed page. In this research, we use data from the CHARM Mazurka Project—forty-six professional recordings of Chopin’s Mazurka Op. 68 No. 3 by consummate artists—with the goal of elucidating musically interpretable performance decisions. We focus specifically on each performer’s use of musical tempo by examining the inter-onset intervals of the note attacks in the recording. To explain these tempo decisions, we develop a switching state space model and estimate it by maximum likelihood combined with prior information gained from music theory and performance practice. We use the estimated parameters to quantitatively describe individual performance decisions and compare recordings. These comparisons suggest methods for informing music instruction, discovering listening preferences, and analyzing performances.
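The tempo pipeline can be sketched in two simplified steps, under stated assumptions. First, instantaneous tempo in beats per minute is 60 divided by each inter-onset interval, assuming one beat per interval. Second, a Hamilton forward filter computes regime probabilities for a Markov-switching model in which each hidden regime (say, a faster and a slower tempo) has a fixed Gaussian mean. The talk's switching state space model is richer (it also carries a continuous latent state and musically informed priors); the regimes, parameter values and function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def tempo_from_onsets(onsets: Sequence[float]) -> List[float]:
    """Beats per minute from inter-onset intervals, assuming one beat per IOI."""
    return [60.0 / (b - a) for a, b in zip(onsets, onsets[1:])]


def normal_pdf(y: float, mean: float, sd: float) -> float:
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def hamilton_filter(obs: Sequence[float], means: Sequence[float],
                    sds: Sequence[float], trans: Sequence[Sequence[float]],
                    init: Sequence[float]) -> Tuple[List[List[float]], float]:
    """Forward filter for a Markov-switching Gaussian model.

    trans[i][j] = P(S_t = j | S_{t-1} = i); init[j] = P(S_1 = j).
    Returns P(S_t = j | y_1..y_t) for every t, plus the log-likelihood."""
    k = len(means)
    prob = list(init)
    filtered, loglik = [], 0.0
    for t, y in enumerate(obs):
        if t > 0:  # predict: push last filtered probabilities through the chain
            prob = [sum(prob[i] * trans[i][j] for i in range(k)) for j in range(k)]
        joint = [prob[j] * normal_pdf(y, means[j], sds[j]) for j in range(k)]
        total = sum(joint)  # p(y_t | y_1..y_{t-1})
        loglik += math.log(total)
        prob = [v / total for v in joint]  # update step (Bayes' rule)
        filtered.append(prob)
    return filtered, loglik
```

Given onsets `[0.0, 0.5, 1.0, 1.5, 2.25, 3.0, 3.75]`, `tempo_from_onsets` yields three intervals at 120 bpm followed by three at 80 bpm, and with regime means of 120 and 80 the filter assigns almost all probability to the slower regime by the final note. The returned log-likelihood is the quantity a maximum-likelihood fit would maximize over the means, standard deviations and transition matrix.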

- Apr 8, 2022
Enriched post-selection models for high dimensional data

Reza Drikvandi · Apr 8, 2022
Date: 2022-04-08

Time: 15:35-16:35 (Montreal time)

https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

Meeting ID: 834 3668 6293

Passcode: 12345

Abstract:

High-dimensional data are becoming increasingly prevalent in many domains, for example, in microarray gene expression studies, fMRI data analysis, large-scale healthcare analytics, text/image analysis, natural language processing and astronomy, to name but a few. In the last two decades, regularisation approaches have become the methods of choice for analysing high-dimensional data. However, obtaining accurate estimates and predictions as well as valid statistical inference remains a major challenge in high-dimensional settings. In this talk, we present enriched post-selection models that aim to improve parameter estimation and prediction, and to facilitate statistical inference in high-dimensional regression models. The enriched post-selection method enables us to construct valid post-selection inference for regression parameters in high dimensions. We discuss the empirical and asymptotic properties of the enriched post-selection method.

- Apr 1, 2022
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

Stephen Bates · Apr 1, 2022
Date: 2022-04-01

Time: 15:35-16:35 (Montreal time)

https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

Meeting ID: 834 3668 6293

Passcode: 12345

Abstract:

We introduce Learn then Test, a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees regardless of the underlying model and (unknown) data-generating distribution. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type I error of outlier detection and confidence set coverage in classification or regression. To accomplish this, we solve a key technical challenge: the control of arbitrary risks that are not necessarily monotonic. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision.
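The reframing as multiple testing can be sketched as follows, assuming losses bounded in [0, 1], a finite grid of candidate parameters λ_1, ..., λ_m, and the Hoeffding p-value p_j = exp(−2n(α − R̂_j)_+²) for the null H_j: R(λ_j) > α. Rejecting with Bonferroni at level δ/m controls the family-wise error rate, so with probability at least 1 − δ every selected λ has risk at most α. Bonferroni is just one valid choice of family-wise procedure; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def hoeffding_pvalue(risk_hat: float, alpha: float, n: int) -> float:
    """Valid p-value for H0: R(lambda) > alpha when each loss lies in [0, 1]."""
    gap = max(alpha - risk_hat, 0.0)
    return math.exp(-2.0 * n * gap * gap)


def learn_then_test(losses: Sequence[Sequence[float]], alpha: float,
                    delta: float) -> List[int]:
    """losses[j][i] is the loss of lambda_j on calibration point i, in [0, 1].

    Returns the indices j whose null H0_j: R(lambda_j) > alpha is rejected
    by Bonferroni at family-wise error level delta."""
    m = len(losses)
    selected = []
    for j, row in enumerate(losses):
        n = len(row)
        p = hoeffding_pvalue(sum(row) / n, alpha, n)
        if p <= delta / m:  # Bonferroni threshold
            selected.append(j)
    return selected
```

For instance, a λ whose 1,000 calibration losses are all 0 gets p = e^(−20) and is selected at α = δ = 0.1, while one whose losses are all 1 gets p = 1 and is not.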

- Mar 25, 2022
Distribution-free inference for regression: discrete, continuous, and in between

Rina Foygel Barber · Mar 25, 2022
Date: 2022-03-25

Time: 15:35-16:35 (Montreal time)

https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

Meeting ID: 834 3668 6293

Passcode: 12345

Abstract:

In data analysis problems where we are not able to rely on distributional assumptions, what types of inference guarantees can still be obtained? Many popular methods, such as holdout methods, cross-validation methods, and conformal prediction, are able to provide distribution-free guarantees for predictive inference, but the problem of providing inference for the underlying regression function (for example, inference on the conditional mean E[Y|X]) is more challenging. If X takes only a small number of possible values, then inference on E[Y|X] is trivial to achieve. At the other extreme, if the features X are continuously distributed, we show that any confidence interval for E[Y|X] must have non-vanishing width, even as the sample size tends to infinity; this is true regardless of smoothness properties or other desirable features of the underlying distribution. In between these two extremes, we find several distinct regimes; in particular, it is possible for distribution-free confidence intervals to have vanishing width if and only if the effective support size of the distribution of X is smaller than the square of the sample size.
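The easy discrete case can be illustrated with a short sketch, assuming the response is bounded in [0, 1] (an assumption added here). Conditional on the observed X values, the responses sharing a given value x are i.i.d. with mean E[Y|X = x], so Hoeffding's inequality gives a distribution-free interval whose half-width is sqrt(log(2/α) / (2n_x)), where n_x counts the observations at x. Coverage is pointwise at each x, not simultaneous; the names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def discrete_x_intervals(xs: Sequence[Hashable], ys: Sequence[float],
                         alpha: float) -> Dict[Hashable, Tuple[float, float]]:
    """Hoeffding interval for E[Y | X = x] at each observed x, assuming Y in [0, 1]."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    intervals = {}
    for x, vals in groups.items():
        n_x = len(vals)
        mean = sum(vals) / n_x
        half = math.sqrt(math.log(2.0 / alpha) / (2.0 * n_x))
        # Clip to [0, 1], the assumed range of Y.
        intervals[x] = (max(0.0, mean - half), min(1.0, mean + half))
    return intervals
```

The width shrinks like 1/sqrt(n_x), so it vanishes only if each value of X is observed many times. With continuous X, every observed x typically appears once and this interval never narrows, and the result above says that no distribution-free interval can achieve vanishing width in that case.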

- Mar 11, 2022
New Approaches for Inference on Optimal Treatment Regimes

Lan Wang · Mar 11, 2022
Date: 2022-03-11

Time: 15:30-16:30 (Montreal time)

https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

Meeting ID: 834 3668 6293

Passcode: 12345

Abstract:

Finding the optimal treatment regime (or a series of sequential treatment regimes) based on individual characteristics has important applications in precision medicine. We propose two new approaches to quantify uncertainty in optimal treatment regime estimation. First, we consider inference in the model-free setting, which does not require specifying an outcome regression model. Existing model-free estimators for optimal treatment regimes are usually not suitable for the purpose of inference, because they either have nonstandard asymptotic distributions or do not necessarily guarantee consistent estimation of the parameter indexing the Bayes rule due to the use of surrogate loss. We study a smoothed robust estimator that directly targets the parameter corresponding to the Bayes decision rule for optimal treatment regime estimation. We verify that a resampling procedure provides asymptotically accurate inference for both the parameter indexing the optimal treatment regime and the optimal value function. Next, we consider the high-dimensional setting and propose a semiparametric model-assisted approach for simultaneous inference. Simulation results and real data examples are used for illustration.
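To make the smoothing idea concrete, here is a generic sketch (not the speaker's estimator) of an inverse-probability-weighted (IPW) estimate of the value of a linear regime d(x) = 1{η₀ + η₁x > 0}, in which the non-smooth indicator is replaced by Φ((η₀ + η₁x)/h). It assumes a binary treatment, a scalar covariate and known propensity scores, as in a randomized trial; the names and the Gaussian-CDF smoother are illustrative choices.

```python
import math
from typing import Sequence, Tuple


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_ipw_value(x: Sequence[float], a: Sequence[int], y: Sequence[float],
                       eta: Tuple[float, float], propensity: Sequence[float],
                       h: float) -> float:
    """IPW value of the regime d(x) = 1{eta0 + eta1 * x > 0}, with the
    indicator smoothed to Phi((eta0 + eta1 * x) / h).

    propensity[i] = P(A_i = 1 | X_i), assumed known."""
    total = 0.0
    for xi, ai, yi, pi in zip(x, a, y, propensity):
        d = norm_cdf((eta[0] + eta[1] * xi) / h)  # smoothed "regime treats"
        agree = d if ai == 1 else 1.0 - d  # smoothed 1{A_i = d(X_i)}
        prob_a = pi if ai == 1 else 1.0 - pi  # P(A_i | X_i)
        total += agree * yi / prob_a
    return total / len(y)
```

On toy data in which treatment helps exactly when x > 0 (x = [−1, −1, 1, 1], A = [0, 1, 0, 1], Y = [1, 0, 0, 1], propensity 0.5, h = 0.01), the regime η = (0, 1) has estimated value 1.0 and the opposite regime η = (0, −1) has value 0.0. Because the smoothed value is differentiable in η, it can be optimized with gradient-based methods.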

- Feb 18, 2022
Structure learning for extremal graphical models

Stanislav Volgushev · Feb 18, 2022
Date: 2022-02-18

Time: 15:30-16:30 (Montreal time)

https://umontreal.zoom.us/j/85105423917?pwd=enM3MGpFNkZKU2daMjRITmo0N0JUUT09

Meeting ID: 851 0542 3917

Passcode: 403790

Abstract:

Extremal graphical models are sparse statistical models for multivariate extreme events. The underlying graph encodes conditional independencies and enables a visual interpretation of the complex extremal dependence structure. For the important case of tree models, we provide a data-driven methodology for learning the graphical structure. We show that sample versions of the extremal correlation and a new summary statistic, which we call the extremal variogram, can be used as weights for a minimum spanning tree to consistently recover the true underlying tree. Remarkably, this implies that extremal tree models can be learned in a completely non-parametric fashion by using simple summary statistics and without the need to assume discrete distributions, existence of densities, or parametric models for marginal or bivariate distributions. Extensions to more general graphs are also discussed.
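The tree-learning recipe described above can be sketched with the empirical extremal correlation, assuming continuous margins (no ties). For each pair of variables, χ̂_ij is the fraction of the k most extreme time points of variable i that are also among the k most extreme of variable j (computed from ranks, so no marginal model is needed). A minimum spanning tree on weights 1 − χ̂_ij then links the most strongly extremally dependent pairs; any strictly decreasing transform of χ̂ (such as −log χ̂) gives the same tree because Kruskal's algorithm uses only the ordering of the weights. The extremal variogram, the other weight mentioned in the abstract, is not implemented; the function names and the choice of k are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def column_ranks(col: Sequence[float]) -> List[int]:
    """Ranks 1..n of the values in col (assumes no ties)."""
    order = sorted(range(len(col)), key=lambda i: col[i])
    ranks = [0] * len(col)
    for pos, i in enumerate(order):
        ranks[i] = pos + 1
    return ranks


def extremal_correlation(data: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """chi_hat[i][j] = (1/k) * #{t : rank_ti > n - k and rank_tj > n - k}."""
    n, d = len(data), len(data[0])
    ranks = [column_ranks([row[j] for row in data]) for j in range(d)]
    chi = [[1.0] * d for _ in range(d)]
    for i, j in combinations(range(d), 2):
        joint = sum(1 for t in range(n) if ranks[i][t] > n - k and ranks[j][t] > n - k)
        chi[i][j] = chi[j][i] = joint / k
    return chi


def minimum_spanning_tree(weights: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Kruskal's algorithm on the complete graph with symmetric weights."""
    d = len(weights)
    parent = list(range(d))

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = sorted((weights[i][j], i, j) for i, j in combinations(range(d), 2))
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components: keep it
            parent[ri] = rj
            tree.append((i, j))
    return sorted(tree)


def extremal_tree(data: Sequence[Sequence[float]], k: int) -> List[Tuple[int, int]]:
    chi = extremal_correlation(data, k)
    # Stronger extremal dependence -> smaller weight.
    weights = [[1.0 - c for c in row] for row in chi]
    return minimum_spanning_tree(weights)
```

For example, if variable 0 shares one of its k = 2 most extreme time points with variable 1, variable 1 shares one with variable 2, and variables 0 and 2 share none, `extremal_tree` returns the chain `[(0, 1), (1, 2)]`.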

- Feb 11, 2022
Integration of multi-omics data for the discovery of novel regulators that modulate biological processes

Jun Ding · Feb 11, 2022
Date: 2022-02-11

Time: 15:30-16:30 (Montreal time)

https://mcgill.zoom.us/j/83436686293?pwd=b0RmWmlXRXE3OWR6NlNIcWF5d0dJQT09

Meeting ID: 834 3668 6293

Passcode: 12345

Abstract:

The cellular states in various biological processes such as cell differentiation, disease progression, and treatment response are often enormously complex and thus hard to capture with unimodal profiling (e.g., the transcriptome). Although unimodal measurements have been successful in a wide variety of studies, incomplete (and often misleading) unimodal cellular profiles can lead to biased and inaccurate conclusions. With the development of biotechnologies, multi-omics data (bulk or single-cell) are increasingly available. The rapidly accumulating multi-omics data offer unprecedented opportunities to accurately decode the cellular states in biological processes and thus to gain a deep understanding of how those states change, which is crucial for finding biomarkers and therapeutic intervention strategies. In this talk, we will discuss a few multimodal methods that we developed to integrate multi-omics data for the discovery of novel regulators in multiple biological processes. Many of the novel predictions from these multimodal methods were experimentally validated and have provided new understanding of the underlying mechanisms of several diseases. I will also discuss how a potential novel COVID-19 drug was discovered through such a multi-omics data integration analysis.
