Past Seminar Series - McGill Statistics Seminars

- Jan 11, 2019
- post
Magic Cross-Validation Theory for Large-Margin Classification

Boxiang Wang · Jan 11, 2019
Date: 2019-01-11

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Cross-validation (CV) is perhaps the most widely used tool for tuning supervised machine learning algorithms in order to achieve better generalization error rate. In this paper, we focus on leave-one-out cross-validation (LOOCV) for the support vector machine (SVM) and related algorithms. We first address two wide-spreading misconceptions on LOOCV. We show that LOOCV, ten-fold, and five-fold CV are actually well-matched in estimating the generalization error, and the computation speed of LOOCV is not necessarily slower than that of ten-fold and five-fold CV. We further present a magic CV theory with a surprisingly simple recipe which allows users to very efficiently tune the SVM. We then apply the magic CV theory to demonstrate a straightforward way to prove the Bayes risk consistency of the SVM. We have implemented our algorithms in a publicly available R package magicsvm, which is much faster than the state-of-the-art SVM solvers. We demonstrate our methods on extensive simulations and benchmark examples.

Read More…
- Nov 23, 2018
- post
p-values vs Bayes factors: Is there a compromise?

David Wolfson · Nov 23, 2018
Date: 2018-11-23

Time: 15:30-16:30

Location: BURN 1104

Abstract:

This is not a research talk. Rather, the goal is to address the topic of the talk title through a 2017 multi-authored paper published in Nature Human Behaviour. The Nature article proposes that the standard cut-off significance level of .05 should be replaced by a cut-off level of .005 when new discoveries are being claimed. The authors attribute the high proportion of irreducible results in the literature that accompany claimed new discoveries, in part, to the low-bar cut-off of .05. Their fix is built around the Bayes factor. I will begin with a brief presentation of the difference between the frequentist and Bayesian approaches to statistical inference, and lead into p-values vs Bayes factors for hypothesis testing before discussing the Nature article itself. It is hoped that the talk will provoke thought about the way we do statistics.

Read More…
- Nov 16, 2018
- post
Estimation of the Median Residual Lifetime Function for Length-Biased Failure Time Data

James Hugh McVittie · Nov 16, 2018
Date: 2018-11-16

Time: 15:30-16:30

Location: BURN 1104

Abstract:

The median residual lifetime function is a statistical quantity which describes the future point in time at which the probability of current survival has dropped by 50%. In deriving an estimator for the median residual lifetime function for length-biased data, the added features of left-truncation and right-censoring must be taken into account.

In this talk, we give a brief description of length-biased failure time data and show that by using a particular non-parametric estimator for the survival function that it is possible to derive the asymptotically most-efficient non-parametric estimator for the median residual lifetime function. We give some details on the proof of the asymptotic results and examine the performance of the estimator using simulated data. We also apply the proposed estimator to the Canadian Study of Health and Aging data set to study the median residual lifetime function of patients with dementia.

Read More…
- Nov 9, 2018
- post
Density estimation of mixtures of Gaussians and Ising models

Abbas Mehrabian · Nov 9, 2018
Date: 2018-11-09

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Density estimation lies at the intersection of statistics, theoretical computer science, and machine learning. We review some old and new results on the sample complexities (also known as minimax convergence rates) of estimating densities of high-dimensional distributions, in particular mixtures of Gaussians and Ising models.

Based on joint work with Hassan Ashtiani, Shai Ben-David, Luc Devroye, Nick Harvey, Christopher Liaw, Yani Plan, and Tommy Reddad.

Read More…
- Nov 2, 2018
- post
Terrorists never congregate in even numbers (and other strange results in fragmentation-coalescence)

Andreas Kyprianou · Nov 2, 2018
Date: 2018-11-02

Time: 15:30-16:30

Location: BURN 1104

Abstract:

The rigorous mathematical treatment of random fragmentation-coalescent models in the literature is difficult to find, and perhaps for good reason. We examine two different types of random fragmentation-coalescent models which produce somewhat unexpected results.

The first concerns an agent-based model in which, with a rate that depends on the configuration of the system, agents coalesce into clusters that also fragment into their individual constituent membership. We consider the large-scale, long-term behaviour of this system in a similar spirit to recent use of such models to characterise the evolution of terrorist cells. Under appropriate assumptions we find an unusual behaviour; the system displays stabilisation with clusters that only contain an odd number of individuals.

Read More…
- Oct 26, 2018
- post
Object Oriented Data Analysis with Application to Neuroimaging Studies

Dehan Kong · Oct 26, 2018
Date: 2018-10-26

Time: 15:30-16:30

Location: BURN 1104

Abstract:

In this talk, I will first briefly introduce my research on object oriented data analysis with application to neuroimaging studies. I will then talk about a detailed example on imaging genetics. In this project, we develop a high-dimensional matrix linear regression model to correlate 2D imaging responses with high-dimensional genetic covariates. We propose a fast and efficient screening procedure based on the spectral norm to deal with the case that the dimension of scalar covariates is much larger than the sample size. We develop an efficient estimation procedure based on the nuclear norm regularization, which explicitly borrows the matrix structure of coefficient matrices. We examine the finite-sample performance of our methods using simulations and a large-scale imaging genetic dataset from the Alzheimer’s Disease Neuroimaging Initiative study.

Read More…
- Oct 19, 2018
- post
Multilevel clustering and optimal transport

Long Nguyen · Oct 19, 2018
Date: 2018-10-19

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Optimal transport plays an increasingly relevant and useful role in the theory and application of mixture model based clustering and inference. In this talk I will describe some recent progress in characterizing the convergence behavior of mixing distributions when one fits a mixture model to the data. This theory hinges on the relationship between the space of mixture densities, which is endowed with variational or Hellinger distance, and the space of mixing measures endowed with optimal transport distance metrics. Next, I will introduce an optimal transport based technique for the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a potentially large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. Some theoretical and experimental results will be presented.

Read More…
- Oct 12, 2018
- post
Robust estimation in the presence of influential units for skewed finite and infinite populations

David Haziza · Oct 12, 2018
Date: 2018-10-12

Time: 16:00-

Location: CRM, Université de Montréal, Pavillon André-Aisenstadt, salle 6254

Abstract:

Many variables encountered in practice (e.g., economic variables) have skewed distributions. The latter provide a conducive ground for the presence of influential observations, which are those that have a drastic impact on the estimates if they were to be excluded from the sample. We examine the problem of influential observations in a classical statistic setting as well as in a finite population setting that includes two main frameworks: the design-based framework and the model-based framework. Within each setting, classical estimators may be highly unstable in the presence of influential units. We propose a robust estimator of the population mean based on the concept of conditional bias of a unit, which is a measure of influence. The idea is to reduce the impact of the sample units that have a large conditional bias. The proposed estimator depends on a cut-off value. We suggest selecting the cut-off value that minimizes the maximum absolute estimated conditional bias with respect to the robust estimator. The properties of the proposed estimator will be discussed. Finally, the results of a simulation study comparing the performance of several estimators in terms of bias and mean square error will be presented.

Read More…
- Oct 5, 2018
- post
Dimension Reduction for Causal Inference

Yeying Zhu · Oct 5, 2018
Date: 2018-10-05

Time: 15:30-16:30

Location: BURN 1104

Abstract:

In this talk, we discuss how sufficient dimension reduction can be used to aid causal inference. We propose a new matching approach based on the reduced covariates obtained from sufficient dimension reduction. Compared with the original covariates and the propensity scores, which are commonly used for matching in the literature, the reduced covariates are estimable nonparametrically and are effective in imputing the missing potential outcomes. Under the ignorability assumption, the consistency of the proposed approach requires a weaker common support condition than the one we often assume for propensity score-based methods. We develop asymptotic properties, and conduct simulation studies as well as real data analysis to illustrate the proposed approach.

Read More…
- Sep 28, 2018
- post
Selective inference for dynamic treatment regimes via the LASSO

Ashkan Ertefaie · Sep 28, 2018
Date: 2018-09-28

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Constructing an optimal dynamic treatment regime become complex when there are large number of prognostic factors, such as patient’s genetic information, demographic characteristics, medical history over time. Existing methods only focus on selecting the important variables for the decision-making process and fall short in providing inference for the selected model. We fill this gap by leveraging the conditional selective inference methodology. We show that the proposed method is asymptotically valid given certain rate assumptions in semiparametric regression.

Read More…

Date: 2019-01-11

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2018-11-23

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Date: 2018-11-16

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Date: 2018-11-09

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Date: 2018-11-02

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Date: 2018-10-26

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Date: 2018-10-19

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Date: 2018-10-12

Time: 16:00-

Location: CRM, Université de Montréal, Pavillon André-Aisenstadt, salle 6254

Abstract:

Date: 2018-10-05

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Date: 2018-09-28

Time: 15:30-16:30

Location: BURN 1205

Abstract: