McGill Statistics Seminar - McGill Statistics Seminars

- Nov 9, 2018
Density estimation of mixtures of Gaussians and Ising models

Abbas Mehrabian · Nov 9, 2018
Date: 2018-11-09

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Density estimation lies at the intersection of statistics, theoretical computer science, and machine learning. We review some old and new results on the sample complexities (also known as minimax convergence rates) of estimating densities of high-dimensional distributions, in particular mixtures of Gaussians and Ising models.

Based on joint work with Hassan Ashtiani, Shai Ben-David, Luc Devroye, Nick Harvey, Christopher Liaw, Yaniv Plan, and Tommy Reddad.

- Nov 2, 2018
Terrorists never congregate in even numbers (and other strange results in fragmentation-coalescence)

Andreas Kyprianou · Nov 2, 2018
Date: 2018-11-02

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Rigorous mathematical treatments of random fragmentation-coalescence models are difficult to find in the literature, and perhaps for good reason. We examine two different types of random fragmentation-coalescence models which produce somewhat unexpected results.

The first concerns an agent-based model in which, at a rate that depends on the configuration of the system, agents coalesce into clusters that also fragment into their individual constituent members. We consider the large-scale, long-term behaviour of this system in a similar spirit to the recent use of such models to characterise the evolution of terrorist cells. Under appropriate assumptions we find unusual behaviour: the system stabilises with clusters that contain only an odd number of individuals.

- Oct 26, 2018
Object Oriented Data Analysis with Application to Neuroimaging Studies

Dehan Kong · Oct 26, 2018
Date: 2018-10-26

Time: 15:30-16:30

Location: BURN 1104

Abstract:

In this talk, I will first briefly introduce my research on object oriented data analysis with application to neuroimaging studies. I will then present a detailed example from imaging genetics. In this project, we develop a high-dimensional matrix linear regression model to correlate 2D imaging responses with high-dimensional genetic covariates. We propose a fast and efficient screening procedure based on the spectral norm to handle the case where the dimension of the scalar covariates is much larger than the sample size. We develop an efficient estimation procedure based on nuclear norm regularization, which explicitly exploits the matrix structure of the coefficient matrices. We examine the finite-sample performance of our methods using simulations and a large-scale imaging genetic dataset from the Alzheimer’s Disease Neuroimaging Initiative study.

- Oct 19, 2018
Multilevel clustering and optimal transport

Long Nguyen · Oct 19, 2018
Date: 2018-10-19

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Optimal transport plays an increasingly relevant and useful role in the theory and application of mixture model based clustering and inference. In this talk I will describe some recent progress in characterizing the convergence behavior of mixing distributions when one fits a mixture model to the data. This theory hinges on the relationship between the space of mixture densities, which is endowed with variational or Hellinger distance, and the space of mixing measures endowed with optimal transport distance metrics. Next, I will introduce an optimal transport based technique for the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a potentially large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. Some theoretical and experimental results will be presented.

- Oct 5, 2018
Dimension Reduction for Causal Inference

Yeying Zhu · Oct 5, 2018
Date: 2018-10-05

Time: 15:30-16:30

Location: BURN 1104

Abstract:

In this talk, we discuss how sufficient dimension reduction can be used to aid causal inference. We propose a new matching approach based on the reduced covariates obtained from sufficient dimension reduction. Compared with the original covariates and the propensity scores, which are commonly used for matching in the literature, the reduced covariates are estimable nonparametrically and are effective in imputing the missing potential outcomes. Under the ignorability assumption, the consistency of the proposed approach requires a weaker common support condition than the one we often assume for propensity score-based methods. We develop asymptotic properties, and conduct simulation studies as well as real data analysis to illustrate the proposed approach.

- Sep 28, 2018
Selective inference for dynamic treatment regimes via the LASSO

Ashkan Ertefaie · Sep 28, 2018
Date: 2018-09-28

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Constructing an optimal dynamic treatment regime becomes complex when there is a large number of prognostic factors, such as patients’ genetic information, demographic characteristics, and medical history over time. Existing methods focus only on selecting the important variables for the decision-making process and fall short in providing inference for the selected model. We fill this gap by leveraging the conditional selective inference methodology. We show that the proposed method is asymptotically valid given certain rate assumptions in semiparametric regression.

- Sep 21, 2018
Possession Sketches: Mapping NBA Strategies

Luke Bornn · Sep 21, 2018
Date: 2018-09-21

Time: 09:30-10:15

Location: Bronfman Building 001

Abstract:

We present Possession Sketches, a new machine learning method for organizing and exploring a database of basketball player-tracks. Our method organizes basketball possessions by offensive structure. We first develop a model for populating a dictionary of short, repeated, and spatially registered actions. Each action corresponds to an interpretable type of player movement. We examine statistical patterns in these actions, and show how they can be used to describe individual player behavior. Leveraging this vocabulary of actions, we develop a hierarchical model that describes interactions between players. Our approach draws on the topic-modeling literature, extending Latent Dirichlet Allocation (LDA) through a novel representation of player movement data which uses techniques common in animation and video game design. We show that our model is able to group together possessions with similar offensive structure, allowing for efficient search and exploration of the entire database of player-tracking data. We show that our model finds repeated offensive structure in teams (e.g. strategy), providing a much more sophisticated, yet interpretable lens into basketball player-tracking data. This is joint work with Andrew Miller.

- Sep 14, 2018
Quantile LASSO in Nonparametric Models with Changepoints Under Optional Shape Constraints

Matus Maciak · Sep 14, 2018
Date: 2018-09-14

Time: 15:30-16:30

Location: BURN 1104

Abstract:

Nonparametric models are popular modeling tools because of their natural overall flexibility. In our approach, we apply nonparametric techniques to panel data structures with changepoints and optional shape constraints, and the estimation is performed in a fully data-driven manner using atomic pursuit methods, LASSO regularization techniques in particular. However, in order to obtain robust estimates and to gain deeper insight into the underlying data structure, we target conditional quantiles rather than the conditional mean only. The whole estimation process and the subsequent inference both become more challenging, but the results are more useful in practical applications. The underlying model is first introduced and some theoretical results are presented. The proposed methodology is applied to a real data scenario and some finite sample properties are investigated via an extensive simulation study. This is joint work with Ivan Mizera, University of Alberta, and Gabriela Ciuperca, University of Lyon.

- Sep 7, 2018
Association Measures for Clustered Competing Risks Data

Chien-Lin (Mark) Su · Sep 7, 2018
Date: 2018-09-07

Time: 15:30-16:30

Location: BURN 1104

Abstract:

In this work, we propose a semiparametric model for multivariate clustered competing risks data when the cause-specific failure times and the occurrence of competing risk events among subjects within the same cluster are of interest. The cause-specific hazard functions are assumed to follow Cox proportional hazard models, and the associations between failure times given the same or different cause events and the associations between occurrences of competing risk events within the same cluster are investigated through copula models. A cross-odds ratio measure is explored under our proposed models. A two-stage estimation procedure is proposed in which the marginal models are estimated in the first stage, and the dependence parameters are estimated via an Expectation-Maximization algorithm in the second stage. The proposed estimators are shown to be consistent and asymptotically normal under mild regularity conditions. Simulation studies are conducted to assess the finite sample performance of the proposed method. The proposed technique is demonstrated through an application to a multicenter bone marrow transplantation dataset.

- Apr 27, 2018
Methodological challenges in using point-prevalence versus cohort data in risk factor analyses of hospital-acquired infections

Martin Wolkewitz · Apr 27, 2018
Date: 2018-04-27

Time: 15:30-16:30

Location: BURN 1205

Abstract:

We explore the impact of length-biased sampling on the evaluation of risk factors for nosocomial infections in point-prevalence studies. We used cohort data with full information, including the exact date of the nosocomial infection, and mimicked an artificial one-day prevalence study by picking a sample from this cohort study. Based on the cohort data, we studied the underlying multi-state model, which accounts for nosocomial infection as an intermediate event and discharge/death as competing events. Simple formulas are derived to display relationships between risk-, hazard- and prevalence odds ratios. Due to length-biased sampling, long-stay and thus sicker patients are more likely to be sampled. In addition, patients with nosocomial infections usually stay longer in hospital. We explored mechanisms which are, owing to the design, hidden in prevalence data. In our example, we showed that prevalence odds ratios were usually less pronounced than risk odds ratios but more pronounced than hazard ratios. Thus, to avoid misinterpretation, knowledge of the mechanisms from the underlying multi-state model is essential for the interpretation of risk factors derived from point-prevalence data.
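The length-bias effect described in the abstract can be illustrated with a small simulation. This is only a sketch under invented assumptions, not the speaker's analysis: the function name `simulate_prevalence_sample`, the exponential length-of-stay distribution, the uniform admission times, and all parameter values are hypothetical choices made for illustration.

```python
import random


def simulate_prevalence_sample(n_patients=200_000, mean_stay=5.0,
                               horizon=365.0, survey_day=180.0, seed=0):
    """Compare mean length of stay in a full cohort with that of a
    one-day prevalence sample drawn from the same cohort.

    Assumptions (illustrative only): admissions are uniform over
    [0, horizon] and lengths of stay are exponential with mean mean_stay.
    """
    rng = random.Random(seed)
    cohort_stays = []
    sampled_stays = []
    for _ in range(n_patients):
        admit = rng.uniform(0.0, horizon)
        stay = rng.expovariate(1.0 / mean_stay)
        cohort_stays.append(stay)
        # A patient is captured by the survey only if in hospital that day.
        if admit <= survey_day < admit + stay:
            sampled_stays.append(stay)
    cohort_mean = sum(cohort_stays) / len(cohort_stays)
    sample_mean = sum(sampled_stays) / len(sampled_stays)
    return cohort_mean, sample_mean


if __name__ == "__main__":
    cohort_mean, sample_mean = simulate_prevalence_sample()
    print(f"mean stay, full cohort:       {cohort_mean:.2f} days")
    print(f"mean stay, prevalence sample: {sample_mean:.2f} days")
```

Under these assumptions, the patients present on the survey day have a noticeably longer average stay than the cohort as a whole, because the chance of being in hospital on any given day grows with the length of stay.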

