Past Seminar Series - McGill Statistics Seminars

- Dec 1, 2016
- post
High-dimensional changepoint estimation via sparse projection

Richard Samworth · Dec 1, 2016
Date: 2016-12-01

Time: 15:30-16:30

Location: BURN 708

Abstract:

Changepoints are a very common feature of Big Data that arrive in the form of a data stream. We study high-dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the coordinates. The challenge is to borrow strength across the coordinates in order to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called ‘inspect’ for estimation of the changepoints: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimisation problem derived from the CUSUM transformation of the time series. We then apply an existing univariate changepoint detection algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated changepoints and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data generating mechanisms.

Read More…
- Nov 25, 2016
- post
Spatio-temporal models for skewed processes

Alexandra Schmidt · Nov 25, 2016
Date: 2016-11-25

Time: 15:30-16:30

Location: BURN 1205

Abstract:

In the analysis of most spatio-temporal processes in environmental studies, observations present skewed distributions. Usually, a single transformation of the data is used to approximate normality, and stationary Gaussian processes are assumed to model the transformed data. The choice of transformation is key for spatial interpolation and temporal prediction. We propose a spatio-temporal model for skewed data that does not require the use of data transformation. The process is decomposed as the sum of a purely temporal structure with two independent components that are considered to be partial realizations from independent spatial Gaussian processes, for each time t. The model has an asymmetry parameter that might vary with location and time, and if this is equal to zero, the usual Gaussian model results. The inference procedure is performed under the Bayesian paradigm, and uncertainty about parameters estimation is naturally accounted for. We fit our model to different synthetic data and to monthly average temperature observed between 2001 and 2011 at monitoring locations located in the south of Brazil. Different model comparison criteria, and analysis of the posterior distribution of some parameters, suggest that the proposed model outperforms standard ones used in the literature. This is joint work with Kelly Gonçalves (UFRJ, Brazil) and Patricia L. Velozo (UFF, Brazil)

Read More…
- Nov 18, 2016
- post
Progress in theoretical understanding of deep learning

Yoshua Bengio · Nov 18, 2016
Date: 2016-11-18

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Deep learning has arisen around 2006 as a renewal of neural networks research allowing such models to have more layers. Theoretical investigations have shown that functions obtained as deep compositions of simpler functions (which includes both deep and recurrent nets) can express highly varying functions (with many ups and downs and different input regions that can be distinguished) much more efficiently (with fewer parameters) than otherwise, under a prior which seems to work well for artificial intelligence tasks. Empirical work in a variety of applications has demonstrated that, when well trained, such deep architectures can be highly successful, remarkably breaking through previous state-of-the-art in many areas, including speech recognition, object recognition, language models, machine translation and transfer learning. Although neural networks have long been considered lacking in theory and much remains to be done, theoretical advances have been made and will be discussed, to support distributed representations, depth of representation, the non-convexity of the training objective, and the probabilistic interpretation of learning algorithms (especially of the auto-encoder type, which were lacking one). The talk will focus on the intuitions behind these theoretical results.

Read More…
- Nov 11, 2016
- post
Tyler's M-estimator: Subspace recovery and high-dimensional regime

Teng Zhang · Nov 11, 2016
Date: 2016-11-11

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Given a data set, Tyler’s M-estimator is a widely used covariance matrix estimator with robustness to outliers or heavy-tailed distribution. We will discuss two recent results of this estimator. First, we show that when a certain percentage of the data points are sampled from a low-dimensional subspace, Tyler’s M-estimator can be used to recover the subspace exactly. Second, in the high-dimensional regime that the number of samples n and the dimension p both go to infinity, p/n converges to a constant y between 0 and 1, and when the data samples are identically and independently generated from the Gaussian distribution N(0,I), we showed that the difference between the sample covariance matrix and a scaled version of Tyler’s M-estimator tends to zero in spectral norm, and the empirical spectral densities of both estimators converge to the Marcenko-Pastur distribution. We also prove that when the data samples are generated from an elliptical distribution, the limiting distribution of Tyler’s M-estimator converges to a Marcenko-Pastur-Type distribution. The second part is joint work with Xiuyuan Cheng and Amit Singer.

Read More…
- Nov 4, 2016
- post
Lawlor: Time-varying mixtures of Markov chains: An application to traffic modeling Piché: Bayesian nonparametric modeling of heterogeneous groups of censored data

Sean Lawlor and Alexandre Piché · Nov 4, 2016
Date: 2016-11-04

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Piché: Analysis of survival data arising from different groups, whereby the data in each group is scarce, but abundant overall, is a common issue in applied statistics. Bayesian nonparametrics are tools of choice to handle such datasets given their ability to share information across groups. In this presentation, we will compare three popular Bayesian nonparametric methods on the modeling of survival functions coming from related heterogeneous groups. Specifically, we will first compare the modeling accuracy of the Dirichlet process, the hierarchical Dirichlet process, and the nested Dirichlet process on simulated datasets of different sizes, where groups differ in shape or in expectation, and finally we will compare the models on real world injury datasets.

Read More…
- Nov 2, 2016
- post
First talk: Bootstrap in practice | Second talk: Statistics and Big Data at Google

Tim Hesterberg · Nov 2, 2016
Date: 2016-11-02

Time: 15:00-16:00 17:35-18:25

Location: 1st: BURN 306 2nd: ADAMS AUD

Abstract:

First talk: This talk focuses on three practical aspects of resampling: communication, accuracy, and software. I’ll introduce the bootstrap and permutation tests, and discussed how they may be used to help clients understand statistical results. I’ll talk about accuracy – there are dramatic differences in how accurate different bootstrap methods are. Surprisingly, the most common bootstrap methods are less accurate than classical methods for small samples, and more accurate for larger samples. There are simple variations that dramatically improve the accuracy. Finally, I’ll compare two R packages, the the easy-to-use “resample” package, and the more-powerful “boot” package.

Read More…
- Oct 28, 2016
- post
Efficient tests of covariate effects in two-phase failure time studies

Jerry Lawless · Oct 28, 2016
Date: 2016-10-28

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Two-phase studies are frequently used when observations on certain variables are expensive or difficult to obtain. One such situation is when a cohort exists for which certain variables have been measured (phase 1 data); then, a sub-sample of individuals is selected, and additional data are collected on them (phase 2). Efficiency for tests and estimators can be increased by basing the selection of phase 2 individuals on data collected at phase 1. For example, in large cohorts, expensive genomic measurements are often collected at phase 2, with oversampling of persons with “extreme” phenotypic responses. A second example is case-cohort or nested case-control studies involving times to rare events, where phase 2 oversamples persons who have experienced the event by a certain time. In this talk I will describe two-phase studies on failure times, present efficient methods for testing covariate effects. Some extensions to more complex outcomes and areas needing further development will be discussed.

Read More…
- Oct 21, 2016
- post
Statistical analysis of two-level hierarchical clustered data

Chien-Lin Su · Oct 21, 2016
Date: 2016-10-21

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Multi-level hierarchical clustered data are commonly seen in financial and biostatistics applications. In this talk, we introduce several modeling strategies for describing the dependent relationships for members within a cluster or between different clusters (in the same or different levels). In particular we will apply the hierarchical Kendall copula, first proposed by Brechmann (2014), to model two-level hierarchical clustered survival data. This approach provides a clever way of dimension reduction in modeling complicated multivariate data. Based on the model assumptions, we propose statistical inference methods, including parameter estimation and a goodness-of-fit test, suitable for handling censored data. Simulation and data analysis results are also presented.

Read More…
- Oct 14, 2016
- post
A Bayesian finite mixture of bivariate regressions model for causal mediation analyses

Geneviève Lefebvre · Oct 14, 2016
Date: 2016-10-14

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Building on the work of Schwartz, Gelfand and Miranda (Statistics in Medicine (2010); 29(16), 1710-23), we propose a Bayesian finite mixture of bivariate regressions model for causal mediation analyses. Using an identifiability condition within each component of the mixture, we express the natural direct and indirect effects of the exposure on the outcome as functions of the component-specific regression coefficients. On the basis of simulated data, we examine the behaviour of the model for estimating these effects in situations where the associations between exposure, mediator and outcome are confounded, or not. Additionally, we demonstrate that this mixture model can be used to account for heterogeneity arising through unmeasured binary mediator-outcome confounders. Finally, we apply our mediation mixture model to estimate the natural direct and indirect effects of exposure to inhaled corticosteroids during pregnancy on birthweight using a cohort of asthmatic women from the province of Québec.

Read More…
- Oct 7, 2016
- post
Cellular tree classifiers

Luc Devroye · Oct 7, 2016
Date: 2016-10-07

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within a partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using “cells”: first of all, given input data, a cell must decide whether it will become a leaf or internal node in the tree. Secondly, if it decides on an internal node, it must decide how to partition the space linearly. Data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers.

Read More…

Date: 2016-12-01

Time: 15:30-16:30

Location: BURN 708

Abstract:

Date: 2016-11-25

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2016-11-18

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2016-11-11

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2016-11-04

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2016-11-02

Time: 15:00-16:00 17:35-18:25

Location: 1st: BURN 306 2nd: ADAMS AUD

Abstract:

Date: 2016-10-28

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2016-10-21

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2016-10-14

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2016-10-07

Time: 15:30-16:30

Location: BURN 1205

Abstract: