Past Seminar Series - McGill Statistics Seminars

- Nov 2, 2016
First talk: Bootstrap in practice | Second talk: Statistics and Big Data at Google

Tim Hesterberg · Nov 2, 2016
Date: 2016-11-02

Time: 15:00-16:00 (first talk), 17:35-18:25 (second talk)

Location: BURN 306 (first talk), ADAMS AUD (second talk)

Abstract:

First talk: This talk focuses on three practical aspects of resampling: communication, accuracy, and software. I’ll introduce the bootstrap and permutation tests, and discuss how they may be used to help clients understand statistical results. I’ll talk about accuracy – there are dramatic differences in how accurate different bootstrap methods are. Surprisingly, the most common bootstrap methods are less accurate than classical methods for small samples, and more accurate for larger samples. There are simple variations that dramatically improve the accuracy. Finally, I’ll compare two R packages, the easy-to-use “resample” package and the more powerful “boot” package.
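As a rough illustration of the two resampling ideas named in this abstract, here is a minimal standard-library Python sketch of a percentile bootstrap interval and a two-sample permutation test. The function names, defaults, and the choice of the plain percentile interval are assumptions made for illustration; they are not taken from the talk or from the “resample” or “boot” packages.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.mean, n_boot=2000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(data) (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample n observations with replacement and recompute the statistic.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = statistics.mean(x) - statistics.mean(y)
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        # Shuffle group labels by shuffling the pooled observations.
        rng.shuffle(pooled)
        diff = (statistics.mean(pooled[:len(x)])
                - statistics.mean(pooled[len(x):]))
        if abs(diff) >= abs(observed):
            count += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (count + 1) / (n_perm + 1)
```

The percentile interval is used here only because it is the simplest to state; the abstract's point is precisely that the accuracy of such methods varies with sample size and that refinements exist.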

- Oct 28, 2016
Efficient tests of covariate effects in two-phase failure time studies

Jerry Lawless · Oct 28, 2016
Date: 2016-10-28

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Two-phase studies are frequently used when observations on certain variables are expensive or difficult to obtain. One such situation is when a cohort exists for which certain variables have been measured (phase 1 data); then, a sub-sample of individuals is selected, and additional data are collected on them (phase 2). Efficiency for tests and estimators can be increased by basing the selection of phase 2 individuals on data collected at phase 1. For example, in large cohorts, expensive genomic measurements are often collected at phase 2, with oversampling of persons with “extreme” phenotypic responses. A second example is case-cohort or nested case-control studies involving times to rare events, where phase 2 oversamples persons who have experienced the event by a certain time. In this talk I will describe two-phase studies on failure times and present efficient methods for testing covariate effects. Some extensions to more complex outcomes and areas needing further development will be discussed.

- Oct 21, 2016
Statistical analysis of two-level hierarchical clustered data

Chien-Lin Su · Oct 21, 2016
Date: 2016-10-21

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Multi-level hierarchical clustered data are commonly seen in financial and biostatistics applications. In this talk, we introduce several modeling strategies for describing the dependent relationships for members within a cluster or between different clusters (in the same or different levels). In particular we will apply the hierarchical Kendall copula, first proposed by Brechmann (2014), to model two-level hierarchical clustered survival data. This approach provides a clever way of dimension reduction in modeling complicated multivariate data. Based on the model assumptions, we propose statistical inference methods, including parameter estimation and a goodness-of-fit test, suitable for handling censored data. Simulation and data analysis results are also presented.

- Oct 14, 2016
A Bayesian finite mixture of bivariate regressions model for causal mediation analyses

Geneviève Lefebvre · Oct 14, 2016
Date: 2016-10-14

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Building on the work of Schwartz, Gelfand and Miranda (Statistics in Medicine (2010); 29(16), 1710-23), we propose a Bayesian finite mixture of bivariate regressions model for causal mediation analyses. Using an identifiability condition within each component of the mixture, we express the natural direct and indirect effects of the exposure on the outcome as functions of the component-specific regression coefficients. On the basis of simulated data, we examine the behaviour of the model for estimating these effects in situations where the associations between exposure, mediator and outcome are confounded, or not. Additionally, we demonstrate that this mixture model can be used to account for heterogeneity arising through unmeasured binary mediator-outcome confounders. Finally, we apply our mediation mixture model to estimate the natural direct and indirect effects of exposure to inhaled corticosteroids during pregnancy on birthweight using a cohort of asthmatic women from the province of Québec.

- Oct 7, 2016
Cellular tree classifiers

Luc Devroye · Oct 7, 2016
Date: 2016-10-07

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within a partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using “cells”: first of all, given input data, a cell must decide whether it will become a leaf or internal node in the tree. Secondly, if it decides on an internal node, it must decide how to partition the space linearly. Data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers.
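The recursive scheme described here can be sketched in a few lines of Python. Each cell sees only the data sent to it and answers two questions: whether to become a leaf, and if not, how to split. The specific stopping rule (small or pure cell) and the split rule (median cut on the coordinate with the largest range) are hypothetical choices made for this sketch, not the rules studied in the talk.

```python
from collections import Counter


def is_leaf(points, min_size=4):
    """Decision 1: a cell becomes a leaf if it is small or pure (assumed rule)."""
    labels = {y for _, y in points}
    return len(points) <= min_size or len(labels) <= 1


def choose_split(points):
    """Decision 2: axis-aligned median cut on the widest coordinate (assumed rule)."""
    d = len(points[0][0])

    def spread(j):
        vals = [x[j] for x, _ in points]
        return max(vals) - min(vals)

    axis = max(range(d), key=spread)
    vals = sorted(x[axis] for x, _ in points)
    return axis, vals[len(vals) // 2]


def majority_leaf(points):
    label = Counter(y for _, y in points).most_common(1)[0][0]
    return {"leaf": True, "label": label}


def build(points, min_size=4):
    """Build the tree from a non-empty list of (vector, label) pairs."""
    if is_leaf(points, min_size):
        return majority_leaf(points)
    axis, t = choose_split(points)
    left = [p for p in points if p[0][axis] < t]
    right = [p for p in points if p[0][axis] >= t]
    if not left or not right:
        # Degenerate split (for example, many tied values): stop here.
        return majority_leaf(points)
    # The two children are independent cells and could be built in parallel.
    return {"leaf": False, "axis": axis, "threshold": t,
            "left": build(left, min_size), "right": build(right, min_size)}


def predict(tree, x):
    while not tree["leaf"]:
        side = "left" if x[tree["axis"]] < tree["threshold"] else "right"
        tree = tree[side]
    return tree["label"]
```

Because `build` passes nothing to a child except that child's own data, the two recursive calls share no state, which is the property that makes the parallel “cell” formulation possible.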

- Sep 30, 2016
CoCoLasso for high-dimensional error-in-variables regression

Hui Zou · Sep 30, 2016
Date: 2016-09-30

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Much theoretical and applied work has been devoted to high-dimensional regression with clean data. However, we often face corrupted data in many applications where missing data and measurement errors cannot be ignored. Loh and Wainwright (2012) proposed a non-convex modification of the Lasso for doing high-dimensional regression with noisy and missing data. It is generally agreed that the virtues of convexity contribute fundamentally to the success and popularity of the Lasso. In light of this, we propose a new method named CoCoLasso that is convex and can handle a general class of corrupted datasets including the cases of additive measurement error and random missing data. We establish the estimation error bounds of CoCoLasso and its asymptotic sign-consistent selection property. We further elucidate how the standard cross-validation techniques can be misleading in the presence of measurement error and develop a novel corrected cross-validation technique by using the basic idea in CoCoLasso. The corrected cross-validation has its own importance. We demonstrate the superior performance of our method over the non-convex approach by simulation studies.

- Sep 23, 2016
Stein estimation of the intensity parameter of a stationary spatial Poisson point process

Jean-François Coeurjolly · Sep 23, 2016
Date: 2016-09-23

Time: 15:30-16:30

Location: BURN 1205

Abstract:

We revisit the problem of estimating the intensity parameter of a homogeneous Poisson point process observed in a bounded window of $\mathbb{R}^d$, making use of a (now) old idea going back to James and Stein. For this, we prove an integration by parts formula for functionals defined on the Poisson space. This formula extends the one obtained by Privault and Réveillac (Statistical Inference for Stochastic Processes, 2009) in the one-dimensional case and is well-suited to a notion of derivative of Poisson functionals which satisfies the chain rule. The new estimators can be viewed as biased versions of the MLE with a tailor-made bias designed to reduce the variance of the MLE. We study a large class of examples and show that with a controlled probability the corresponding estimator outperforms the MLE. We illustrate in a simulation study that for very reasonable practical cases (like an intensity of 10 or 20 of a Poisson point process observed in the d-dimensional Euclidean ball with d = 1, …, 5), we can obtain a relative (mean squared error) gain above 20% for the Stein estimator with respect to the maximum likelihood estimator. This is joint work with M. Clausel and J. Lelong (Univ. Grenoble Alpes, France).
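For orientation, the baseline in this comparison can be written explicitly. Assuming $N(W)$ denotes the number of points observed in the window $W \subset \mathbb{R}^d$ and $|W|$ its volume (notation introduced here, not taken from the abstract), the MLE of the intensity $\theta$ and its mean squared error are

$$\hat\theta_{\mathrm{MLE}} = \frac{N(W)}{|W|}, \qquad \mathbb{E}\big[(\hat\theta_{\mathrm{MLE}} - \theta)^2\big] = \frac{\theta}{|W|},$$

since $N(W)$ is Poisson distributed with mean $\theta\,|W|$. The relative gain quoted above is measured against this quantity; the form of the Stein estimator itself is not given in the abstract.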

- Sep 16, 2016
Statistical inference for fractional diffusion processes

Prakasa Rao · Sep 16, 2016
Date: 2016-09-16

Time: 16:00-17:00

Location: LB-921.04, Library Building, Concordia Univ.

Abstract:

Some time series exhibit long-range dependence, as noticed by Hurst in his investigations of water levels along the Nile River. Long-range dependence is connected with the concept of self-similarity in that increments of a self-similar process with stationary increments exhibit long-range dependence under some conditions. Fractional Brownian motion is an example of such a process. We discuss statistical inference for stochastic processes modeled by stochastic differential equations driven by a fractional Brownian motion. These processes are termed fractional diffusion processes. Since fractional Brownian motion is not a semimartingale, it is not possible to extend the notion of a stochastic integral with respect to a fractional Brownian motion following the ideas of Itô integration. There are other methods of extending integration with respect to a fractional Brownian motion. Suppose a complete path of a fractional diffusion process is observed over a finite time interval. We will present some results on inference problems for such processes.

- Sep 9, 2016
Two-set canonical variate model in multiple populations with invariant loadings

Fei Gu · Sep 9, 2016
Date: 2016-09-09

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Goria and Flury (Definition 2.1, 1996) proposed the two-set canonical variate model (referred to as the CV-2 model hereafter) and its extension in multiple populations with invariant weight coefficients (Definition 2.2). The equality constraints imposed on the weight coefficients are in line with the approach to interpreting the canonical variates (i.e., the linear combinations of original variables) advocated by Harris (1975, 1989), Rencher (1988, 1992), and Rencher and Christensen (2003). However, the literature in psychology and education shows that the standard approach adopted by most researchers, including Anderson (2003), is to use the canonical loadings (i.e., the correlations between the canonical variates and the original variables in the same set) to interpret the canonical variates. In case of multicollinearity (giving rise to the so-called suppression effects) among the original variables, it is not uncommon to obtain different interpretations from the two approaches. Therefore, following the standard approach in practice, an alternative (probably more realistic) extension of Goria and Flury’s CV-2 model in multiple populations is to impose the equality constraints on the canonical loadings. The utility of this multiple-population extension is illustrated with two numeric examples.

- Apr 8, 2016
Multivariate tests of associations based on univariate tests

Ruth Heller · Apr 8, 2016
Date: 2016-04-08

Time: 15:30-16:30

Location: BURN 1205

Abstract:

For testing two random vectors for independence, we consider testing whether the distance of one vector from an arbitrary center point is independent from the distance of the other vector from an arbitrary center point by a univariate test. We provide conditions under which it is enough to have a consistent univariate test of independence on the distances to guarantee that the power to detect dependence between the random vectors increases to one, as the sample size increases. These conditions turn out to be minimal. If the univariate test is distribution-free, the multivariate test will also be distribution-free. If we consider multiple center points and aggregate the center-specific univariate tests, the power may be further improved. We suggest a specific aggregation method for which the resulting multivariate test will be distribution-free if the univariate test is distribution-free. We show that several multivariate tests recently proposed in the literature can be viewed as instances of this general approach.
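The reduction described in this abstract, from a multivariate independence problem to a univariate one on distances from center points, can be sketched as follows. The univariate test used below is a permutation test on the Spearman rank correlation, chosen only because it is short and distribution-free; it is a stand-in and is not consistent against all alternatives, so it does not satisfy the conditions the abstract refers to. The function names and the single fixed pair of center points are likewise assumptions for illustration.

```python
import math
import random


def ranks(values):
    """Ranks 0..n-1 of the values (ties are not handled in this sketch)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r


def spearman(a, b):
    """Spearman rank correlation, assuming no ties and at least two points."""
    ra, rb = ranks(a), ranks(b)
    mean = (len(a) - 1) / 2
    num = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    den = sum((x - mean) ** 2 for x in ra)
    return num / den


def center_distance_test(xs, ys, cx, cy, n_perm=1000, seed=0):
    """Permutation p-value for independence of the distances |X - cx| and |Y - cy|."""
    rng = random.Random(seed)
    # Step 1: replace each vector by its distance from the chosen center point.
    dx = [math.dist(x, cx) for x in xs]
    dy = [math.dist(y, cy) for y in ys]
    # Step 2: apply a univariate independence test to the paired distances.
    observed = abs(spearman(dx, dy))
    perm = dy[:]
    count = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if abs(spearman(dx, perm)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

The aggregation over multiple center points mentioned in the abstract is not attempted here; this sketch covers only the single-center case.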

