McGill Statistics Seminar - McGill Statistics Seminars

- Feb 8, 2013
- post
Celia Greenwood: Multiple testing and region-based tests of rare genetic variation

Celia Greenwood · Feb 8, 2013
Date: 2013-02-08

Time: 14:30-15:30

Location: BURN 1205

Abstract:

In the context of univariate association tests between a trait of interest and common genetic variants (SNPs) across the whole genome, corrections for multiple testing have been well studied. Due to the patterns of correlation (i.e. linkage disequilibrium), the number of independent tests remains close to 1 million, even when many more common genetic markers are available. With the advent of the DNA sequencing era, however, newly-identified genetic variants tend to be rare or even unique, and consequently single-variant tests of association have little power. As a result, region-based tests of association are being developed that examine associations between the trait and all the genetic variability in a small pre-defined region of the genome. However, coping with multiple testing in this situation has received little attention. I will discuss two aspects of multiple testing for region-based tests. First, I will describe a method for estimating the effective number of independent tests, and second, I will discuss an approach for controlling type I error that is based on stratified false discovery rates, where strata are defined by external information such as genomic annotation.
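To make the idea of stratified false discovery rate control concrete, here is a minimal sketch that applies the Benjamini-Hochberg step-up procedure separately within each stratum. It is an illustration only, not the method presented in the talk: the function names, the choice of Benjamini-Hochberg as the within-stratum procedure, and the example p-values and stratum labels are all assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the BH step-up procedure rejects."""
    m = len(pvalues)
    if m == 0:
        return []
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) such that p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def stratified_bh(pvalues, strata, alpha=0.05):
    """Apply BH at level alpha separately within each stratum (hypothetical helper)."""
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    rejected = [False] * len(pvalues)
    for indices in groups.values():
        decisions = benjamini_hochberg([pvalues[i] for i in indices], alpha)
        for i, decision in zip(indices, decisions):
            rejected[i] = decision
    return rejected


# Made-up region-level p-values with a made-up annotation for each region.
pvals = [0.001, 0.02, 0.04, 0.5, 0.03, 0.9]
labels = ["coding", "coding", "coding", "coding", "other", "other"]
pooled = benjamini_hochberg(pvals)
stratified = stratified_bh(pvals, labels)
```

In this toy example the pooled procedure rejects only the first hypothesis, while the stratified version also rejects the second, because the "coding" stratum is enriched for small p-values.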

- Feb 1, 2013
- post
Daniela Witten: Structured learning of multiple Gaussian graphical models

Daniela Witten · Feb 1, 2013
Date: 2013-02-01

Time: 14:30-15:30

Location: BURN 1205

Abstract:

I will consider the task of estimating high-dimensional Gaussian graphical models (or networks) corresponding to a single set of features under several distinct conditions. In other words, I wish to estimate several distinct but related networks. I assume that most aspects of the networks are shared, but that there are some structured differences between them. The goal is to exploit the similarity among the networks in order to obtain more accurate estimates of each individual network, as well as to identify the differences between the networks.

- Jan 25, 2013
- post
Mylène Bédard: On the empirical efficiency of local MCMC algorithms with pools of proposals

Mylène Bédard · Jan 25, 2013
Date: 2013-01-25

Time: 14:30-15:30

Location: BURN 1205

Abstract:

In an attempt to improve on the Metropolis algorithm, various MCMC methods with auxiliary variables, such as the multiple-try and delayed rejection Metropolis algorithms, have been proposed. These methods generate several candidates in a single iteration; accordingly they are computationally more intensive than the Metropolis algorithm. It is usually difficult to provide a general estimate for the computational cost of a method without being overly conservative; potentially efficient methods could thus be overlooked by relying on such estimates. In this talk, we describe three algorithms with auxiliary variables - the multiple-try Metropolis (MTM) algorithm, the multiple-try Metropolis hit-and-run (MTM-HR) algorithm, and the delayed rejection Metropolis algorithm with antithetic proposals (DR-A) - and investigate the net performance of these algorithms in various contexts. To allow for a fair comparison, the study is carried out under optimal mixing conditions for each of these algorithms. The DR-A algorithm, whose proposal scheme introduces correlation in the pool of candidates, seems particularly promising. The algorithms are used in the contexts of Bayesian logistic regressions and classical inference for a linear regression model. This talk is based on work in collaboration with M. Mireuta, E. Moulines, and R. Douc.
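As an illustration of how a pool of candidates enters a single iteration, the following is a minimal sketch of a basic multiple-try Metropolis update. It assumes a one-dimensional target, a symmetric Gaussian random-walk proposal, and candidate weights equal to the unnormalized target density, which is one valid choice for symmetric proposals. It does not cover the MTM-HR or DR-A variants, and the function names, tuning values, and target are hypothetical.

```python
import math
import random


def mtm_step(x, density, k, scale, rng):
    """One multiple-try Metropolis update with a symmetric Gaussian proposal."""
    # Draw k candidates around the current state and weight each by the target density.
    candidates = [x + rng.gauss(0.0, scale) for _ in range(k)]
    weights = [density(y) for y in candidates]
    total = sum(weights)
    if total == 0.0:
        return x  # every candidate has zero density, so the move is rejected
    # Select one candidate with probability proportional to its weight.
    y = rng.choices(candidates, weights=weights, k=1)[0]
    # Reference set: k - 1 draws around the selected candidate, plus the current state.
    reference = [y + rng.gauss(0.0, scale) for _ in range(k - 1)] + [x]
    ref_total = sum(density(z) for z in reference)
    # Generalized Metropolis acceptance ratio.
    if rng.random() < min(1.0, total / ref_total):
        return y
    return x


def run_chain(n_steps, k=5, scale=2.0, seed=0):
    """Run the sampler on an unnormalized standard normal target (assumed example)."""
    rng = random.Random(seed)

    def density(x):
        return math.exp(-0.5 * x * x)

    x, samples = 0.0, []
    for _ in range(n_steps):
        x = mtm_step(x, density, k, scale, rng)
        samples.append(x)
    return samples


samples = run_chain(20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Each iteration evaluates the target density roughly 2k times, which is the extra computational cost that the abstract weighs against any gain in mixing.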

- Jan 11, 2013
- post
Ana Best: Risk-set sampling, left truncation, and Bayesian methods in survival analysis

Ana Best · Jan 11, 2013
Date: 2013-01-11

Time: 14:30-15:30

Location: BURN 1205

Abstract:

Statisticians are often faced with budget concerns when conducting studies. The collection of some covariates, such as genetic data, is very expensive. Other covariates, such as detailed histories, might be difficult or time-consuming to measure. This helped bring about the invention of the nested case-control study, and its more generalized version, risk-set sampled survival analysis. The literature has a good discussion of the properties of risk-set sampling in standard right-censored survival data. My interest is in extending the methods of risk-set sampling to left-truncated survival data, which arise in prevalent longitudinal studies. Since prevalent studies are easier and cheaper to conduct than incident studies, this extension is extremely practical and relevant. I will introduce the partial likelihood in this scenario, and briefly discuss the asymptotic properties of my estimator. I will also introduce Bayesian methods for standard survival analysis, and discuss methods for analyzing risk-set-sampled survival data using Bayesian methods.

- Dec 7, 2012
- post
Sample size and power determination for multiple comparison procedures aiming at rejecting at least r among m false hypotheses

Pierre Lafaye de Micheaux · Dec 7, 2012
Date: 2012-12-07

Time: 14:30-15:30

Location: BURN 1205

Abstract:

Multiple testing problems arise in a variety of situations, notably in clinical trials with multiple endpoints. In such cases, it is often of interest to reject either all hypotheses or at least one of them. More generally, the question arises as to whether one can reject at least r out of m hypotheses. Statistical tools addressing this issue are rare in the literature. In this talk, I will recall well-known hypothesis testing concepts, both in a single- and in a multiple-hypothesis context. I will then present general power formulas for three important multiple comparison procedures: the Bonferroni and Hochberg procedures, as well as Holm’s sequential procedure. Next, I will describe an R package that we developed for sample size calculations in multiple-endpoint trials where it is desired to reject at least r out of m hypotheses. This package covers the case where all the variables are continuous, together with four common variance-covariance patterns. I will show how to use this package to compute the sample size needed in a real-life application.
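For readers unfamiliar with the three procedures named above, the sketch below shows their textbook decision rules and a check of whether at least r hypotheses are rejected. It is written in Python for illustration and is unrelated to the R package mentioned in the abstract; the function names and example p-values are assumptions, and no power or sample size formula is implemented.

```python
def bonferroni(pvalues, alpha=0.05):
    """Single-step rule: reject H_i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Step-down rule: test from the smallest p-value up, stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = [False] * m
    for step, idx in enumerate(order):
        if pvalues[idx] <= alpha / (m - step):
            rejected[idx] = True
        else:
            break
    return rejected


def hochberg(pvalues, alpha=0.05):
    """Step-up rule: test from the largest p-value down; at the first success,
    reject that hypothesis and every hypothesis with a smaller p-value."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = [False] * m
    for step in range(m - 1, -1, -1):
        if pvalues[order[step]] <= alpha / (m - step):
            for idx in order[:step + 1]:
                rejected[idx] = True
            break
    return rejected


def rejects_at_least(rejected, r):
    """True when at least r hypotheses are rejected (hypothetical helper)."""
    return sum(rejected) >= r


# Made-up p-values for three endpoints.
pvals = [0.01, 0.04, 0.03]
results = {
    "bonferroni": bonferroni(pvals),
    "holm": holm(pvals),
    "hochberg": hochberg(pvals),
}
```

With these p-values, Bonferroni and Holm reject only the first hypothesis, while Hochberg rejects all three, so only Hochberg meets an "at least 2 of 3" requirement.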

- Nov 30, 2012
- post
Sharing confidential datasets using differential privacy

Anne-Sophie Charest · Nov 30, 2012
Date: 2012-11-30

Time: 14:30-15:30

Location: BURN 1205

Abstract:

While statistical agencies would like to share their data with researchers, they must also protect the confidentiality of the data provided by their respondents. To satisfy these two conflicting objectives, agencies use various techniques to restrict and modify the data before publication. Most of these techniques, however, share a common flaw: their confidentiality protection cannot be rigorously measured. In this talk, I will present the criterion of differential privacy, a rigorous measure of the protection offered by such methods. Designed to guarantee confidentiality even in a worst-case scenario, differential privacy protects the information of any individual in the database against an adversary with complete knowledge of the rest of the dataset. I will first give a brief overview of recent and current research on the topic of differential privacy. I will then focus on the publication of differentially-private synthetic contingency tables and present some of my results on the methods for the generation and proper analysis of such datasets.
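One standard way to satisfy differential privacy for count data is the Laplace mechanism, sketched below for the cells of a contingency table. This is a generic illustration and not necessarily the synthetic-table method discussed in the talk. It assumes neighbouring datasets differ by adding or removing one respondent, so that the vector of cell counts has sensitivity 1; the function names and the example table are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, rng, sensitivity=1.0):
    """Add independent Laplace noise with scale sensitivity / epsilon to each cell."""
    scale = sensitivity / epsilon
    return {cell: n + laplace_noise(scale, rng) for cell, n in counts.items()}


# Made-up 2x2 contingency table keyed by (row category, column category).
table = {
    ("smoker", "case"): 30,
    ("smoker", "control"): 70,
    ("non-smoker", "case"): 10,
    ("non-smoker", "control"): 90,
}
rng = random.Random(0)
released = noisy_counts(table, epsilon=0.5, rng=rng)
```

The released counts are real-valued and can be negative. Turning them into a valid synthetic table and analysing it properly takes further steps that this sketch does not attempt.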

- Nov 16, 2012
- post
Copula-based regression estimation and inference

Taoufik Bouezmarni · Nov 16, 2012
Date: 2012-11-16

Time: 14:30-15:30

Location: BURN 1205

Abstract:

In this paper we investigate a new approach to estimating a regression function based on copulas. The main idea behind this approach is to write the regression function in terms of a copula and marginal distributions. Once the copula and the marginal distributions are estimated we use the plug-in method to construct the new estimator. Because various methods are available in the literature for estimating both a copula and a distribution, this idea provides a rich and flexible alternative to many existing regression estimators. We provide some asymptotic results related to this copula-based regression modeling when the copula is estimated via profile likelihood and the marginals are estimated nonparametrically. We also study the finite sample performance of the estimator and illustrate its usefulness by analyzing data from air pollution studies.
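The representation described above can be written out for a single continuous covariate. The notation below is an assumption for illustration (copula density $c$, marginal distribution functions $F_X$ and $F_Y$, and marginal densities $f_X$, $f_Y$); the abstract itself does not fix a notation or state the general multivariate form.

```latex
% Joint density via the copula density:
%   f_{X,Y}(x, y) = c(F_X(x), F_Y(y)) f_X(x) f_Y(y),
% so the conditional density is f_{Y|X}(y | x) = c(F_X(x), F_Y(y)) f_Y(y).
\begin{align*}
m(x) &= \mathbb{E}[Y \mid X = x]
      = \int y \, c\bigl(F_X(x), F_Y(y)\bigr) \, dF_Y(y)
      = \mathbb{E}\bigl[\, Y \, c\bigl(F_X(x), F_Y(Y)\bigr) \bigr], \\
\hat{m}(x) &= \frac{1}{n} \sum_{i=1}^{n} Y_i \,
      \hat{c}\bigl(\hat{F}_X(x), \hat{F}_Y(Y_i)\bigr).
\end{align*}
```

The second line is the plug-in step under these assumptions: the copula density and the marginals are replaced by estimates, and the integral against the empirical distribution of $Y$ becomes a sample average.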

- Nov 9, 2012
- post
The multidimensional edge: Seeking hidden risks

Sidney Resnick · Nov 9, 2012
Date: 2012-11-09

Time: 14:30-15:30

Location: BURN 1205

Abstract:

Assessing tail risks using the asymptotic models provided by multivariate extreme value theory has the danger that when asymptotic independence is present (as with the Gaussian copula model), the asymptotic model provides estimates of probabilities of joint tail regions that are zero. In diverse applications such as finance, telecommunications, insurance and environmental science, it may be difficult to believe in the absence of risk contagion. This problem can be partly ameliorated by using hidden regular variation, which assumes a lower order asymptotic behavior on a subcone of the state space. This theory can be made more flexible by extensions in the following directions: (i) higher dimensions than two; (ii) where the lower order variation on a subcone is of extreme value type different from regular variation; and (iii) where the concept is extended to searching for lower order behavior on the complement of the support of the limit measure of regular variation. We discuss some challenges and potential applications of this ongoing effort.

- Nov 2, 2012
- post
Multivariate extremal dependence: Estimation with bias correction

Anne-Laure Fougères · Nov 2, 2012
Date: 2012-11-02

Time: 14:30-15:30

Location: BURN 1205

Abstract:

Estimating extreme risks in a multivariate framework is highly connected with the estimation of the extremal dependence structure. This structure can be described via the stable tail dependence function L, for which several estimators have been introduced. Asymptotic normality is available for empirical estimates of L, with rate of convergence k^{1/2}, where k denotes the number of high order statistics used in the estimation. Choosing a larger k might improve the accuracy of the estimation, but may lead to an increased asymptotic bias. We provide a bias correction procedure for the estimation of L. Estimators of L are combined in such a way that the asymptotic bias term disappears. The new estimator of L is shown to allow more flexibility in the choice of k. Its asymptotic behavior is examined, and a simulation study is provided to assess its small sample behavior. This is joint work with Cécile Mercadier (Université Lyon 1) and Laurens de Haan (Erasmus University Rotterdam).

- Oct 26, 2012
- post
Simulation model calibration and prediction using outputs from multi-fidelity simulators

Derek Bingham · Oct 26, 2012
Date: 2012-10-26

Time: 14:30-15:30

Location: BURN 1205

Abstract:

Computer simulators are used widely to describe physical processes in lieu of physical observations. In some cases, more than one computer code can be used to explore the same physical system - each with a different degree of fidelity. In this work, we combine field observations and model runs from deterministic multi-fidelity computer simulators to build a predictive model for the real process. The resulting model can be used to perform sensitivity analysis for the system and make predictions with associated measures of uncertainty. Our approach is Bayesian and will be illustrated through a simple example, as well as a real application in predictive science at the Center for Radiative Shock Hydrodynamics at the University of Michigan.

