McGill Statistics Seminar - McGill Statistics Seminars
  • Natalia Stepanova: On asymptotic efficiency of some nonparametric tests for testing multivariate independence

    Date: 2013-03-01

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Some problems of statistics can be reduced to extremal problems of minimizing functionals of smooth functions defined on the cube $[0,1]^m$, $m\geq 2$. In this talk, we consider a class of extremal problems that is closely connected to the problem of testing multivariate independence. By solving the extremal problem, we provide a unified approach to establishing weak convergence for a wide class of empirical processes which emerge in connection with testing multivariate independence. The use of our result will also be illustrated by describing the domain of local asymptotic optimality of some nonparametric tests of independence.

  • Eric Cormier: Data Driven Nonparametric Inference for Bivariate Extreme-Value Copulas

    Date: 2013-02-15

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    It is often crucial to know whether the dependence structure of a bivariate distribution belongs to the class of extreme-value copulas. In this talk, I will describe a graphical tool that allows judgment regarding the existence of extreme-value dependence. I will also present a data-driven nonparametric estimator of the Pickands dependence function. This estimator, which is constructed from constrained B-splines, is intrinsic and differentiable, thereby enabling sampling from the fitted model. I will illustrate its properties via simulation. This will lead me to highlight some of the limitations associated with currently available tests of extremeness.
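
    For reference, the link between an extreme-value copula and its Pickands dependence function can be written as follows. The notation ($C$ for the copula, $A$ for the Pickands function) and the particular parametrization are assumptions made here, since the abstract does not fix a convention; other sources swap the roles of $u$ and $v$.

```latex
C(u,v) = \exp\!\left\{ \log(uv)\, A\!\left( \frac{\log v}{\log(uv)} \right) \right\},
\qquad (u,v) \in (0,1)^2,
\qquad \text{where } A \text{ is convex on } [0,1]
\text{ and } \max(t,\, 1-t) \le A(t) \le 1 .
```

    Under this convention, $A \equiv 1$ corresponds to independence and $A(t) = \max(t, 1-t)$ to perfect positive dependence. Convexity and the two bounds are the shape constraints that an estimator of $A$ must satisfy in order to define a valid copula.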

  • Celia Greenwood: Multiple testing and region-based tests of rare genetic variation

    Date: 2013-02-08

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    In the context of univariate association tests between a trait of interest and common genetic variants (SNPs) across the whole genome, corrections for multiple testing have been well-studied. Due to the patterns of correlation (i.e. linkage disequilibrium), the number of independent tests remains close to 1 million, even when many more common genetic markers are available. With the advent of the DNA sequencing era, however, newly identified genetic variants tend to be rare or even unique, and consequently single-variant tests of association have little power. As a result, region-based tests of association are being developed that examine associations between the trait and all the genetic variability in a small pre-defined region of the genome. However, coping with multiple testing in this situation has received little attention. I will discuss two aspects of multiple testing for region-based tests. First, I will describe a method for estimating the effective number of independent tests, and second, I will discuss an approach for controlling type I error that is based on stratified false discovery rates, where strata are defined by external information such as genomic annotation.

  • Daniela Witten: Structured learning of multiple Gaussian graphical models

    Date: 2013-02-01

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    I will consider the task of estimating high-dimensional Gaussian graphical models (or networks) corresponding to a single set of features under several distinct conditions. In other words, I wish to estimate several distinct but related networks. I assume that most aspects of the networks are shared, but that there are some structured differences between them. The goal is to exploit the similarity among the networks in order to obtain more accurate estimates of each individual network, as well as to identify the differences between the networks.

  • Mylène Bédard: On the empirical efficiency of local MCMC algorithms with pools of proposals

    Date: 2013-01-25

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    In an attempt to improve on the Metropolis algorithm, various MCMC methods with auxiliary variables, such as the multiple-try and delayed rejection Metropolis algorithms, have been proposed. These methods generate several candidates in a single iteration; accordingly they are computationally more intensive than the Metropolis algorithm. It is usually difficult to provide a general estimate for the computational cost of a method without being overly conservative; potentially efficient methods could thus be overlooked by relying on such estimates. In this talk, we describe three algorithms with auxiliary variables - the multiple-try Metropolis (MTM) algorithm, the multiple-try Metropolis hit-and-run (MTM-HR) algorithm, and the delayed rejection Metropolis algorithm with antithetic proposals (DR-A) - and investigate the net performance of these algorithms in various contexts. To allow for a fair comparison, the study is carried out under optimal mixing conditions for each of these algorithms. The DR-A algorithm, whose proposal scheme introduces correlation in the pool of candidates, seems particularly promising. The algorithms are used in the contexts of Bayesian logistic regressions and classical inference for a linear regression model. This talk is based on work in collaboration with M. Mireuta, E. Moulines, and R. Douc.
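
    To make the idea of a pool of candidates concrete, here is a minimal sketch of one textbook variant of the multiple-try Metropolis algorithm. It is not the speaker's implementation. It assumes a one-dimensional target, a symmetric Gaussian random-walk proposal, and candidate weights equal to the target density. The function name `mtm_sample` and the parameters `k` (pool size) and `step` (proposal scale) are hypothetical names chosen for illustration.

```python
import numpy as np


def mtm_sample(log_target, x0, n_iter, k=5, step=2.0, rng=None):
    """Multiple-try Metropolis with a symmetric Gaussian proposal (1-D sketch).

    log_target: function returning the log of the (unnormalized) target density.
    k:          number of candidates generated per iteration (assumed name).
    step:       standard deviation of the random-walk proposal (assumed name).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = float(x0)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        # 1. Draw a pool of k candidates around the current state.
        cands = x + step * rng.standard_normal(k)
        log_w = np.array([log_target(y) for y in cands])
        # 2. Select one candidate with probability proportional to its weight.
        probs = np.exp(log_w - log_w.max())
        probs /= probs.sum()
        y = cands[rng.choice(k, p=probs)]
        # 3. Draw k - 1 reference points around y and add the current state.
        refs = np.append(y + step * rng.standard_normal(k - 1), x)
        log_w_ref = np.array([log_target(z) for z in refs])
        # 4. Accept y with probability min(1, sum of candidate weights /
        #    sum of reference weights), computed on the log scale.
        log_ratio = np.logaddexp.reduce(log_w) - np.logaddexp.reduce(log_w_ref)
        if np.log(rng.uniform()) < log_ratio:
            x = y
        chain[t] = x
    return chain


# Usage: sample from a standard normal target.
chain = mtm_sample(lambda z: -0.5 * z * z, x0=0.0, n_iter=5000,
                   rng=np.random.default_rng(0))
```

    Each iteration costs 2k - 1 target evaluations instead of one, which is the extra computational cost that has to be weighed against any gain in mixing. With `k=1` the sketch reduces to the ordinary random-walk Metropolis algorithm.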

  • Ana Best: Risk-set sampling, left truncation, and Bayesian methods in survival analysis

    Date: 2013-01-11

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Statisticians are often faced with budget concerns when conducting studies. The collection of some covariates, such as genetic data, is very expensive. Other covariates, such as detailed histories, might be difficult or time-consuming to measure. This helped bring about the invention of the nested case-control study, and its more generalized version, risk-set sampled survival analysis. The literature has a good discussion of the properties of risk-set sampling in standard right-censored survival data. My interest is in extending the methods of risk-set sampling to left-truncated survival data, which arise in prevalent longitudinal studies. Since prevalent studies are easier and cheaper to conduct than incident studies, this extension is extremely practical and relevant. I will introduce the partial likelihood in this scenario, and briefly discuss the asymptotic properties of my estimator. I will also introduce Bayesian methods for standard survival analysis, and discuss methods for analyzing risk-set-sampled survival data using Bayesian methods.

  • Sample size and power determination for multiple comparison procedures aiming at rejecting at least r among m false hypotheses

    Date: 2012-12-07

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Multiple testing problems arise in a variety of situations, notably in clinical trials with multiple endpoints. In such cases, it is often of interest to reject either all hypotheses or at least one of them. More generally, the question arises as to whether one can reject at least r out of m hypotheses. Statistical tools addressing this issue are rare in the literature. In this talk, I will recall well-known hypothesis testing concepts, both in a single- and in a multiple-hypothesis context. I will then present general power formulas for three important multiple comparison procedures: the Bonferroni and Hochberg procedures, as well as Holm’s sequential procedure. Next, I will describe an R package that we developed for sample size calculations in multiple-endpoint trials where it is desired to reject at least r out of m hypotheses. This package covers the case where all the variables are continuous and supports four common variance-covariance patterns. I will show how to use this package to compute the sample size needed in a real-life application.
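
    The three procedures named above can be sketched in a few lines. This is a generic illustration of the standard Bonferroni, Holm and Hochberg rules applied to a vector of p-values, not the R package described in the talk. The function names and the helper `at_least_r` are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Single-step rule: reject H_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Step-down rule: go from the smallest p-value upward, stop at first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank = 0 for the smallest p-value
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject


def hochberg(pvals, alpha=0.05):
    """Step-up rule: find the largest ordered p-value under its threshold,
    then reject it together with every smaller p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank in range(m - 1, -1, -1):  # start from the largest p-value
        if pvals[order[rank]] <= alpha / (m - rank):
            for j in order[: rank + 1]:
                reject[j] = True
            break
    return reject


def at_least_r(reject, r):
    """True when at least r hypotheses were rejected (hypothetical helper)."""
    return sum(reject) >= r


# Usage: four endpoints, success defined as rejecting at least r = 2 of them.
pvals = [0.010, 0.020, 0.030, 0.040]
success = {
    "bonferroni": at_least_r(bonferroni(pvals), 2),
    "holm": at_least_r(holm(pvals), 2),
    "hochberg": at_least_r(hochberg(pvals), 2),
}
```

    For these p-values, Bonferroni and Holm reject only the first hypothesis, while Hochberg rejects all four because the largest p-value is below 0.05. The probability that such a success criterion is met under a given alternative is the kind of power that a sample size calculation targets.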

  • Sharing confidential datasets using differential privacy

    Date: 2012-11-30

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    While statistical agencies would like to share their data with researchers, they must also protect the confidentiality of the data provided by their respondents. To satisfy these two conflicting objectives, agencies use various techniques to restrict and modify the data before publication. Most of these techniques, however, share a common flaw: their confidentiality protection cannot be rigorously measured. In this talk, I will present the criterion of differential privacy, a rigorous measure of the protection offered by such methods. Designed to guarantee confidentiality even in a worst-case scenario, differential privacy protects the information of any individual in the database against an adversary with complete knowledge of the rest of the dataset. I will first give a brief overview of recent and current research on the topic of differential privacy. I will then focus on the publication of differentially-private synthetic contingency tables and present some of my results on the methods for the generation and proper analysis of such datasets.
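
    As a point of reference, the most basic way to release a differentially private contingency table is the Laplace mechanism, sketched below. This is a standard baseline and not necessarily the method presented in the talk. It assumes that neighbouring datasets differ by adding or removing one record, so that the cell counts have L1 sensitivity 1. The function name `laplace_counts` and the parameter `epsilon` (the privacy budget) are illustrative choices.

```python
import numpy as np


def laplace_counts(counts, epsilon, rng=None):
    """Return cell counts perturbed with independent Laplace noise.

    Assumes add/remove-one-record neighbours, so one individual changes
    exactly one cell by 1 and the L1 sensitivity of the table is 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    scale = 1.0 / epsilon  # noise scale = sensitivity / epsilon
    return counts + rng.laplace(loc=0.0, scale=scale, size=counts.shape)


# Usage: a 2 x 2 table released with privacy budget epsilon = 1.
table = np.array([[12, 7], [3, 30]])
noisy = laplace_counts(table, epsilon=1.0, rng=np.random.default_rng(0))
```

    The released counts are real-valued and can be negative, so they are not themselves a valid contingency table. Turning such output into a usable synthetic table, and analyzing it correctly given the added noise, is the kind of issue the abstract refers to.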

  • Copula-based regression estimation and Inference

    Date: 2012-11-16

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    In this paper we investigate a new approach to estimating a regression function based on copulas. The main idea behind this approach is to write the regression function in terms of a copula and marginal distributions. Once the copula and the marginal distributions are estimated, we use the plug-in method to construct the new estimator. Because various methods are available in the literature for estimating both a copula and a distribution, this idea provides a rich and flexible alternative to many existing regression estimators. We provide some asymptotic results related to this copula-based regression modeling when the copula is estimated via profile likelihood and the marginals are estimated nonparametrically. We also study the finite sample performance of the estimator and illustrate its usefulness by analyzing data from air pollution studies.
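
    The representation of the regression function in terms of a copula and marginals can be sketched for a single continuous covariate as follows. The notation is an assumption made here for illustration: $c$ is the copula density of $(X, Y)$, $F_X$ and $F_Y$ are the marginal distribution functions, and hats denote estimates.

```latex
% Joint density factorization: f(x,y) = c(F_X(x), F_Y(y)) f_X(x) f_Y(y),
% hence f(y | x) = c(F_X(x), F_Y(y)) f_Y(y) and
m(x) = \mathbb{E}[\,Y \mid X = x\,]
     = \int y \, c\bigl(F_X(x), F_Y(y)\bigr) \, dF_Y(y).

% Plug-in estimator when F_Y is estimated by the empirical distribution
% of the sample Y_1, \dots, Y_n:
\hat m(x) = \frac{1}{n} \sum_{i=1}^{n} Y_i \,
            \hat c\bigl(\hat F_X(x), \hat F_Y(Y_i)\bigr).
```

    The first identity follows from Sklar's theorem. The second simply replaces each unknown quantity by an estimate, which is why any combination of copula and marginal estimators yields a regression estimator.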

  • The multidimensional edge: Seeking hidden risks

    Date: 2012-11-09

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Assessing tail risks using the asymptotic models provided by multivariate extreme value theory carries a danger: when asymptotic independence is present (as with the Gaussian copula model), the asymptotic model provides estimates of probabilities of joint tail regions that are zero. In diverse applications such as finance, telecommunications, insurance and environmental science, it may be difficult to believe in the absence of risk contagion. This problem can be partly ameliorated by using hidden regular variation, which assumes a lower order asymptotic behavior on a subcone of the state space. This theory can be made more flexible by extensions in the following directions: (i) higher dimensions than two; (ii) where the lower order variation on a subcone is of extreme value type different from regular variation; and (iii) where the concept is extended to searching for lower order behavior on the complement of the support of the limit measure of regular variation. We discuss some challenges and potential applications of this ongoing effort.