Past Seminar Series - McGill Statistics Seminars
  • Jiahua Chen: Quantile and quantile function estimations under density ratio model

    Date: 2013-03-15

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Joint work with Yukun Liu (East China Normal University)

    Population quantiles and their functions are important parameters in many applications. For example, lower-level quantiles often serve as crucial quality indices for forestry products, among others. In the presence of several independent samples from populations satisfying a density ratio model, we investigate the properties of empirical likelihood (EL) based inference for quantiles and their functions. In this paper, we first establish the consistency and asymptotic normality of the estimators of the parameters and the cumulative distribution functions. The induced EL quantile estimators are then shown to admit a Bahadur representation. These results are used to construct asymptotically valid confidence intervals for functions of quantiles. In addition, we rigorously prove that the EL quantiles based on all samples are more efficient than the empirical quantiles, which can only use information from individual samples. A simulation study shows that the EL quantiles and their functions perform well both when the density ratio model assumption is satisfied and when it is mildly violated. An application example is used to demonstrate the new methods and potential cost savings.
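
    The full EL machinery of the paper is not reproduced here; purely as a hedged illustration of the pooling idea, the sketch below fits a two-sample density ratio model through its logistic-regression dual and inverts the resulting weighted pooled CDF to obtain a quantile estimate. The linear tilting function q(x) = x, the variable names, and the simulated data are assumptions for the example.

    ```python
    # Hypothetical sketch: two-sample density ratio model dF1/dF0 = exp(alpha + beta*x),
    # fitted via logistic regression of sample membership on x, then EL-weighted pooled
    # CDF and quantile estimates. Not the paper's estimator; an illustration only.
    import numpy as np
    import statsmodels.api as sm

    def drm_quantile(x0, x1, prob, sample=0):
        """EL quantile estimate for population 0 or 1 from the pooled sample."""
        x = np.concatenate([x0, x1])
        y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])  # sample membership
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        pi_hat = fit.predict(sm.add_constant(x))                   # P(sample 1 | x)
        # fitted EL point masses on the pooled points; each set sums to 1 at the MLE
        w = (1 - pi_hat) / len(x0) if sample == 0 else pi_hat / len(x1)
        order = np.argsort(x)
        cdf = np.cumsum(w[order])
        return x[order][np.searchsorted(cdf, prob)]                # invert the weighted CDF

    # Example: pooled estimate of the 5th percentile of the baseline population
    rng = np.random.default_rng(1)
    x0, x1 = rng.normal(0, 1, 200), rng.normal(0.5, 1, 300)
    print(drm_quantile(x0, x1, 0.05, sample=0))
    ```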

  • Natalia Stepanova: On asymptotic efficiency of some nonparametric tests for testing multivariate independence

    Date: 2013-03-01

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Some problems of statistics can be reduced to extremal problems of minimizing functionals of smooth functions defined on the cube $[0,1]^m$, $m\geq 2$. In this talk, we consider a class of extremal problems that is closely connected to the problem of testing multivariate independence. By solving the extremal problem, we provide a unified approach to establishing weak convergence for a wide class of empirical processes which emerge in connection with testing multivariate independence. The use of our result will also be illustrated by describing the domain of local asymptotic optimality of some nonparametric tests of independence.
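
    The talk is about asymptotic optimality theory, which a snippet cannot capture; solely to make concrete the kind of empirical-process statistic that arises in testing multivariate independence, here is a generic rank-based Cramér-von Mises-type statistic comparing the empirical copula with the independence copula, calibrated by permutation. The function name and the permutation calibration are assumptions, not the speaker's procedure.

    ```python
    # Illustrative only: empirical copula C_n evaluated at the sample points, compared
    # with the independence copula, with a crude permutation p-value.
    import numpy as np

    def cvm_independence(X, n_perm=500, rng=None):
        rng = rng or np.random.default_rng(0)
        n, m = X.shape
        def stat(Z):
            U = (np.argsort(np.argsort(Z, axis=0), axis=0) + 1) / (n + 1)  # pseudo-observations
            Cn = np.mean(np.all(U[None, :, :] <= U[:, None, :], axis=2), axis=1)  # C_n at sample points
            return np.sum((Cn - U.prod(axis=1)) ** 2)
        t_obs = stat(X)
        t_null = [stat(np.column_stack([rng.permutation(X[:, j]) for j in range(m)]))
                  for _ in range(n_perm)]
        return t_obs, np.mean(np.array(t_null) >= t_obs)

    X = np.random.default_rng(1).normal(size=(100, 3))
    print(cvm_independence(X, n_perm=200))   # (statistic, permutation p-value)
    ```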

  • Changbao Wu: Analysis of complex survey data with missing observations

    Date: 2013-02-22

    Time: 14:30-15:30

    Location: CRM, Université de Montréal, Pav. André-Aisenstadt, salle 1360

    Abstract:

    In this talk, we first provide an overview of issues arising from, and methods for dealing with, complex survey data in the presence of missing observations, with a major focus on the estimating equation approach for analysis and imputation methods for missing data. We then propose a semiparametric fractional imputation method for handling item nonresponse, assuming that certain baseline auxiliary variables can be observed for all units in the sample. The proposed strategy combines the strengths of conventional single imputation and multiple imputation methods, and is easy to implement even when a large number of auxiliary variables are available, which is typically the case for large-scale complex surveys. Simulation results and some general discussion on related issues will also be presented.
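
    The proposed method is semiparametric and is not reproduced here; as a minimal sketch of the fractional-imputation idea itself, the hypothetical example below uses a nearest-neighbour hot deck with equal fractional weights and a single fully observed auxiliary variable x. All names and the number of donors M are assumptions.

    ```python
    # Hypothetical fractional hot-deck imputation: each nonrespondent receives M donor
    # values, each carrying a fraction of the unit's survey weight, and analyses are run
    # on the completed, weighted data set.
    import numpy as np

    def fractionally_impute(y, x, w, M=5):
        """Return (value, weight) rows of a fractionally imputed data set."""
        obs, mis = ~np.isnan(y), np.isnan(y)
        rows = [(yi, wi) for yi, wi in zip(y[obs], w[obs])]          # respondents keep full weight
        for xi, wi in zip(x[mis], w[mis]):
            donors = y[obs][np.argsort(np.abs(x[obs] - xi))[:M]]     # nearest neighbours in x
            rows += [(d, wi / M) for d in donors]                    # equal fractional weights
        return np.array(rows)

    rng = np.random.default_rng(2)
    x = rng.normal(size=500); y = 2 + x + rng.normal(size=500); w = np.ones(500)
    y[rng.random(500) < 0.3] = np.nan                                # 30% item nonresponse
    vals = fractionally_impute(y, x, w)
    print(np.average(vals[:, 0], weights=vals[:, 1]))                # weighted mean from completed data
    ```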

  • Eric Cormier: Data Driven Nonparametric Inference for Bivariate Extreme-Value Copulas

    Date: 2013-02-15

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    It is often crucial to know whether the dependence structure of a bivariate distribution belongs to the class of extreme-value copulas. In this talk, I will describe a graphical tool that allows judgment regarding the existence of extreme-value dependence. I will also present a data-driven nonparametric estimator of the Pickands dependence function. This estimator, which is constructed from constrained B-splines, is intrinsic and differentiable, thereby enabling sampling from the fitted model. I will illustrate its properties via simulation. This will lead me to highlight some of the limitations associated with currently available tests of extremeness.
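
    The talk's estimator is built from constrained B-splines; the sketch below instead shows the classical uncorrected CFG rank-based estimator of the Pickands dependence function A(t), purely to make the estimation target concrete. The endpoint constraints A(0) = A(1) = 1 and convexity enforced by the talk's estimator are not imposed here, and the example data are simulated.

    ```python
    # Rank-based (CFG-type) estimate of the Pickands dependence function A(t) for a
    # bivariate sample; under independence A(t) is identically 1.
    import numpy as np

    def pickands_cfg(x, y, t):
        n = len(x)
        u = (np.argsort(np.argsort(x)) + 1) / (n + 1)   # pseudo-observations on (0,1)
        v = (np.argsort(np.argsort(y)) + 1) / (n + 1)
        s, w = -np.log(u), -np.log(v)
        t = np.atleast_1d(t)[:, None]
        xi = np.minimum(s / (1 - t), w / t)              # xi_i(t) = min(S_i/(1-t), T_i/t)
        return np.exp(-np.euler_gamma - np.log(xi).mean(axis=1))

    rng = np.random.default_rng(3)
    x, y = rng.normal(size=(2, 1000))
    print(pickands_cfg(x, y, np.linspace(0.01, 0.99, 5)))   # close to 1 under independence
    ```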

  • Celia Greenwood: Multiple testing and region-based tests of rare genetic variation

    Date: 2013-02-08

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    In the context of univariate association tests between a trait of interest and common genetic variants (SNPs) across the whole genome, corrections for multiple testing have been well studied. Due to the patterns of correlation (i.e. linkage disequilibrium), the number of independent tests remains close to 1 million, even when many more common genetic markers are available. With the advent of the DNA sequencing era, however, newly identified genetic variants tend to be rare or even unique, and consequently single-variant tests of association have little power. As a result, region-based tests of association are being developed that examine associations between the trait and all the genetic variability in a small pre-defined region of the genome. However, coping with multiple testing in this situation has received little attention. I will discuss two aspects of multiple testing for region-based tests. First, I will describe a method for estimating the effective number of independent tests, and second, I will discuss an approach for controlling type I error that is based on stratified false discovery rates, where strata are defined by external information such as genomic annotation.
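
    As a hedged illustration of the second idea only, the sketch below applies Benjamini-Hochberg false discovery rate control separately within strata defined by external annotation, rather than once genome-wide. The annotation labels, the 5% level, and the simulated p-values are invented for the example.

    ```python
    # Stratified FDR control: BH applied within each annotation stratum.
    import numpy as np
    from statsmodels.stats.multitest import multipletests

    def stratified_fdr(pvals, strata, alpha=0.05):
        reject = np.zeros(len(pvals), dtype=bool)
        for s in np.unique(strata):
            idx = strata == s
            reject[idx] = multipletests(pvals[idx], alpha=alpha, method="fdr_bh")[0]
        return reject

    rng = np.random.default_rng(4)
    p = rng.uniform(size=10_000)
    ann = rng.choice(["coding", "regulatory", "other"], size=10_000)
    print(stratified_fdr(p, ann).sum(), "regions flagged")
    ```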

  • Daniela Witten: Structured learning of multiple Gaussian graphical models

    Date: 2013-02-01

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    I will consider the task of estimating high-dimensional Gaussian graphical models (or networks) corresponding to a single set of features under several distinct conditions. In other words, I wish to estimate several distinct but related networks. I assume that most aspects of the networks are shared, but that there are some structured differences between them. The goal is to exploit the similarity among the networks in order to obtain more accurate estimates of each individual network, as well as to identify the differences between the networks.
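
    For orientation only: the natural baseline is to fit a separate l1-penalized Gaussian graphical model in each condition, as in the hypothetical sketch below, and then compare edge sets. The talk's approach instead couples the fits through structured penalties so that shared structure is estimated jointly; that coupling is not shown here.

    ```python
    # Baseline sketch: one graphical lasso per condition, then a naive edge comparison.
    import numpy as np
    from sklearn.covariance import GraphicalLassoCV

    def separate_networks(samples):
        """samples: list of (n_k x p) data arrays, one per condition; returns adjacency per condition."""
        adj = []
        for X in samples:
            prec = GraphicalLassoCV().fit(X).precision_   # sparse precision matrix
            A = np.abs(prec) > 1e-8
            np.fill_diagonal(A, False)
            adj.append(A)
        return adj

    rng = np.random.default_rng(5)
    nets = separate_networks([rng.normal(size=(200, 10)), rng.normal(size=(220, 10))])
    print("edges differing between conditions:", np.sum(nets[0] ^ nets[1]) // 2)
    ```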

  • Mylène Bédard: On the empirical efficiency of local MCMC algorithms with pools of proposals

    Date: 2013-01-25

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    In an attempt to improve on the Metropolis algorithm, various MCMC methods with auxiliary variables, such as the multiple-try and delayed rejection Metropolis algorithms, have been proposed. These methods generate several candidates in a single iteration; accordingly, they are computationally more intensive than the Metropolis algorithm. It is usually difficult to provide a general estimate for the computational cost of a method without being overly conservative; potentially efficient methods could thus be overlooked by relying on such estimates. In this talk, we describe three algorithms with auxiliary variables - the multiple-try Metropolis (MTM) algorithm, the multiple-try Metropolis hit-and-run (MTM-HR) algorithm, and the delayed rejection Metropolis algorithm with antithetic proposals (DR-A) - and investigate the net performance of these algorithms in various contexts. To allow for a fair comparison, the study is carried out under optimal mixing conditions for each of these algorithms. The DR-A algorithm, whose proposal scheme introduces correlation in the pool of candidates, seems particularly promising. The algorithms are used in the contexts of Bayesian logistic regressions and classical inference for a linear regression model. This talk is based on work in collaboration with M. Mireuta, E. Moulines, and R. Douc.
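
    The multiple-try step itself fits in a few lines; the hypothetical sketch below uses the simplest MTM variant, with a symmetric Gaussian proposal and candidate weights proportional to the target density, to show how a pool of candidates enters the acceptance ratio. MTM-HR and DR-A, and the optimal tuning studied in the talk, are not sketched.

    ```python
    # Multiple-try Metropolis with symmetric proposal and weights w(y) = pi(y):
    # accept with probability min(1, sum_j pi(y_j) / sum_j pi(x*_j)).
    import numpy as np

    def mtm(log_pi, x0, n_iter=5000, k=5, scale=1.0, rng=None):
        rng = rng or np.random.default_rng(0)
        x, chain = np.asarray(x0, float), []
        for _ in range(n_iter):
            ys = x + scale * rng.normal(size=(k, x.size))            # k candidates from T(x, .)
            lp_y = log_pi(ys)
            w = np.exp(lp_y - lp_y.max())
            y = ys[rng.choice(k, p=w / w.sum())]                     # select one candidate ∝ pi(y_j)
            xs = np.vstack([y + scale * rng.normal(size=(k - 1, x.size)), x])  # reference points + current x
            log_r = np.logaddexp.reduce(lp_y) - np.logaddexp.reduce(log_pi(xs))
            if np.log(rng.random()) < log_r:
                x = y
            chain.append(x)
        return np.array(chain)

    # Example: standard normal target
    chain = mtm(lambda z: -0.5 * np.sum(z ** 2, axis=-1), x0=[3.0])
    print(chain.mean(), chain.std())
    ```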

  • Victor Chernozhukov: Inference on treatment effects after selection amongst high-dimensional controls

    Date: 2013-01-18

    Time: 14:30-15:30

    Location: BURN 306

    Abstract:

    We propose robust methods for inference on the effect of a treatment variable on a scalar outcome in the presence of very many controls. Our setting is a partially linear model with possibly non-Gaussian and heteroscedastic disturbances. Our analysis allows the number of controls to be much larger than the sample size. To make informative inference feasible, we require the model to be approximately sparse; that is, we require that the effect of confounding factors can be controlled for up to a small approximation error by conditioning on a relatively small number of controls whose identities are unknown. The latter condition makes it possible to estimate the treatment effect by selecting approximately the right set of controls. We develop a novel estimation and uniformly valid inference method for the treatment effect in this setting, called the “post-double-selection” method. Our results apply to Lasso-type methods used for covariate selection as well as to any other model selection method that is able to find a sparse model with good approximation properties.
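
    A minimal sketch of the post-double-selection recipe follows, with cross-validated Lasso standing in for the paper's specific penalty choices (an assumption for simplicity): select controls that predict the outcome, select controls that predict the treatment, then run OLS of the outcome on the treatment and the union of the two selected sets, with heteroscedasticity-robust standard errors.

    ```python
    # Post-double-selection sketch on simulated data with one confounder among many controls.
    import numpy as np
    import statsmodels.api as sm
    from sklearn.linear_model import LassoCV

    def post_double_selection(y, d, X):
        s1 = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)       # controls predicting the outcome
        s2 = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)       # controls predicting the treatment
        keep = np.union1d(s1, s2)
        Z = sm.add_constant(np.column_stack([d, X[:, keep]]))
        fit = sm.OLS(y, Z).fit(cov_type="HC1")                   # robust to heteroscedasticity
        return fit.params[1], fit.bse[1]                         # treatment effect estimate and SE

    rng = np.random.default_rng(6)
    n, p = 200, 300
    X = rng.normal(size=(n, p))
    d = X[:, 0] + rng.normal(size=n)                             # treatment confounded by X[:, 0]
    y = 0.5 * d + X[:, 0] + rng.normal(size=n)                   # true treatment effect 0.5
    print(post_double_selection(y, d, X))
    ```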

  • Ana Best: Risk-set sampling, left truncation, and Bayesian methods in survival analysis

    Date: 2013-01-11

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Statisticians are often faced with budget constraints when conducting studies. The collection of some covariates, such as genetic data, is very expensive; other covariates, such as detailed histories, can be difficult or time-consuming to measure. This helped bring about the invention of the nested case-control study and its more general version, risk-set-sampled survival analysis. The literature contains a good discussion of the properties of risk-set sampling for standard right-censored survival data. My interest is in extending risk-set sampling methods to left-truncated survival data, which arise in prevalent longitudinal studies. Since prevalent studies are easier and cheaper to conduct than incident studies, this extension is extremely practical and relevant. I will introduce the partial likelihood in this scenario and briefly discuss the asymptotic properties of my estimator. I will also introduce Bayesian methods for standard survival analysis, and discuss how risk-set-sampled survival data can be analyzed within a Bayesian framework.
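
    To make the left-truncation point concrete, here is a small hypothetical numpy sketch of the Cox partial log-likelihood with delayed entry: a subject joins the risk set only after its truncation (entry) time. Risk-set sampling would replace the full risk set at each event time by a random subset of it, and neither that step nor the Bayesian machinery of the talk is reproduced here; all data and names are simulated assumptions.

    ```python
    # Partial log-likelihood with left truncation: risk set at event time t is
    # {i : entry_i < t <= exit_i}.
    import numpy as np
    from scipy.optimize import minimize

    def cox_loglik_left_truncated(beta, start, stop, event, x):
        eta = x @ beta
        ll = 0.0
        for t, e in zip(stop[event], eta[event]):
            at_risk = (start < t) & (stop >= t)          # delayed-entry risk set
            ll += e - np.log(np.sum(np.exp(eta[at_risk])))
        return ll

    rng = np.random.default_rng(7)
    n = 400
    x = rng.normal(size=(n, 1))
    exit_t = rng.exponential(np.exp(-0.7 * x[:, 0]))     # hazard proportional to exp(0.7 * x)
    entry_t = rng.uniform(0, 0.3, n)
    keep = exit_t > entry_t                              # only subjects surviving past entry are observed
    event = np.ones(keep.sum(), dtype=bool)              # no censoring in this toy example
    beta_hat = minimize(
        lambda b: -cox_loglik_left_truncated(b, entry_t[keep], exit_t[keep], event, x[keep]),
        x0=np.zeros(1)).x
    print(beta_hat)                                      # should be close to the true value 0.7
    ```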

  • What percentage of children in the U.S. are eating a healthy diet? A statistical approach

    Date: 2012-12-14

    Time: 14:30-15:30

    Location: Concordia, Room LB 921-04

    Abstract:

    In the United States, the preferred method of obtaining dietary intake data is the 24-hour dietary recall, yet the measure of most interest is usual or long-term average daily intake, which is impossible to measure directly. Thus, usual dietary intake is assessed with considerable measurement error. Also, diet comprises numerous foods, nutrients and other components, each of which has distinctive attributes. Sometimes it is useful to examine intake of these components separately, but increasingly nutritionists are interested in exploring them collectively to capture overall dietary patterns and their effect on various diseases. Consumption of these components varies widely: some are consumed by almost everyone every day, while others are consumed episodically, so that 24-hour recall data are zero-inflated. In addition, they are often correlated with each other. Finally, it is often preferable to analyze the amount of a dietary component relative to the amount of energy (calories) in a diet, because dietary recommendations often vary with energy level.
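
    None of the statistical machinery for estimating usual intake is shown here; the toy simulation below (all numbers invented) only illustrates the two data features the abstract emphasizes: a single 24-hour recall is far more variable than usual intake, and recalls of an episodically consumed food are zero-inflated.

    ```python
    # Toy simulation of the measurement problem: usual intake vs one noisy 24-hour recall,
    # plus an episodically consumed food that yields many zero recalls.
    import numpy as np

    rng = np.random.default_rng(8)
    n = 5000
    usual = np.exp(rng.normal(3.0, 0.3, n))            # usual (long-term average) daily intake
    recall = usual * np.exp(rng.normal(0, 0.6, n))     # one 24-hour recall per child
    p_consume = 0.3                                    # episodic food eaten on ~30% of days
    episodic_recall = rng.binomial(1, p_consume, n) * np.exp(rng.normal(1.0, 0.8, n))

    print("SD of usual intake:", usual.std().round(1),
          "vs SD of a single recall:", recall.std().round(1))
    print("share of zero recalls for the episodic food:", (episodic_recall == 0).mean().round(2))
    ```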