McGill Statistics Seminar - McGill Statistics Seminars
  • Bayesian latent variable modelling of longitudinal family data for genetic pleiotropy studies

    Date: 2013-11-01

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Motivated by genetic association studies of pleiotropy, we propose a Bayesian latent variable approach to jointly study multiple outcomes or phenotypes. The proposed method models both continuous and binary phenotypes, and it accounts for serial and familial correlations when longitudinal and pedigree data have been collected. We present a Bayesian estimation method for the model parameters and we discuss some of the model misspecification effects. Central to the analysis is a novel MCMC algorithm that builds upon hierarchical centering and parameter expansion techniques to efficiently sample the posterior distribution. We discuss phenotype and model selection, and we study the performance of two selection strategies based on Bayes factors and spike-and-slab priors.
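
    As a toy illustration of the spike-and-slab idea mentioned above (a generic normal-means setting, not the authors' latent variable model), the posterior probability that a parameter comes from the slab has a closed form:

```python
import numpy as np
from scipy.stats import norm

def inclusion_prob(y, pi=0.2, tau=2.0, sigma=1.0):
    """Posterior inclusion probability under a spike-and-slab prior.

    Toy model (illustrative only, not the talk's model):
        y ~ N(theta, sigma^2),  theta ~ (1 - pi) * delta_0 + pi * N(0, tau^2)
    The marginal likelihood under each mixture component gives the
    posterior probability that theta was drawn from the slab.
    """
    slab = pi * norm.pdf(y, loc=0.0, scale=np.sqrt(tau ** 2 + sigma ** 2))
    spike = (1.0 - pi) * norm.pdf(y, loc=0.0, scale=sigma)
    return slab / (slab + spike)

probs = inclusion_prob(np.array([0.1, 1.0, 4.0]))  # grows with |y|
```

    In a full MCMC treatment these probabilities would be updated inside the sampler; here they are exact because the toy marginal likelihoods are available in closed form.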

  • Whole genome 3D architecture of chromatin and regulation

    Date: 2013-10-18

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    The expression of a gene is usually controlled by the regulatory elements in its promoter region. However, it has long been hypothesized that, in complex genomes such as the human genome, a gene may be controlled by distant enhancers and repressors. A recent molecular technique, 3C (chromosome conformation capture), which uses formaldehyde cross-linking and locus-specific PCR, is able to detect physical contacts between distant genomic loci. Such communication is achieved through the spatial organization (looping) of chromosomes, which brings genes and their regulatory elements into close proximity. Several adaptations of the 3C assay to study genome-wide spatial interactions, including Hi-C and ChIA-PET, have been developed. The availability of such data makes it possible to reconstruct the underlying three-dimensional spatial chromatin structure. In this talk, I will first describe a Bayesian statistical model for inferring spatial estrogen receptor regulation, with a focus on reducing false-positive interactions. A random effect model, PRAM, will then be presented to make inference on the locations of genomic loci in a 3D Euclidean space. Results from ChIA-PET and Hi-C data will be visualized to illustrate the regulation and spatial proximity of genomic loci that are far apart in their linear chromosomal locations.
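
    The reconstruction step can be sketched generically: treat contact frequencies as inverse spatial distances and apply classical multidimensional scaling (an assumption-laden stand-in for illustration only; PRAM itself is a random effect model and works differently):

```python
import numpy as np

def reconstruct_3d(contacts, alpha=1.0):
    """Reconstruct 3D coordinates from a pairwise contact-frequency matrix.

    Generic classical MDS sketch: convert contact counts to distances via
    d_ij = c_ij^(-alpha), double-center the squared distances, and keep
    the top three eigenvectors.  (Not the PRAM model from the talk.)
    """
    d = np.where(contacts > 0, contacts, np.nan) ** (-alpha)
    np.fill_diagonal(d, 0.0)
    n = d.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (d ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:3]             # three largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Toy check: points whose "contacts" are exact inverse distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
C = np.where(D > 0, 1.0 / D, 0.0)
Y = reconstruct_3d(C)
```

    With noise-free inverse-distance "contacts", the recovered configuration reproduces the original pairwise distances up to a rigid motion.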

  • Some recent developments in likelihood-based small area estimation

    Date: 2013-10-04

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Mixed models are commonly used for the analysis of data in small area estimation. In particular, small area estimation has been studied extensively under linear mixed models. In practice, however, there are many situations where the data are counts or proportions; for example, a (monthly) dataset on the number of incidences in small areas. Recently, small area estimation under the linear mixed model with a penalized spline model for the fixed part of the model was studied. In this talk, small area estimation under generalized linear mixed models, combining time series and cross-sectional data, is proposed, together with an extension of these models to include penalized spline regression. A likelihood-based approach is used to predict small area parameters and to provide prediction intervals. The performance of the proposed models and approach is evaluated through simulation studies and illustrated on real datasets.

  • Tests of independence for sparse contingency tables and beyond

    Date: 2013-09-20

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    In this talk, a new and consistent statistic is proposed to test whether two discrete random variables are independent. The test is based on a statistic of the Cramér–von Mises type constructed from the so-called empirical checkerboard copula. The test can be used even for sparse contingency tables or tables whose dimension changes with the sample size. Because the limiting distribution of the test statistic is not tractable, a valid bootstrap procedure for the computation of p-values will be discussed. The new statistic is compared in a power study to standard procedures for testing independence, such as Pearson's chi-squared, the likelihood ratio, and Zelterman's statistics. The new test turns out to be considerably more powerful than all of its competitors in every scenario considered.
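
    The flavour of such a test can be sketched with a generic rank-based Cramér–von Mises statistic calibrated by permutation (the talk's statistic is built from the empirical checkerboard copula and uses a bootstrap, so this sketch differs in detail):

```python
import numpy as np

def cvm_independence(x, y, n_perm=200, seed=0):
    """Cramér–von Mises-type test of independence for discrete data.

    Generic sketch: compare the joint empirical CDF with the product of
    the marginal empirical CDFs at the sample points, and calibrate the
    statistic by permuting y.  Not the checkerboard-copula construction.
    """
    def stat(x, y):
        Hxy = np.mean((x[None, :] <= x[:, None]) & (y[None, :] <= y[:, None]), axis=1)
        Fx = np.mean(x[None, :] <= x[:, None], axis=1)
        Gy = np.mean(y[None, :] <= y[:, None], axis=1)
        return np.sum((Hxy - Fx * Gy) ** 2)

    t0 = stat(x, y)
    rng = np.random.default_rng(seed)
    n_ge = sum(stat(x, rng.permutation(y)) >= t0 for _ in range(n_perm))
    return t0, (1 + n_ge) / (1 + n_perm)       # permutation p-value

rng = np.random.default_rng(2)
x = rng.integers(0, 5, 60)                     # sparse 5-level variable
t_dep, p_dep = cvm_independence(x, x)          # perfectly dependent case
```

    Permuting y destroys any dependence while preserving both margins, which is why the permutation distribution is a valid reference under the null.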

  • Bayesian nonparametric density estimation under length bias sampling

    Date: 2013-09-13

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    A new density estimation method in a Bayesian nonparametric framework is presented for the case where the recorded data do not come directly from the distribution of interest, but from a length-biased version of it. From a Bayesian perspective, efforts to computationally evaluate posterior quantities conditionally on length-biased data have been hindered by the inability to circumvent the problem of an intractable normalizing constant. In this talk, a novel Bayesian nonparametric approach to the length-biased sampling problem is presented which circumvents the issue of the normalizing constant. Numerical illustrations as well as a real data example are presented, and the estimator is compared against its frequentist counterpart, the kernel density estimator for indirect data. This is joint work with (a) Spyridon J. Hatjispyros, University of the Aegean, Greece, and (b) Stephen G. Walker, University of Texas at Austin, U.S.A.
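
    The frequentist counterpart mentioned at the end, a kernel density estimator adapted to length-biased data, can be sketched as follows (the 1/x weighting is the classical correction for length bias; the bandwidth is an illustrative choice):

```python
import numpy as np

def length_biased_kde(x, grid, h=0.2):
    """Kernel density estimate of f when x is sampled length-biased (~ x f(x)).

    Each observation is weighted by 1/x_i and the weights renormalized,
    then smoothed with a Gaussian kernel.  Boundary bias near zero is not
    corrected in this sketch.
    """
    w = 1.0 / x
    w = w / w.sum()                              # normalized 1/x weights
    K = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
    K /= h * np.sqrt(2 * np.pi)
    return K @ w

# Length-biased sample from an Exp(1) target: the biased density is
# proportional to x * exp(-x), i.e. Gamma(shape=2).
rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=1.0, size=1000)
grid = np.linspace(0.0, 8.0, 400)
fhat = length_biased_kde(x, grid)
```

    The weighted estimate targets the Exp(1) density even though the raw sample is drawn from the Gamma(2) length-biased version.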

  • Éric Marchand: On improved predictive density estimation with parametric constraints

    Date: 2013-04-05

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    We consider the problem of predictive density estimation under Kullback-Leibler loss when the parameter space is restricted to a convex subset. The principal situation analyzed relates to the estimation of an unknown p-variate normal density based on an observation generated by another p-variate normal density. The means of the densities are assumed to coincide, and the covariance matrices are known multiples of the identity matrix. We obtain sharp results concerning plug-in estimators, we show that the best unrestricted invariant predictive density estimator is dominated by the Bayes estimator associated with a uniform prior on the restricted parameter space, and we obtain minimax results for cases where the parameter space is (i) a cone, and (ii) a ball. A key feature, which we will describe, is a correspondence between the predictive density estimation problem and a collection of point estimation problems. Finally, if time permits, we describe recent work concerning (i) non-normal models, and (ii) analyses relative to other loss functions such as reverse Kullback-Leibler and integrated L2.

  • Jiahua Chen: Quantile and quantile function estimations under density ratio model

    Date: 2013-03-15

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Joint work with Yukun Liu (East China Normal University).

    Population quantiles and their functions are important parameters in many applications. For example, lower-level quantiles often serve as crucial quality indices for forestry products and other materials. Given several independent samples from populations satisfying a density ratio model, we investigate the properties of empirical likelihood (EL) based inference for quantiles and their functions. We first establish the consistency and asymptotic normality of the estimators of the parameters and the cumulative distribution functions. The induced EL quantile estimators are then shown to admit a Bahadur representation. These results are used to construct asymptotically valid confidence intervals for functions of quantiles. In addition, we rigorously prove that the EL quantiles, which are based on all samples, are more efficient than the empirical quantiles, which can only use information from individual samples. A simulation study shows that the EL quantiles and their functions perform well both when the density ratio model assumption is satisfied and when it is mildly violated. An application example demonstrates the new methods and their potential cost savings.
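
    A minimal sketch of quantile estimation under a two-sample density ratio model, using the standard equivalence between the profile empirical likelihood weights and a logistic regression of the sample label on x (a schematic illustration in the spirit of Qin–Zhang-type estimators, not the paper's exact procedure):

```python
import numpy as np

def drm_el_quantile(x0, x1, q=0.5, iters=50):
    """EL-type baseline quantile under a two-sample density ratio model.

    Assumes dF1/dF0(x) = exp(alpha + beta * x).  The pooled-sample EL
    weights for the baseline F0 follow from a logistic regression of the
    sample label on x, fitted here by Newton-Raphson.  Schematic only.
    """
    x = np.concatenate([x0, x1])
    z = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])
    X = np.column_stack([np.ones_like(x), x])
    theta = np.zeros(2)
    for _ in range(iters):                       # Newton-Raphson for the MLE
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        W = p * (1.0 - p)
        theta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (z - p))
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    w = (1.0 - p) / len(x0)                      # EL weights for baseline F0
    order = np.argsort(x)
    cdf = np.cumsum(w[order])
    return x[order][np.searchsorted(cdf, q)], w

rng = np.random.default_rng(4)
x0 = rng.normal(0.0, 1.0, 500)                   # baseline sample
x1 = rng.normal(1.0, 1.0, 500)                   # shift model: ratio exp(-0.5 + x)
med, w = drm_el_quantile(x0, x1, q=0.5)          # baseline median, using both samples
```

    Because the weighted CDF uses the pooled sample, the resulting quantile borrows information from both samples, which is the source of the efficiency gain the abstract describes.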

  • Natalia Stepanova: On asymptotic efficiency of some nonparametric tests for testing multivariate independence

    Date: 2013-03-01

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Some problems of statistics can be reduced to extremal problems of minimizing functionals of smooth functions defined on the cube $[0,1]^m$, $m\geq 2$. In this talk, we consider a class of extremal problems that is closely connected to the problem of testing multivariate independence. By solving the extremal problem, we provide a unified approach to establishing weak convergence for a wide class of empirical processes which emerge in connection with testing multivariate independence. The use of our result will be also illustrated by describing the domain of local asymptotic optimality of some nonparametric tests of independence.

  • Eric Cormier: Data Driven Nonparametric Inference for Bivariate Extreme-Value Copulas

    Date: 2013-02-15

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    It is often crucial to know whether the dependence structure of a bivariate distribution belongs to the class of extreme-value copulas. In this talk, I will describe a graphical tool that allows judgment regarding the existence of extreme-value dependence. I will also present a data-driven nonparametric estimator of the Pickands dependence function. This estimator, which is constructed from constrained B-splines, is intrinsic and differentiable, thereby enabling sampling from the fitted model. I will illustrate its properties via simulation. This will lead me to highlight some of the limitations associated with currently available tests of extremeness.
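
    For contrast with the constrained B-spline estimator described above, here is the classical (unconstrained) Pickands estimator of the dependence function A(t); unlike the talk's estimator, this raw version need not be convex or satisfy the endpoint constraints:

```python
import numpy as np

def pickands_estimator(u, v, t):
    """Classical Pickands estimator of the dependence function A(t).

    u, v: pseudo-observations (ranks / (n+1)) of a bivariate sample.
    The margins are transformed to unit exponentials and A(t) is
    estimated by n / sum_i min(xi_i / (1-t), eta_i / t).
    """
    xi, eta = -np.log(u), -np.log(v)             # unit-exponential margins
    t = np.atleast_1d(t)
    m = np.minimum(xi[None, :] / (1.0 - t[:, None]), eta[None, :] / t[:, None])
    return len(u) / m.sum(axis=1)

# Independent sample: the true Pickands function is A(t) = 1 for all t.
rng = np.random.default_rng(5)
n = 500
u = (np.argsort(np.argsort(rng.uniform(size=n))) + 1) / (n + 1)
v = (np.argsort(np.argsort(rng.uniform(size=n))) + 1) / (n + 1)
A = pickands_estimator(u, v, np.linspace(0.1, 0.9, 9))
```

    Enforcing convexity and the bounds max(t, 1-t) <= A(t) <= 1, as the constrained B-spline approach does, is exactly what this raw estimator lacks.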

  • Celia Greenwood: Multiple testing and region-based tests of rare genetic variation

    Date: 2013-02-08

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    In the context of univariate association tests between a trait of interest and common genetic variants (SNPs) across the whole genome, corrections for multiple testing have been well studied. Due to the patterns of correlation (i.e., linkage disequilibrium), the number of independent tests remains close to 1 million, even when many more common genetic markers are available. With the advent of the DNA sequencing era, however, newly identified genetic variants tend to be rare or even unique, and consequently single-variant tests of association have little power. As a result, region-based tests of association are being developed that examine associations between the trait and all the genetic variability in a small pre-defined region of the genome. However, coping with multiple testing in this situation has received little attention. I will discuss two aspects of multiple testing for region-based tests. First, I will describe a method for estimating the effective number of independent tests, and second, I will discuss an approach for controlling the type I error rate that is based on stratified false discovery rates, where strata are defined by external information such as genomic annotation.
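
    The idea of an effective number of independent tests can be sketched with one common eigenvalue-based recipe (a PCA-style rule in the spirit of simpleM: count the principal components of the SNP correlation matrix needed to explain 99.5% of the variance; the talk's estimator may differ):

```python
import numpy as np

def effective_num_tests(genotypes, cut=0.995):
    """Effective number of independent tests among correlated SNPs.

    Count how many principal components of the SNP correlation matrix are
    needed to explain `cut` of the total variance.  A generic PCA-based
    illustration, not necessarily the method presented in the talk.
    """
    R = np.corrcoef(genotypes, rowvar=False)
    lam = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, descending
    frac = np.cumsum(lam) / lam.sum()            # explained-variance fractions
    return int(np.searchsorted(frac, cut) + 1)

rng = np.random.default_rng(6)
G = rng.binomial(2, 0.3, size=(1000, 20)).astype(float)   # 20 independent SNPs
G_dup = np.column_stack([G, G])                  # 40 columns, only 20 distinct
m_indep = effective_num_tests(G)                 # close to the nominal 20
m_dup = effective_num_tests(G_dup)               # still about 20, not 40
```

    Duplicating every SNP doubles the nominal test count but leaves the effective count unchanged, which is the behaviour a multiple-testing correction for correlated markers should have.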