McGill Statistics Seminar - McGill Statistics Seminars
  • Tail order and its applications

    Date: 2013-11-22

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Tail order is a notion for quantifying the strength of dependence in the tail of a joint distribution. It can account for a wide range of dependence, ranging from tail positive dependence to tail negative dependence. We will introduce the theory and applications of tail order. Conditions for the tail orders of copula families will be discussed; these are helpful in guiding us to find copula families suitable for statistical inference. As an application of tail order, a regression analysis will be demonstrated, using appropriately constructed copulas that can capture the unique tail dependence patterns appearing in medical expenditure panel survey data.
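
    An illustrative sketch (not from the talk): the tail order kappa is defined through C(u,u) behaving like u^kappa (up to a slowly varying factor) as u tends to 0, so a crude empirical estimate is the slope of log C_hat(u,u) against log u near the origin. The Clayton copula, its parameter theta, and the grid of thresholds below are assumptions chosen for the illustration; Clayton has lower tail dependence, so the estimated kappa should be near 1.

      import numpy as np

      rng = np.random.default_rng(42)

      # Simulate from a Clayton copula (theta > 0), which has lower tail dependence,
      # via the Marshall-Olkin algorithm: U_i = (1 + E_i / V)^(-1/theta)
      # with V ~ Gamma(1/theta, 1) and E_i independent standard exponentials.
      theta = 2.0
      n = 100_000
      v = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)
      e = rng.exponential(size=(n, 2))
      u = (1.0 + e / v[:, None]) ** (-1.0 / theta)

      # Empirical copula on the diagonal: C_hat(t, t) = P(U1 <= t, U2 <= t).
      ts = np.array([0.001, 0.002, 0.005, 0.01, 0.02, 0.05])
      c_diag = np.array([np.mean((u[:, 0] <= t) & (u[:, 1] <= t)) for t in ts])

      # Tail order kappa: C(t, t) ~ t^kappa as t -> 0.  kappa = 1 corresponds to
      # tail dependence; kappa = 2 corresponds to (near) tail independence.
      kappa_hat = np.polyfit(np.log(ts), np.log(c_diag), 1)[0]
      print(f"estimated tail order kappa: {kappa_hat:.2f}")  # Clayton: kappa = 1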

  • Submodel selection and post estimation: Making sense or folly

    Date: 2013-11-15

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    In this talk, we consider estimation in generalized linear models when there are many potential predictors, some of which may have no influence on the response of interest. In the context of two competing models, where one model includes all predictors and the other restricts variable coefficients to a candidate linear subspace based on subject matter or prior knowledge, we investigate the relative performance of Stein-type shrinkage, pretest, and penalty estimators (L1GLM, adaptive L1GLM, and SCAD) with respect to the full model estimator. The asymptotic properties of the pretest and shrinkage estimators, including the derivation of asymptotic distributional biases and risks, are established. A Monte Carlo simulation study shows that the mean squared error (MSE) of an adaptive shrinkage estimator is comparable to the MSE of the penalty estimators in many situations, and in particular that it performs better than the penalty estimators when the model is sparse. A real data set analysis is also presented to compare the suggested methods.
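
    A hedged Monte Carlo sketch of the penalty-estimator part of the comparison, using scikit-learn's logistic regression as an L1GLM stand-in; the sparse coefficient vector, sample size, and penalty strength below are illustrative assumptions, not the settings of the study.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      n, p, reps = 200, 12, 200
      beta = np.zeros(p)
      beta[:3] = [1.0, -1.0, 0.5]      # sparse truth: 9 of 12 coefficients are zero

      mse_full = mse_l1 = 0.0
      for _ in range(reps):
          X = rng.standard_normal((n, p))
          y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))

          # Essentially unpenalized MLE (huge C) vs an L1-penalized fit.
          full = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
          l1 = LogisticRegression(penalty="l1", C=0.5, solver="liblinear").fit(X, y)

          mse_full += np.sum((full.coef_.ravel() - beta) ** 2) / reps
          mse_l1 += np.sum((l1.coef_.ravel() - beta) ** 2) / reps

      print(f"MSE, full-model MLE: {mse_full:.3f}")
      print(f"MSE, L1GLM:          {mse_l1:.3f}")  # typically smaller under sparsity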

  • The inadequacy of the summed score (and how you can fix it!)

    Date: 2013-11-08

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Health researchers often use patient and physician questionnaires to assess certain aspects of health status. Item Response Theory (IRT) provides a set of tools for examining the properties of the instrument and for estimating the latent trait for each individual. In my research, I critically examine the usefulness of the summed score over items, and of an alternative weighted summed score (using weights computed from the IRT model), as alternatives to both the empirical Bayes estimator and the maximum likelihood estimator for the Generalized Partial Credit Model. First, I will talk about two useful theoretical properties of the weighted summed score that I have proven as part of my work. Then I will relate the weighted summed score to other commonly used estimators of the latent trait. I will demonstrate the importance of these results in the context of both simulated and real data on the Center for Epidemiological Studies Depression Scale.
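
    A minimal sketch of the weighted summed score. Under the Generalized Partial Credit Model, the weighted sum of item responses with weights equal to the item discriminations is the sufficient statistic for the latent trait, which is what makes it a natural competitor to the plain summed score; the discrimination values and responses below are hypothetical.

      import numpy as np

      # Hypothetical discrimination parameters a_j for 5 polytomous items (GPCM)
      # and responses x_ij in {0, 1, 2, 3} for 4 respondents.
      a = np.array([1.2, 0.6, 1.8, 0.9, 1.4])
      X = np.array([[0, 1, 2, 1, 3],
                    [3, 3, 2, 3, 3],
                    [1, 0, 0, 1, 0],
                    [2, 2, 3, 1, 2]])

      summed = X.sum(axis=1)   # ordinary summed score: every item weighted equally
      weighted = X @ a         # weighted summed score: items weighted by discrimination

      # Under the GPCM the weighted sum is sufficient for the latent trait, so it
      # retains information that the unweighted sum discards whenever the
      # discriminations differ across items.
      for s, w in zip(summed, weighted):
          print(f"summed = {s:2d}   weighted = {w:5.2f}")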

  • Bayesian latent variable modelling of longitudinal family data for genetic pleiotropy studies

    Date: 2013-11-01

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Motivated by genetic association studies of pleiotropy, we propose a Bayesian latent variable approach to jointly study multiple outcomes or phenotypes. The proposed method models both continuous and binary phenotypes, and it accounts for serial and familial correlations when longitudinal and pedigree data have been collected. We present a Bayesian estimation method for the model parameters and we discuss some of the model misspecification effects. Central to the analysis is a novel MCMC algorithm that builds upon hierarchical centering and parameter expansion techniques to efficiently sample the posterior distribution. We discuss phenotype and model selection, and we study the performance of two selection strategies based on Bayes factors and spike-and-slab priors.
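
    The spike-and-slab selection strategy can be illustrated in a far simpler setting than the pleiotropy model of the talk. Below is a minimal Gibbs sampler sketch for a linear model with a point-mass spike and a normal slab; the noise variance, slab variance, and prior inclusion probability are fixed assumptions, and the output is the posterior inclusion probability of each predictor.

      import numpy as np

      rng = np.random.default_rng(1)

      # Toy data: 3 active predictors out of 10, known noise variance sigma2.
      n, p, sigma2, tau2, pi = 100, 10, 1.0, 4.0, 0.2
      X = rng.standard_normal((n, p))
      beta_true = np.zeros(p)
      beta_true[:3] = [2.0, -1.5, 1.0]
      y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

      beta = np.zeros(p)
      incl = np.zeros(p)                    # posterior inclusion counts
      n_iter, burn = 2000, 500
      for it in range(n_iter):
          for j in range(p):
              r = y - X @ beta + X[:, j] * beta[j]   # residual excluding predictor j
              sj = X[:, j] @ X[:, j]
              xr = X[:, j] @ r
              # Bayes factor for gamma_j = 1 (normal slab) vs gamma_j = 0 (spike),
              # obtained by integrating the slab out of the Gaussian likelihood.
              bf = np.sqrt(sigma2 / (sigma2 + tau2 * sj)) * \
                  np.exp(tau2 * xr ** 2 / (2 * sigma2 * (sigma2 + tau2 * sj)))
              prob = pi * bf / (pi * bf + 1 - pi)
              if rng.random() < prob:
                  v = 1.0 / (sj / sigma2 + 1.0 / tau2)
                  beta[j] = rng.normal(v * xr / sigma2, np.sqrt(v))
              else:
                  beta[j] = 0.0
          if it >= burn:
              incl += (beta != 0)

      print("posterior inclusion probabilities:", np.round(incl / (n_iter - burn), 2))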

  • Whole genome 3D architecture of chromatin and regulation

    Date: 2013-10-18

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    The expression of a gene is usually controlled by the regulatory elements in its promoter region. However, it has long been hypothesized that, in complex genomes such as the human genome, a gene may be controlled by distant enhancers and repressors. A recent molecular technique, 3C (chromosome conformation capture), which uses formaldehyde cross-linking and locus-specific PCR, is able to detect physical contacts between distant genomic loci. Such communication is achieved through spatial organization (looping) of chromosomes that brings genes and their regulatory elements into close proximity. Several adaptations of the 3C assay to study genome-wide spatial interactions, including Hi-C and ChIA-PET, have been developed. The availability of such data makes it possible to reconstruct the underlying three-dimensional spatial chromatin structure. In this talk, I will first describe a Bayesian statistical model for inferring spatial estrogen receptor regulation, focusing on reducing false-positive interactions. A random effect model, PRAM, will then be presented to make inference on the locations of genomic loci in a 3D Euclidean space. Results from ChIA-PET and Hi-C data will be visualized to illustrate the regulation and spatial proximity of genomic loci that are far apart in their linear chromosomal locations.
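
    The reconstruction step can be sketched with classical multidimensional scaling instead of the random effect model PRAM from the talk: assume contact counts decay as a power of spatial distance, invert that relation to get approximate distances, and embed the loci in 3D. The helix ground truth, the decay exponent, and the scale constant are all illustrative assumptions.

      import numpy as np

      rng = np.random.default_rng(7)

      # Toy ground truth: 20 loci along a 3D helix.
      t = np.linspace(0, 4 * np.pi, 20)
      true_xyz = np.column_stack([np.cos(t), np.sin(t), 0.2 * t])

      # Hi-C-like contact counts: expected count decays as distance^(-2).
      d_true = np.linalg.norm(true_xyz[:, None] - true_xyz[None, :], axis=2)
      np.fill_diagonal(d_true, 1.0)
      counts = rng.poisson(200.0 * d_true ** -2.0)
      counts = np.triu(counts, 1) + np.triu(counts, 1).T   # symmetrize the toy matrix

      # Invert the assumed power law to get distances, then run classical MDS
      # (eigendecomposition of the double-centred squared-distance matrix).
      d = (np.maximum(counts, 1) / 200.0) ** (-1.0 / 2.0)
      np.fill_diagonal(d, 0.0)
      J = np.eye(20) - np.ones((20, 20)) / 20
      B = -0.5 * J @ (d ** 2) @ J
      w, V = np.linalg.eigh(B)
      xyz = V[:, -3:] * np.sqrt(np.maximum(w[-3:], 0))     # top 3 eigenpairs -> 3D

      # The embedding is identified only up to rotation/reflection, so compare
      # recovered and true inter-locus distances instead of raw coordinates.
      d_rec = np.linalg.norm(xyz[:, None] - xyz[None, :], axis=2)
      iu = np.triu_indices(20, 1)
      print(f"distance correlation with truth: {np.corrcoef(d_rec[iu], d_true[iu])[0, 1]:.3f}")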

  • Some recent developments in likelihood-based small area estimation

    Date: 2013-10-04

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Mixed models are commonly used for the analysis of data in small area estimation. In particular, small area estimation has been studied extensively under linear mixed models. In practice, however, there are many situations in which we have counts or proportions in small area estimation; for example, a (monthly) dataset on the number of incidences in small areas. Recently, small area estimation under a linear mixed model with a penalized spline model for the fixed part of the model was studied. In this talk, small area estimation under generalized linear mixed models, combining time series and cross-sectional data, is proposed, with an extension of these models to include penalized spline regression. A likelihood-based approach is used to predict small area parameters and to provide prediction intervals. The performance of the proposed models and approach is evaluated through simulation studies and real datasets.
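
    A minimal sketch of the penalized spline ingredient, assuming a truncated-line basis and a fixed smoothing parameter: the intercept and slope stay unpenalized while the knot coefficients are shrunk, which is precisely the part that maps onto random effects in the mixed-model representation used in small area estimation.

      import numpy as np

      rng = np.random.default_rng(3)

      # Toy data: a smooth mean plus noise.
      n = 150
      x = np.sort(rng.uniform(0, 1, n))
      y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

      # Design: fixed part [1, x] plus K truncated-line basis functions (x - knot)_+.
      K = 15
      knots = np.quantile(x, np.linspace(0, 1, K + 2)[1:-1])
      Z = np.maximum(x[:, None] - knots[None, :], 0.0)
      C = np.column_stack([np.ones(n), x, Z])

      # Ridge-type penalty on the spline coefficients only (not on [1, x]).
      lam = 1e-2
      D = np.diag([0.0, 0.0] + [1.0] * K)
      coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
      fit = C @ coef
      print(f"residual SD of the penalized-spline fit: {np.std(y - fit):.3f}")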

  • Tests of independence for sparse contingency tables and beyond

    Date: 2013-09-20

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    In this talk, a new and consistent statistic is proposed to test whether two discrete random variables are independent. The test is based on a statistic of the Cramér–von Mises type constructed from the so-called empirical checkerboard copula. The test can be used even for sparse contingency tables or tables whose dimension changes with the sample size. Because the limiting distribution of the test statistic is not tractable, a valid bootstrap procedure for the computation of p-values will be discussed. The new statistic is compared in a power study to standard procedures for testing independence, such as Pearson's chi-squared, the likelihood ratio, and the Zelterman statistics. The new test turns out to be considerably more powerful than all its competitors in all scenarios considered.
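
    A sketch in the same spirit, with two plain substitutions: the statistic below compares the empirical joint CDF with the product of the empirical margins (rather than the empirical checkerboard copula), and p-values come from a permutation null (rather than the bootstrap discussed in the talk). The sample size and dependence mechanism are illustrative.

      import numpy as np

      def cvm_independence_stat(x, y):
          """Cramér-von Mises-type statistic: distance between the empirical joint
          CDF and the product of the empirical marginal CDFs, at the data points."""
          n = len(x)
          Fxy = np.mean((x[None, :] <= x[:, None]) & (y[None, :] <= y[:, None]), axis=1)
          Fx = np.mean(x[None, :] <= x[:, None], axis=1)
          Fy = np.mean(y[None, :] <= y[:, None], axis=1)
          return n * np.mean((Fxy - Fx * Fy) ** 2)

      rng = np.random.default_rng(5)
      n = 300
      x = rng.integers(0, 10, n)                 # discrete data, as in a sparse table
      y = (x + rng.integers(0, 3, n)) % 10       # y genuinely depends on x

      # Permutation null: shuffling y breaks dependence while preserving margins.
      t_obs = cvm_independence_stat(x, y)
      t_null = np.array([cvm_independence_stat(x, rng.permutation(y))
                         for _ in range(999)])
      p_value = (1 + np.sum(t_null >= t_obs)) / 1000
      print(f"T = {t_obs:.4f}, permutation p-value = {p_value:.3f}")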

  • Bayesian nonparametric density estimation under length bias sampling

    Date: 2013-09-13

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    A new density estimation method in a Bayesian nonparametric framework is presented for settings where the recorded data do not come directly from the distribution of interest but from a length-biased version of it. From a Bayesian perspective, efforts to computationally evaluate posterior quantities conditional on length-biased data have been hindered by the inability to circumvent the problem of the normalizing constant. In this talk, a novel Bayesian nonparametric approach to the length-biased sampling problem is presented which circumvents the issue of the normalizing constant. Numerical illustrations as well as a real data example are presented, and the estimator is compared against its frequentist counterpart, the kernel density estimator for indirect data. This is joint work with: a) Spyridon J. Hatjispyros, University of the Aegean, Greece; b) Stephen G. Walker, University of Texas at Austin, U.S.A.
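
    The frequentist counterpart mentioned above, the kernel density estimator for indirect data, admits a compact sketch: under length bias the observations come from g(x) proportional to x f(x), so weighting each kernel by 1/x (normalized) undoes the bias. The Gamma target and the fixed bandwidth are assumptions of the illustration.

      import numpy as np

      rng = np.random.default_rng(9)

      # True f: Gamma(2, 1).  Length-biased sampling draws from g(x) ~ x f(x),
      # which for a Gamma(2, 1) target is exactly Gamma(3, 1).
      x = rng.gamma(shape=3.0, scale=1.0, size=500)

      # Weighted KDE: weights proportional to 1/x undo the length bias;
      # they are normalized to sum to one.
      w = (1.0 / x) / np.sum(1.0 / x)
      h = 0.4                                    # fixed bandwidth for the sketch
      grid = np.linspace(0.01, 10, 200)
      kde = np.sum(w[None, :] * np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2),
                   axis=1) / (h * np.sqrt(2 * np.pi))

      true_f = grid * np.exp(-grid)              # Gamma(2, 1) density: x e^(-x)
      print(f"max abs error of the debiased KDE: {np.max(np.abs(kde - true_f)):.3f}")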

  • Éric Marchand: On improved predictive density estimation with parametric constraints

    Date: 2013-04-05

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    We consider the problem of predictive density estimation under Kullback-Leibler loss when the parameter space is restricted to a convex subset. The principal situation analyzed relates to the estimation of an unknown p-variate normal density based on an observation generated by another p-variate normal density. The means of the densities are assumed to coincide, and the covariance matrices are known multiples of the identity matrix. We obtain sharp results concerning plug-in estimators, we show that the best unrestricted invariant predictive density estimator is dominated by the Bayes estimator associated with a uniform prior on the restricted parameter space, and we obtain minimax results for cases where the parameter space is (i) a cone, and (ii) a ball. A key feature, which we will describe, is a correspondence between the predictive density estimation problem and a collection of point estimation problems. Finally, if time permits, we describe recent work concerning: (i) non-normal models, and (ii) analysis relative to other loss functions such as reverse Kullback-Leibler and integrated L2.
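
    A numerical sketch of the dominance phenomenon for p = 1 and the restriction mu >= 0, comparing the plug-in density N(max(X, 0), 1) with the Bayes predictive density under a uniform prior on the restricted space; the Monte Carlo sizes, integration grid, and the particular mu are assumptions of this toy experiment, not results reported in the talk.

      import numpy as np
      from scipy.stats import norm

      rng = np.random.default_rng(11)
      mu = 0.5                                   # true mean, inside the restriction
      grid = np.linspace(0, 12, 2001)            # grid over the restricted parameter
      dx = grid[1] - grid[0]

      def bayes_predictive(y, x):
          # posterior pi(m | x) proportional to phi(x - m) on m >= 0,
          # then average the N(m, 1) density of y over the posterior
          post = norm.pdf(x - grid)
          post /= post.sum() * dx
          return np.sum(norm.pdf(y - grid) * post) * dx

      n_rep, kl_plug, kl_bayes = 2000, 0.0, 0.0
      for _ in range(n_rep):
          x, y = rng.normal(mu), rng.normal(mu)
          true_logpdf = norm.logpdf(y, loc=mu)
          kl_plug += (true_logpdf - norm.logpdf(y, loc=max(x, 0.0))) / n_rep
          kl_bayes += (true_logpdf - np.log(bayes_predictive(y, x))) / n_rep

      print(f"KL risk, plug-in:          {kl_plug:.3f}")
      print(f"KL risk, Bayes predictive: {kl_bayes:.3f}")   # typically smaller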

  • Jiahua Chen: Quantile and quantile function estimations under density ratio model

    Date: 2013-03-15

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Joint work with Yukun Liu (East China Normal University)

    Population quantiles and their functions are important parameters in many applications. For example, lower-level quantiles often serve as crucial quality indices for forestry products and other materials. In the presence of several independent samples from populations satisfying a density ratio model, we investigate the properties of empirical likelihood (EL) based inference for quantiles and their functions. We first establish the consistency and asymptotic normality of the estimators of the parameters and the cumulative distributions. The induced EL quantile estimators are then shown to admit a Bahadur representation. These results are used to construct asymptotically valid confidence intervals for functions of quantiles. In addition, we rigorously prove that the EL quantiles based on all samples are more efficient than the empirical quantiles, which can only utilize information from individual samples. A simulation study shows that the EL quantiles and their functions have superior performance both when the density ratio model assumption is satisfied and when it is mildly violated. An application example is used to demonstrate the new methods and potential cost savings.
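
    A sketch of the density ratio model machinery via its well-known logistic-regression dual (not the authors' exact EL procedure): fitting sample membership on x estimates the tilt, the EL weights of the baseline distribution over the pooled sample come out proportional to the fitted probability of belonging to the baseline sample, and quantiles follow by inverting the weighted CDF. The normal samples and the linear basis are illustrative assumptions.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(13)

      # Two samples satisfying a density ratio model: g1(x)/g0(x) = exp(a + b x)
      # holds exactly for two normals with equal variance.
      x0 = rng.normal(0.0, 1.0, 150)
      x1 = rng.normal(0.8, 1.0, 250)
      pooled = np.concatenate([x0, x1])
      label = np.concatenate([np.zeros(150), np.ones(250)])

      # Dual trick: logistic regression of sample membership on x estimates the
      # tilt; baseline EL weights over the POOLED sample are proportional to the
      # fitted probability of belonging to sample 0.
      fit = LogisticRegression(C=1e6).fit(pooled[:, None], label)
      w = fit.predict_proba(pooled[:, None])[:, 0]
      w /= w.sum()

      # EL-based baseline quantile: invert the weighted CDF over the pooled sample.
      order = np.argsort(pooled)
      cdf = np.cumsum(w[order])
      q05_el = pooled[order][np.searchsorted(cdf, 0.05)]
      q05_emp = np.quantile(x0, 0.05)            # uses sample 0 only
      print(f"5% quantile of G0 -- EL: {q05_el:.3f}, empirical: {q05_emp:.3f}")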