2013 Fall - McGill Statistics Seminars
  • Great probabilists publish posthumously

    Date: 2013-12-06

    Time: 15:30-16:30

    Location: UQAM Salle SH-3420

    Abstract:

    Jacob Bernoulli died in 1705. His great book Ars Conjectandi was published in 1713, 300 years ago. Thomas Bayes died in 1761. His great paper was read to the Royal Society of London in December 1763, 250 years ago, and published in 1764. These anniversaries are noted by discussing new evidence regarding the circumstances of publication, which in turn can lead to a better understanding of the works themselves. As to whether or not these examples of posthumous publication suggest a career move for any modern probabilist, that question is left to the audience.

  • Signal detection in high dimension: Testing sphericity against spiked alternatives

    Date: 2013-11-29

    Time: 15:30-16:30

    Location: Concordia MB-2.270

    Abstract:

    We consider the problem of testing the null hypothesis of sphericity for a high-dimensional covariance matrix against the alternative of a finite (unspecified) number of symmetry-breaking directions (multispiked alternatives) from the point of view of the asymptotic theory of statistical experiments. The region lying below the so-called phase transition or impossibility threshold is shown to be a contiguity region. Simple analytical expressions are derived for the asymptotic power envelope and the asymptotic powers of existing tests. These asymptotic powers are shown to lie very substantially below the power envelope; some of them even trivially coincide with the size of the test. In contrast, the asymptotic power of the likelihood ratio test is shown to be uniformly close to the power envelope.

  • Tail order and its applications

    Date: 2013-11-22

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Tail order is a notion for quantifying the strength of dependence in the tail of a joint distribution. It can account for a wide range of dependence, from tail positive dependence to tail negative dependence. We will introduce the theory and applications of tail order. Conditions for tail orders of copula families will be discussed; they help guide the search for suitable copula families for statistical inference. As an application of tail order, regression analysis will be demonstrated using appropriately constructed copulas that can capture the unique tail dependence patterns that appear in medical expenditure panel survey data.

  • Submodel selection and post estimation: Making sense or folly

    Date: 2013-11-15

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    In this talk, we consider estimation in generalized linear models when there are many potential predictors and some of them may not influence the response of interest. In the context of two competing models, where one model includes all predictors and the other restricts variable coefficients to a candidate linear subspace based on subject matter or prior knowledge, we investigate the relative performances of Stein-type shrinkage, pretest, and penalty estimators (L1GLM, adaptive L1GLM, and SCAD) with respect to the full model estimator. The asymptotic properties of the pretest and shrinkage estimators, including the derivation of asymptotic distributional biases and risks, are established. A Monte Carlo simulation study shows that the mean squared error (MSE) of an adaptive shrinkage estimator is comparable to the MSE of the penalty estimators in many situations and in particular performs better than the penalty estimators when the model is sparse. A real data set analysis is also presented to compare the suggested methods.

  • The inadequacy of the summed score (and how you can fix it!)

    Date: 2013-11-08

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Health researchers often use patient and physician questionnaires to assess certain aspects of health status. Item Response Theory (IRT) provides a set of tools for examining the properties of the instrument and for estimating the latent trait for each individual. In my research, I critically examine the usefulness of the summed score over items, and of a weighted summed score (using weights computed from the IRT model), as alternatives to both the empirical Bayes estimator and the maximum likelihood estimator for the Generalized Partial Credit Model. First, I will talk about two useful theoretical properties of the weighted summed score that I have proven as part of my work. Then I will relate the weighted summed score to other commonly used estimators of the latent trait. I will demonstrate the importance of these results in the context of both simulated and real data on the Center for Epidemiological Studies Depression Scale.

  • Bayesian latent variable modelling of longitudinal family data for genetic pleiotropy studies

    Date: 2013-11-01

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Motivated by genetic association studies of pleiotropy, we propose a Bayesian latent variable approach to jointly study multiple outcomes or phenotypes. The proposed method models both continuous and binary phenotypes, and it accounts for serial and familial correlations when longitudinal and pedigree data have been collected. We present a Bayesian estimation method for the model parameters and we discuss some of the model misspecification effects. Central to the analysis is a novel MCMC algorithm that builds upon hierarchical centering and parameter expansion techniques to efficiently sample the posterior distribution. We discuss phenotype and model selection, and we study the performance of two selection strategies based on Bayes factors and spike-and-slab priors.

  • XY - Basketball meets Big Data

    Date: 2013-10-25

    Time: 15:30-16:30

    Location: HEC Montréal Salle CIBC 1er étage

    Abstract:

    In this talk, I will explore the state of the art in the analysis and modeling of player tracking data in the NBA. In the past, player tracking data has been used primarily for visualization, such as understanding the spatial distribution of a player’s shooting characteristics, or to extract summary statistics, such as the distance traveled by a player in a given game. In this talk, I will present how we’re using advanced statistics and machine learning tools to answer previously unanswerable questions about the NBA. Examples include “How should teams configure their defensive matchups to minimize a player’s effectiveness?”, “Who are the best decision makers in the NBA?”, and “Who was responsible for the most points against in the NBA last season?”

  • Whole genome 3D architecture of chromatin and regulation

    Date: 2013-10-18

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    The expression of a gene is usually controlled by the regulatory elements in its promoter region. However, it has long been hypothesized that, in complex genomes, such as the human genome, a gene may be controlled by distant enhancers and repressors. A recent molecular technique, 3C (chromosome conformation capture), that uses formaldehyde cross-linking and locus-specific PCR, was able to detect physical contacts between distant genomic loci. Such communication is achieved through spatial organization (looping) of chromosomes to bring genes and their regulatory elements into close proximity. Several adaptations of the 3C assay to study genomewide spatial interactions, including Hi-C and ChIA-PET, have been developed. The availability of such data makes it possible to reconstruct the underlying three-dimensional spatial chromatin structure. In this talk, I will first describe a Bayesian statistical model for building spatial estrogen receptor regulation focusing on reducing false positive interactions. A random effect model, PRAM, will then be presented to make inference on the locations of genomic loci in a 3D Euclidean space. Results from ChIA-PET and Hi-C data will be visualized to illustrate the regulation and spatial proximity of genomic loci that are far apart in their linear chromosomal locations.

  • Some recent developments in likelihood-based small area estimation

    Date: 2013-10-04

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Mixed models are commonly used for the analysis of data in small area estimation. In particular, small area estimation has been extensively studied under linear mixed models. However, in practice there are many situations in which we have counts or proportions in small area estimation; for example, a (monthly) dataset on the number of incidences in small areas. Recently, small area estimation under the linear mixed model with a penalized spline model for the fixed part of the model was studied. In this talk, small area estimation under generalized linear mixed models, combining time series and cross-sectional data and extending these models to include penalized spline regression models, is proposed. A likelihood-based approach is used to predict small area parameters and also to provide prediction intervals. The performance of the proposed models and approach is evaluated through simulation studies and also on real datasets.

  • Measurement error and variable selection in parametric and nonparametric models

    Date: 2013-09-27

    Time: 15:30-16:30

    Location: RPHYS 114

    Abstract:

    This talk will start with a discussion of the relationships between LASSO estimation, ridge regression, and attenuation due to measurement error as motivation for, and introduction to, a new generalizable approach to variable selection in parametric and nonparametric regression and discriminant analysis. The approach transcends the boundaries of parametric/nonparametric models. It will first be described in the familiar context of linear regression where its relationship to the LASSO will be described in detail. The latter part of the talk will focus on implementation of the approach to nonparametric modeling where sparse dependence on covariates is desired. Applications to two- and multi-category classification problems will be discussed in detail.