Past Seminar Series - McGill Statistics Seminars
  • Bayesian latent variable modelling of longitudinal family data for genetic pleiotropy studies

    Date: 2013-11-01

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Motivated by genetic association studies of pleiotropy, we propose a Bayesian latent variable approach to jointly study multiple outcomes or phenotypes. The proposed method models both continuous and binary phenotypes, and it accounts for serial and familial correlations when longitudinal and pedigree data have been collected. We present a Bayesian estimation method for the model parameters and we discuss some of the model misspecification effects. Central to the analysis is a novel MCMC algorithm that builds upon hierarchical centering and parameter expansion techniques to efficiently sample the posterior distribution. We discuss phenotype and model selection, and we study the performance of two selection strategies based on Bayes factors and spike-and-slab priors.
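
    As background for the second selection strategy, a generic spike-and-slab prior on a single effect can be written as follows. The symbols $\beta_j$, $\gamma_j$, $\tau^2$ and $\pi$ are illustrative notation, not taken from the abstract, and the prior used in the talk may differ.

```latex
\[
\beta_j \mid \gamma_j \sim (1-\gamma_j)\,\delta_0 + \gamma_j\,\mathcal{N}(0,\tau^2),
\qquad
\gamma_j \sim \mathrm{Bernoulli}(\pi), \quad j = 1,\ldots,J.
\]
```

    Here $\delta_0$ is a point mass at zero (the spike) and the normal component is the slab; under this formulation, the posterior probability that $\gamma_j = 1$ is commonly used to decide whether the $j$-th effect is retained.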

  • XY - Basketball meets Big Data

    Date: 2013-10-25

    Time: 15:30-16:30

    Location: HEC Montréal Salle CIBC 1er étage

    Abstract:

    In this talk, I will explore the state of the art in the analysis and modeling of player tracking data in the NBA. In the past, player tracking data has been used primarily for visualization, such as understanding the spatial distribution of a player’s shooting characteristics, or to extract summary statistics, such as the distance traveled by a player in a given game. I will present how we are using advanced statistics and machine learning tools to answer previously unanswerable questions about the NBA. Examples include “How should teams configure their defensive matchups to minimize a player’s effectiveness?”, “Who are the best decision makers in the NBA?”, and “Who was responsible for the most points against in the NBA last season?”

  • Whole genome 3D architecture of chromatin and regulation

    Date: 2013-10-18

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    The expression of a gene is usually controlled by the regulatory elements in its promoter region. However, it has long been hypothesized that, in complex genomes, such as the human genome, a gene may be controlled by distant enhancers and repressors. A recent molecular technique, 3C (chromosome conformation capture), that uses formaldehyde cross-linking and locus-specific PCR, was able to detect physical contacts between distant genomic loci. Such communication is achieved through spatial organization (looping) of chromosomes to bring genes and their regulatory elements into close proximity. Several adaptations of the 3C assay to study genomewide spatial interactions, including Hi-C and ChIA-PET, have been developed. The availability of such data makes it possible to reconstruct the underlying three-dimensional spatial chromatin structure. In this talk, I will first describe a Bayesian statistical model for building spatial estrogen receptor regulation focusing on reducing false positive interactions. A random effect model, PRAM, will then be presented to make inference on the locations of genomic loci in a 3D Euclidean space. Results from ChIA-PET and Hi-C data will be visualized to illustrate the regulation and spatial proximity of genomic loci that are far apart in their linear chromosomal locations.

  • Some recent developments in likelihood-based small area estimation

    Date: 2013-10-04

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Mixed models are commonly used for the analysis of data in small area estimation. In particular, small area estimation has been extensively studied under linear mixed models. However, in practice there are many situations in which the data are counts or proportions; for example, a (monthly) dataset on the number of incidences in small areas. Recently, small area estimation under the linear mixed model with a penalized spline model for the fixed part of the model was studied. In this talk, small area estimation under generalized linear mixed models is proposed, combining time series and cross-sectional data and extending these models to include penalized spline regression models. A likelihood-based approach is used to predict small area parameters and also to provide prediction intervals. The performance of the proposed models and approach is evaluated through simulation studies and real datasets.

  • Measurement error and variable selection in parametric and nonparametric models

    Date: 2013-09-27

    Time: 15:30-16:30

    Location: RPHYS 114

    Abstract:

    This talk will start with a discussion of the relationships between LASSO estimation, ridge regression, and attenuation due to measurement error as motivation for, and introduction to, a new generalizable approach to variable selection in parametric and nonparametric regression and discriminant analysis. The approach transcends the boundaries of parametric/nonparametric models. It will first be described in the familiar context of linear regression where its relationship to the LASSO will be described in detail. The latter part of the talk will focus on implementation of the approach to nonparametric modeling where sparse dependence on covariates is desired. Applications to two- and multi-category classification problems will be discussed in detail.
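
    The link between attenuation and ridge-type shrinkage mentioned above can be illustrated in simple linear regression. This is a textbook sketch, not the speaker's derivation; the notation ($W$, $U$, $\sigma_x^2$, $\sigma_u^2$, $\kappa$, $S_{xx}$, $S_{xy}$) is assumed here for illustration.

```latex
% Classical measurement error: the covariate X is observed as W = X + U,
% with U independent of X and of the regression error, Var(X) = \sigma_x^2, Var(U) = \sigma_u^2.
\[
W = X + U, \qquad
\hat\beta_{1,\mathrm{naive}} \;\xrightarrow{\;p\;}\; \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}\,\beta_1 .
\]
% Ridge regression with one centered predictor and penalty \kappa \beta^2:
\[
\hat\beta_{\mathrm{ridge}} = \frac{S_{xy}}{S_{xx} + \kappa}
 = \frac{S_{xx}}{S_{xx} + \kappa}\,\hat\beta_{\mathrm{OLS}},
\qquad
S_{xx} = \sum_{i=1}^n (x_i - \bar x)^2, \quad
S_{xy} = \sum_{i=1}^n (x_i - \bar x)(y_i - \bar y).
\]
```

    Both estimators shrink the slope toward zero by a factor of the same form; regressing on $W$ behaves roughly like ridge regression on $X$ with $\kappa \approx n\sigma_u^2$.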

  • Tests of independence for sparse contingency tables and beyond

    Date: 2013-09-20

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    In this talk, a new and consistent statistic is proposed to test whether two discrete random variables are independent. The test is based on a statistic of the Cramér–von Mises type constructed from the so-called empirical checkerboard copula. The test can be used even for sparse contingency tables or tables whose dimension changes with the sample size. Because the limiting distribution of the test statistic is not tractable, a valid bootstrap procedure for the computation of p-values will be discussed. The new statistic is compared in a power study to standard procedures for testing independence, such as Pearson’s chi-squared, likelihood ratio, and Zelterman statistics. The new test turns out to be considerably more powerful than all its competitors in all scenarios considered.
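
    For comparison, one of the standard procedures named above can be sketched in a few lines. The code below computes Pearson's chi-squared statistic and a Monte Carlo permutation p-value, which avoids relying on the asymptotic chi-squared approximation in sparse tables. It is not the checkerboard copula test or the bootstrap from the talk; the function names `pearson_chi2` and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def pearson_chi2(x, y):
    """Pearson chi-squared statistic for two paired discrete samples."""
    # Recode the observed categories as 0, 1, 2, ... so that every row and
    # column of the contingency table has at least one observation.
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)
    # Expected counts under independence: (row total * column total) / n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for the chi-squared statistic."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y).ravel()
    observed = pearson_chi2(x, y)
    # Permuting y breaks any dependence while preserving both margins.
    exceed = sum(
        pearson_chi2(x, rng.permutation(y)) >= observed for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

    Calling `permutation_pvalue(x, y)` on two equal-length arrays of category labels returns a p-value in (0, 1]; the seed is fixed only to make the sketch reproducible.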

  • Bayesian nonparametric density estimation under length bias sampling

    Date: 2013-09-13

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    A new density estimation method in a Bayesian nonparametric framework is presented for the case where the recorded data do not come directly from the distribution of interest, but from a length-biased version. From a Bayesian perspective, efforts to computationally evaluate posterior quantities conditionally on length-biased data were hindered by the inability to circumvent the problem of a normalizing constant. In this talk, a novel Bayesian nonparametric approach to the length bias sampling problem is presented which circumvents the issue of the normalizing constant. Numerical illustrations as well as a real data example are presented, and the estimator is compared against its frequentist counterpart, the kernel density estimator for indirect data. This is joint work with Spyridon J. Hatjispyros (University of the Aegean, Greece) and Stephen G. Walker (University of Texas at Austin, U.S.A.).
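
    The normalizing constant in question arises from the standard definition of a length-biased density. The notation below ($f$ for the density of interest, $g$ for the density of the recorded data, $\mu_f$ for the mean of $f$) is assumed for illustration and does not come from the abstract.

```latex
\[
g(x) = \frac{x\,f(x)}{\mu_f}, \qquad
\mu_f = \int_0^\infty t\,f(t)\,dt , \qquad x > 0 .
\]
```

    Because $\mu_f$ depends on the unknown $f$, placing a nonparametric prior on $f$ leaves a likelihood in which every factor carries this intractable constant.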

  • Arup Bose: Consistency of large dimensional sample covariance matrix under weak dependence

    Date: 2013-04-12

    Time: 14:30-15:30

    Location: Concordia

    Abstract:

    Estimation of large dimensional covariance matrices has been of interest recently. One model assumes that there are $p$-dimensional independent identically distributed Gaussian observations $X_1, \ldots , X_n$ with dispersion matrix $\Sigma_p$, and that $p$ grows much faster than $n$. Appropriate convergence rate results have been established in the literature for tapered and banded estimators of $\Sigma_p$, which are based on the sample variance-covariance matrix of the $n$ observations.
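
    A banded estimator of the kind mentioned above keeps the entries of the sample covariance matrix within a fixed distance of the diagonal and sets the rest to zero. The sketch below is a minimal illustration under that usual definition; the function name `banded_covariance` and the bandwidth argument `k` are hypothetical, and the choice of `k` (which drives the convergence rates) is not addressed.

```python
import numpy as np


def banded_covariance(X, k):
    """Band the sample covariance of an (n, p) data matrix X.

    Entries (i, j) with |i - j| <= k are kept; all others are set to zero.
    Assumption: np.cov's default divisor n - 1 is used here; some references
    define the sample covariance with divisor n instead.
    """
    S = np.cov(X, rowvar=False)          # p x p sample covariance matrix
    idx = np.arange(S.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) <= k
    return S * mask
```

    Banding presumes that the variables have a natural ordering in which distant coordinates are weakly correlated; a tapered estimator replaces the 0/1 mask with weights that decay smoothly away from the diagonal.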

  • Éric Marchand: On improved predictive density estimation with parametric constraints

    Date: 2013-04-05

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    We consider the problem of predictive density estimation under Kullback-Leibler loss when the parameter space is restricted to a convex subset. The principal situation analyzed relates to the estimation of an unknown predictive p-variate normal density based on an observation generated by another p-variate normal density. The means of the densities are assumed to coincide, and the covariance matrices are known multiples of the identity matrix. We obtain sharp results concerning plug-in estimators, we show that the best unrestricted invariant predictive density estimator is dominated by the Bayes estimator associated with a uniform prior on the restricted parameter space, and we obtain minimax results for cases where the parameter space is (i) a cone, and (ii) a ball. A key feature, which we will describe, is a correspondence between the predictive density estimation problem and a collection of point estimation problems. Finally, if time permits, we describe recent work concerning (i) non-normal models, and (ii) analysis relative to other loss functions such as reverse Kullback-Leibler and integrated L2.
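
    The setting can be written out as follows. The notation ($\sigma_X^2$, $\sigma_Y^2$, $C$, $\hat q$, $\hat\theta$) is assumed for illustration; the last line is the standard Kullback-Leibler divergence between two normal densities with a common covariance, shown here as one simple instance of how a plug-in density reduces to a point estimation problem.

```latex
\[
X \sim N_p(\theta, \sigma_X^2 I_p), \qquad
Y \sim N_p(\theta, \sigma_Y^2 I_p), \qquad
\theta \in C \subset \mathbb{R}^p \ \text{convex},
\]
\[
L_{\mathrm{KL}}\bigl(\theta, \hat q(\cdot\,; x)\bigr)
 = \int_{\mathbb{R}^p} p(y \mid \theta)\,
   \log\frac{p(y \mid \theta)}{\hat q(y; x)}\,dy ,
\]
\[
\hat q(y; x) = N_p\bigl(y;\ \hat\theta(x),\ \sigma_Y^2 I_p\bigr)
\;\Longrightarrow\;
L_{\mathrm{KL}}\bigl(\theta, \hat q(\cdot\,; x)\bigr)
 = \frac{\lVert \hat\theta(x) - \theta \rVert^2}{2\sigma_Y^2}.
\]
```

    For plug-in densities, Kullback-Leibler risk is therefore proportional to the squared-error risk of the point estimator $\hat\theta$.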

  • Hélène Massam: The hyper Dirichlet revisited: a characterization

    Date: 2013-03-22

    Time: 14:30-15:30

    Location: BURN 107

    Abstract:

    We give a characterization of the hyper Dirichlet distribution hyper Markov with respect to a decomposable graph $G$ (or equivalently a moral directed acyclic graph). For $X=(X_1,\ldots,X_d)$ following the hyper Dirichlet distribution, our characterization is through the so-called “local and global independence properties” for a carefully designed family of orders of the variables $X_1,\ldots,X_d$.

    The hyper Dirichlet for general directed acyclic graphs was derived from a characterization of the Dirichlet distribution given by Geiger and Heckerman (1997). This characterization of the Dirichlet for $X=(X_1,\ldots,X_d)$ is obtained through a functional equation derived from the local and global independence properties for two different orders of the variables. These two orders are seemingly chosen haphazardly but, as our results show, this is not so. Our results generalize those of Geiger and Heckerman (1997) and are given without the assumption of existence of a positive density for $X$.