Date: 2011-09-09

Time: 14:00-16:30

Location: UdeM, Pav. André-Aisenstadt, Room 1360

Abstract:

Susko: The data generated by large-scale sequencing projects are complex, high-dimensional, multivariate discrete data. In studies of evolutionary biology, the parameter space of evolutionary trees adds an unusual complication from a statistical perspective. In this talk I will briefly introduce the general approaches to using sequence data in phylogenetic inference. A particular issue of interest in phylogenetic inference is the assessment of uncertainty about the true tree, or about structures that might be present in it. In practice, uncertainty is primarily assessed through bootstrap support (BP) for splits, with large values indicating strong support for a split. A difficulty with this measure, however, has been deciding how large is large enough. We discuss the interpretation of BP and ways of adjusting it so that it has an interpretation similar to that of a p-value. A related issue, concerning the behaviour of methods when data are generated from a star tree, gives rise to an interesting example in which, owing to the unusual statistical nature of the problem, Bayesian and maximum likelihood methods give strikingly different results, even asymptotically.

Labbe: Recently, expression quantitative trait loci (eQTL) mapping studies, in which the expression levels of thousands of genes are treated as quantitative traits, have been used to provide greater insight into the biology of gene regulation. Current analysis and interpretation of eQTL data involve multiple methods and applications, whose output is often fragmented. In this talk, we present an integrated hierarchical Bayesian model that jointly models all genes and SNPs to detect eQTLs. We propose a model (named iBMQ) that is specifically designed to handle a large number G of gene expressions, a large number S of regressors (genetic markers) and a small number n of individuals, in what we call a "large G, large S, small n" paradigm. This method incorporates genotypic and gene expression data into a single model while 1) explicitly coping with the high dimensionality of eQTL data (large number of genes), 2) borrowing strength from all gene expression data for the mapping procedures, and 3) controlling the number of false positives at a desired level.

Speakers

Ed Susko, Dalhousie University

Aurélie Labbe, McGill University