CRM-Colloquium - McGill Statistics Seminars
  • What percentage of children in the U.S. are eating a healthy diet? A statistical approach

    Date: 2012-12-14

    Time: 14:30-15:30

    Location: Concordia, Room LB 921-04

    Abstract:

    In the United States the preferred method of obtaining dietary intake data is the 24-hour dietary recall, yet the measure of most interest is usual or long-term average daily intake, which is impossible to measure. Thus, usual dietary intake is assessed with considerable measurement error. Also, diet represents numerous foods, nutrients and other components, each of which has distinctive attributes. Sometimes, it is useful to examine intake of these components separately, but increasingly nutritionists are interested in exploring them collectively to capture overall dietary patterns and their effect on various diseases. Consumption of these components varies widely: some are consumed by almost everyone every day, while others are consumed episodically, so that 24-hour recall data are zero-inflated. In addition, they are often correlated with each other. Finally, it is often preferable to analyze the amount of a dietary component relative to the amount of energy (calories) in a diet because dietary recommendations often vary with energy level.

  • A nonparametric Bayesian model for local clustering

    Date: 2012-11-23

    Time: 14:30-15:30

    Location: BURN 107

    Abstract:

    We propose a nonparametric Bayesian local clustering (NoB-LoC) approach for heterogeneous data. Using genomics data as an example, the NoB-LoC clusters genes into gene sets and simultaneously creates multiple partitions of samples, one for each gene set. In other words, the sample partitions are nested within the gene sets. Inference is guided by a joint probability model on all random elements. Biologically, the model formalizes the notion that biological samples cluster differently with respect to different genetic processes, and that each process is related to only a small subset of genes. These local features differ importantly from those of global clustering approaches such as hierarchical clustering, which create one partition of samples that applies to all genes in the data set. Furthermore, the NoB-LoC includes a special cluster of genes that do not give rise to any meaningful partition of samples. These genes could be irrelevant to the disease conditions under investigation. Similarly, for a given gene set, the NoB-LoC includes a subset of samples that do not co-cluster with other samples. The samples in this special cluster could, for example, be those whose disease subtype is not characterized by the particular gene set.

  • Observational studies in healthcare: are they any good?

    Date: 2012-10-19

    Time: 14:30-15:30

    Location: UdeM

    Abstract:

    Observational healthcare data, such as administrative claims and electronic health records, play an increasingly prominent role in healthcare. Pharmacoepidemiologic studies in particular routinely estimate temporal associations between medical product exposure and subsequent health outcomes of interest, and such studies influence prescribing patterns and healthcare policy more generally. Some authors have questioned the reliability and accuracy of such studies, but few previous efforts have attempted to measure their performance.

  • Regularized semiparametric functional linear regression

    Date: 2012-09-21

    Time: 14:30-15:30

    Location: McGill, Burnside Hall 1214

    Abstract:

    In many scientific experiments we must analyze functional data, where the observations are sampled from a random process, together with a potentially large number of non-functional covariates. The complex nature of functional data makes it difficult to directly apply existing methods to model selection and estimation. We propose and study a new class of penalized semiparametric functional linear regression to characterize the regression relation between a scalar response and multiple covariates, including both functional covariates and scalar covariates. The resulting method provides a unified and flexible framework to jointly model functional and non-functional predictors, identify important covariates, and improve efficiency and interpretability of the estimates. Featuring two types of regularization, namely shrinkage on the effects of scalar covariates and truncation on principal components of the functional predictor, the new approach is flexible and effective in dimension reduction. One key contribution of this paper is to study theoretical properties of the regularized semiparametric functional linear model. We establish oracle and consistency properties under mild conditions by allowing a possibly diverging number of scalar covariates and simultaneously taking the infinite-dimensional functional predictor into account. We illustrate the new estimator with extensive simulation studies, and then apply it to an image data analysis.

  • Li: High-dimensional feature selection using hierarchical Bayesian logistic regression with heavy-tailed priors | Rao: Best predictive estimation for linear mixed models with applications to small area estimation

    Date: 2012-04-13

    Time: 14:00-16:30

    Location: MAASS 217

    Abstract:

    Li: The problem of selecting the most useful features from a great many (e.g., thousands of) candidates arises in many areas of modern science. An interesting problem from genomic research is that, from thousands of genes that are active (expressed) in certain tissue cells, we want to find the genes that can be used to separate tissues of different classes (e.g., cancer and normal). In this paper, we report a Bayesian logistic regression method based on heavy-tailed priors with moderately small degrees of freedom (such as 1) and small scale (such as 0.01), using Gibbs sampling to do the computation. We show that it can distinctively separate a couple of useful features from a large number of useless ones, and discriminate many redundant correlated features. We also show that this method is very stable to the choice of scale. We apply our method to a microarray data set related to prostate cancer, and identify only 3 genes out of 6033 candidates that can separate cancer and normal tissues very well in leave-one-out cross-validation.

  • Using tests of homoscedasticity to test missing completely at random | Hugh Chipman: Sequential optimization of a computer model and other Active Learning problems

    Date: 2012-03-09

    Time: 14:00-16:30

    Location: UQAM, 201 ave. du Président-Kennedy, salle 5115

  • Stute: Principal component analysis of the Poisson Process | Blath: Longterm properties of the symbiotic branching model

    Date: 2012-02-10

    Time: 14:00-16:30

    Location: Concordia

    Abstract:

    Stute: The Poisson Process constitutes a well-known model for describing random events over time. It has many applications in marketing research, insurance mathematics and finance. Though it has been studied for decades, not much is known about how to check (in a non-asymptotic way) the validity of the Poisson Process. In this talk we present the principal component decomposition of the Poisson Process, which enables us to derive finite sample properties of associated goodness-of-fit tests. In the first step we show that the Fourier transforms of the components contain Bessel and Struve functions. Inversion leads to densities which are modified arcsine distributions.
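
    As a point of reference for the goodness-of-fit question raised in this abstract, the following is a minimal sketch and not the speaker's principal component method. It simulates a homogeneous Poisson process from exponential inter-arrival times and computes a Kolmogorov-Smirnov distance based on the classical fact that, given the number of events in [0, T], the event times are distributed as ordered uniform draws on [0, T]. The function names, rate, horizon and seed are illustrative assumptions.

```python
import random


def simulate_poisson_process(rate, horizon, rng):
    """Return the sorted event times of a homogeneous Poisson process on [0, horizon]."""
    times = []
    t = rng.expovariate(rate)  # first inter-arrival time
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)  # add the next exponential gap
    return times


def ks_statistic_uniform(times, horizon):
    """Kolmogorov-Smirnov distance between the event times and Uniform(0, horizon)."""
    n = len(times)
    if n == 0:
        raise ValueError("need at least one event")
    u = sorted(t / horizon for t in times)
    # Largest gap between the empirical CDF and the uniform CDF, from above and below.
    d_plus = max((i + 1) / n - x for i, x in enumerate(u))
    d_minus = max(x - i / n for i, x in enumerate(u))
    return max(d_plus, d_minus)


# Illustrative values only (assumptions, not taken from the talk).
rng = random.Random(0)
events = simulate_poisson_process(rate=2.0, horizon=100.0, rng=rng)
ks = ks_statistic_uniform(events, horizon=100.0)
print(len(events), round(ks, 4))
```

    A small distance is consistent with the Poisson assumption; turning it into a formal test would require the appropriate Kolmogorov-Smirnov critical values, which this sketch does not compute.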

  • Bayesian approaches to evidence synthesis in clinical practice guideline development

    Date: 2012-01-13

    Time: 15:30-16:30

    Location: Concordia, Library Building LB-921.04

    Abstract:

    The American College of Cardiology Foundation (ACCF) and the American Heart Association (AHA) have jointly engaged in the production of guidelines in the area of cardiovascular disease since 1980. The developed guidelines are intended to assist health care providers in clinical decision making by describing a range of generally acceptable approaches for the diagnosis, management, or prevention of specific diseases or conditions. This talk describes some of our work under a contract with ACCF/AHA for applying Bayesian methods to guideline recommendation development. In a demonstration example, we use Bayesian meta-analysis strategies to summarize evidence on the comparative effectiveness of percutaneous coronary intervention and coronary artery bypass grafting for patients with unprotected left main coronary artery disease. We show the usefulness and flexibility of Bayesian methods in handling data arising from studies with different designs (e.g., RCTs and observational studies), performing indirect comparisons among treatments when studies with direct comparisons are unavailable, and accounting for historical data.

  • Detecting evolution in experimental ecology: Diagnostics for missing state variables

    Date: 2011-12-09

    Time: 15:30-16:30

    Location: UQAM Salle 5115

    Abstract:

    This talk considers goodness-of-fit diagnostics for time-series data from processes approximately modeled by systems of nonlinear ordinary differential equations. In particular, we seek to determine three nested causes of lack of fit: (i) unmodeled stochastic forcing, (ii) mis-specified functional forms and (iii) mis-specified state variables. Testing lack of fit in differential equations is challenging since the model is expressed in terms of rates of change of the measured variables. Here, lack of fit is represented on the model scale via time-varying parameters. We develop tests for each of the three cases above through bootstrap and permutation methods.

  • Guérin: An ergodic variant of the telegraph process for a toy model of bacterial chemotaxis | Staicu: Skewed functional processes and their applications

    Date: 2011-11-11

    Time: 14:00-16:30

    Location: UdeM

    Abstract:

    Guérin: I will study the long-time behavior of a variant of the classic telegraph process, with non-constant jump rates that induce a drift towards the origin. This process can be seen as a toy model for velocity-jump processes recently proposed as mathematical models of bacterial chemotaxis. I will give its invariant law and construct an explicit coupling for velocity and position, providing exponential ergodicity together with a quantitative control of the total variation distance to equilibrium at each time instant. This is joint work with Joaquin Fontbona (Universidad de Santiago, Chile) and Florent Malrieu (Université Rennes 1, France).
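
    To make the idea of jump rates that induce a drift towards the origin concrete, here is a minimal simulation sketch under stated assumptions. The abstract does not specify the rates, so the sketch assumes a hypothetical form: the velocity is +1 or -1 and flips at rate `a` while the particle moves towards the origin and at a larger rate `b` while it moves away. The function name and all parameter values are illustrative.

```python
import math
import random


def simulate_telegraph(a, b, t_end, rng, x0=0.0, v0=1):
    """Return (position, velocity) at time t_end for a telegraph-type process.

    Assumed dynamics (hypothetical, for illustration): unit speed, velocity
    flips at rate a when moving towards the origin and at rate b > a when
    moving away from it.
    """
    if not 0 < a < b:
        raise ValueError("expected 0 < a < b")
    x, v, t = x0, v0, 0.0
    while t < t_end:
        moving_toward = x * v < 0
        rate = a if moving_toward else b
        flip_in = rng.expovariate(rate)
        # The rate changes without a flip only when the particle reaches the origin.
        origin_in = abs(x) if moving_toward else math.inf
        remaining = t_end - t
        step = min(flip_in, origin_in, remaining)
        x += v * step
        t += step
        if step == remaining:
            break  # reached the time horizon
        if step == origin_in:
            x = 0.0  # at the origin; memorylessness lets us redraw at the new rate
        else:
            v = -v  # velocity flip
    return x, v


# Illustrative values only (assumptions, not taken from the talk).
x, v = simulate_telegraph(a=1.0, b=3.0, t_end=50.0, rng=random.Random(1))
print(x, v)
```

    Because flips are more frequent when moving away, excursions from the origin tend to be cut short, which is the drift effect the abstract describes.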