Past Seminar Series - McGill Statistics Seminars
  • A novel statistical framework to characterize antigen-specific T-cell functional diversity in single-cell expression data

    Date: 2015-02-27

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    I will talk about COMPASS, a new Bayesian hierarchical framework for characterizing functional differences in antigen-specific T cells by leveraging high-throughput, single-cell flow cytometry data. In particular, I will illustrate, using a variety of data sets, how COMPASS can reveal subtle and complex changes in antigen-specific T-cell activation profiles that correlate with biological endpoints. When applied to data from the RV144 (“the Thai trial”) HIV vaccine trial, COMPASS identified novel T-cell subsets that were inverse correlates of HIV infection risk. I also developed intuitive metrics for summarizing multivariate antigen-specific T-cell activation profiles for endpoint analysis. In addition, COMPASS identified correlates of latent infection in an immune study of tuberculosis among South African adolescents. COMPASS is available as an R package and is sufficiently general that it can be adapted to new high-throughput data types, such as mass cytometry (CyTOF) and single-cell gene expression, enabling interdisciplinary collaboration, which I will also highlight in my talk.
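
    To give a flavour of the workflow, below is a minimal sketch using the COMPASS R package; the container CC and the metadata column trt are hypothetical stand-ins, and the call follows the pattern of the package vignette rather than the exact analyses in the talk.

      # Minimal sketch, assuming a COMPASSContainer 'CC' built from gated
      # single-cell counts, with a hypothetical metadata column 'trt'.
      library(COMPASS)
      fit <- COMPASS(CC,
                     treatment = trt == "stimulated",    # antigen-stimulated samples
                     control   = trt == "unstimulated",  # paired negative controls
                     iterations = 40000)                 # MCMC iterations
      # Posterior summaries of per-subject activation profiles:
      fs  <- FunctionalityScore(fit)
      pfs <- PolyfunctionalityScore(fit)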

  • Comparison and assessment of particle diffusion models in biological fluids

    Date: 2015-02-20

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Rapidly progressing particle tracking techniques have revealed that foreign particles in biological fluids exhibit rich and at times unexpected behavior, with important consequences for disease diagnosis and drug delivery. Yet there remains a frustrating lack of coherence in the description of these particles’ motion. This is largely due to a reliance on functional statistics (e.g., mean-squared displacement) to perform model selection and assess goodness-of-fit. However, not only are such functional characteristics typically estimated with substantial variability, but they may also fail to distinguish between a number of stochastic processes, each making fundamentally different predictions for relevant quantities of scientific interest. In this talk, I will describe a detailed Bayesian analysis of leading candidate models for subdiffusive particle trajectories in human pulmonary mucus. Efficient and scalable computational strategies will be proposed. Model selection will be achieved by way of intrinsic Bayes factors, which avoid both non-informative priors and “using the data twice”. Goodness-of-fit will be evaluated via second-order criteria along with exact model residuals. Our findings suggest that a simple model of fractional Brownian motion describes the data just as well as a first-principles physical model of visco-elastic subdiffusion.
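
    As a toy illustration of the behaviour at stake (not the Bayesian analysis described above), the base-R sketch below simulates fractional Brownian motion from its covariance function and checks that the empirical mean-squared displacement grows subdiffusively.

      # Toy sketch: simulate fractional Brownian motion with Hurst index H
      # from its covariance C(s, t) = (s^(2H) + t^(2H) - |s - t|^(2H)) / 2;
      # for H < 1/2 the MSD grows like t^(2H), i.e., subdiffusively.
      H <- 0.4; n <- 300; tt <- (1:n) / n
      C <- outer(tt, tt, function(s, t) 0.5 * (s^(2*H) + t^(2*H) - abs(s - t)^(2*H)))
      paths <- t(chol(C)) %*% matrix(rnorm(n * 200), n, 200)  # 200 sample paths
      msd <- rowMeans(paths^2)               # estimates E[X_t^2] = t^(2H)
      coef(lm(log(msd) ~ log(tt)))[2]        # slope should be close to 2H = 0.8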

  • Tuning parameters in high-dimensional statistics

    Date: 2015-02-13

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    High-dimensional statistics is the basis for analyzing large and complex data sets that are generated by cutting-edge technologies in genetics, neuroscience, astronomy, and many other fields. However, Lasso, Ridge Regression, Graphical Lasso, and other standard methods in high-dimensional statistics depend on tuning parameters that are difficult to calibrate in practice. In this talk, I present two novel approaches to overcome this difficulty. My first approach is based on a novel testing scheme that is inspired by Lepski’s idea for bandwidth selection in non-parametric statistics. This approach provides tuning parameter calibration for estimation and prediction with the Lasso and other standard methods and is to date the only way to ensure high performance, fast computations, and optimal finite sample guarantees. My second approach is based on the minimization of an objective function that avoids tuning parameters altogether. This approach provides accurate variable selection in regression settings and, additionally, opens up new possibilities for the estimation of gene regulation networks, microbial ecosystems, and many other network structures.
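
    To make the calibration problem concrete, the sketch below shows the standard practice the talk aims to replace: choosing the Lasso penalty by cross-validation (here with the glmnet R package as an assumed stand-in, on simulated data).

      # Standard practice: pick the Lasso tuning parameter lambda by CV.
      library(glmnet)
      set.seed(1)
      n <- 100; p <- 500
      X <- matrix(rnorm(n * p), n, p)
      y <- drop(X[, 1:5] %*% rep(2, 5)) + rnorm(n)  # 5 active predictors
      cv <- cv.glmnet(X, y)                         # 10-fold CV over a lambda grid
      c(cv$lambda.min, cv$lambda.1se)               # two common, ad hoc choices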

  • A fast unified algorithm for solving group Lasso penalized learning problems

    Date: 2015-02-05

    Time: 15:30-16:30

    Location: BURN 1B39

    Abstract:

    We consider a class of group-lasso learning problems where the objective function is the sum of an empirical loss and the group-lasso penalty. For a class of loss functions satisfying a quadratic majorization condition, we derive a unified algorithm called groupwise-majorization-descent (GMD) for efficiently computing the solution paths of the corresponding group-lasso penalized learning problem. GMD allows for general design matrices, without requiring the predictors to be group-wise orthonormal. As illustrative examples, we develop concrete algorithms for solving the group-lasso penalized least squares and several group-lasso penalized large margin classifiers. These group-lasso models have been implemented in the R package gglasso, publicly available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/gglasso. On simulated and real data, gglasso consistently outperforms existing software for computing the group-lasso that implements either the classical groupwise descent algorithm or Nesterov’s method. An application to risk segmentation in the insurance business is illustrated through the analysis of an auto insurance claim dataset.
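
    A minimal usage sketch of gglasso is shown below; the data and group labels are simulated, and loss = "ls" corresponds to the group-lasso penalized least squares discussed above.

      # Sketch: group-lasso penalized least squares with gglasso.
      library(gglasso)
      set.seed(1)
      n <- 100; p <- 20
      X   <- matrix(rnorm(n * p), n, p)
      grp <- rep(1:5, each = 4)                         # 5 groups of 4 predictors
      y   <- drop(X[, 1:4] %*% rep(1.5, 4)) + rnorm(n)  # group 1 is active
      fit <- gglasso(X, y, group = grp, loss = "ls")    # full solution path
      cv  <- cv.gglasso(X, y, group = grp, loss = "ls", nfolds = 5)
      coef(cv, s = "lambda.min")[1:9]                   # intercept + first two groups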

  • Joint analysis of multiple multi-state processes via copulas

    Date: 2015-02-02

    Time: 15:30-16:30

    Location: BURN 1214

    Abstract:

    A copula-based model is described which enables joint analysis of multiple progressive multi-state processes. Unlike intensity-based or frailty-based approaches to joint modeling, the copula formulation proposed herein ensures that a wide range of marginal multi-state processes can be specified and that the joint model will retain these marginal features. The copula formulation also facilitates a variety of approaches to estimation and inference, including composite likelihood and two-stage estimation procedures. We consider in detail processes with Markov margins, which are often suitable when chronic diseases are progressive in nature. We give special attention to the setting in which individuals are examined intermittently and transition times are consequently interval-censored. Simulation studies give empirical insight into the different methods of analysis, and an application involving progression of joint damage in psoriatic arthritis provides further illustration.
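
    As a small illustration of the general idea (with hypothetical parameter values, using the copula R package), the sketch below couples two exponential sojourn times, stand-ins for transition times of two marginal processes, through a Clayton copula; each margin remains fully specified.

      # Sketch: tie two marginal transition times together via a copula.
      library(copula)
      cop <- claytonCopula(param = 2, dim = 2)  # Kendall's tau = 2/(2+2) = 0.5
      u   <- rCopula(1000, cop)                 # dependent uniform pairs
      t1  <- qexp(u[, 1], rate = 0.3)           # sojourn time, process 1
      t2  <- qexp(u[, 2], rate = 0.1)           # sojourn time, process 2
      cor(t1, t2, method = "kendall")           # dependence survives the margins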

  • Distributed estimation and inference for sparse regression

    Date: 2015-01-30

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    We address two outstanding challenges in sparse regression: (i) computationally efficient estimation in distributed settings; (ii) valid inference for the selected coefficients. The main computational challenge in a distributed setting is harnessing the computational capabilities of all the machines while keeping communication costs low. We devise an approach that requires only a single round of communication among the machines. We show the approach recovers the convergence rate of the (centralized) lasso as long as each machine has access to an adequate number of samples. Turning to the second challenge, we devise an approach to post-selection inference by conditioning on the selected model. In a nutshell, our approach gives inferences with the same frequency interpretation as those given by data/sample splitting, but it is more broadly applicable and more powerful. The validity of our approach also does not depend on the correctness of the selected model, i.e., it gives valid inferences even when the selected model is incorrect.
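
    A schematic of the one-round communication pattern might look as follows; this is not the speaker's estimator (which debiases the local fits before combining them) but a sketch of the communication structure, with glmnet as an assumed local solver.

      # Schematic: each machine fits a local Lasso and ships only its
      # coefficient vector; the center averages them in a single round.
      library(glmnet)
      set.seed(1)
      p <- 200; m <- 4                           # p predictors, m machines
      beta <- c(rep(1, 5), rep(0, p - 5))        # sparse truth
      local_fit <- function(n) {
        X <- matrix(rnorm(n * p), n, p)
        y <- drop(X %*% beta) + rnorm(n)
        as.matrix(coef(cv.glmnet(X, y), s = "lambda.min"))[-1, 1]  # drop intercept
      }
      betas    <- replicate(m, local_fit(n = 150))  # one coefficient vector per machine
      beta_hat <- rowMeans(betas)                   # single aggregation round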

  • Simultaneous white noise models and shrinkage recovery of functional data

    Date: 2015-01-16

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    We consider the white noise representation of functional data taken as i.i.d. realizations of a Gaussian process. The main idea is to establish an asymptotic equivalence in Le Cam’s sense between an experiment which simultaneously describes these realizations and a collection of white noise models. In this context, we project onto an arbitrary basis and apply a novel variant of Stein-type estimation for optimal recovery of the realized trajectories. A key inequality is derived showing that the corresponding risks, conditioned on the underlying curves, are minimax optimal and can be made arbitrarily close to those that an oracle with knowledge of the process would attain. Empirical performance is illustrated through simulated and real data examples.
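
    A toy version of the projection-and-shrinkage step, in base R: project one noisy curve onto an (approximately orthonormal) cosine basis and apply positive-part James-Stein shrinkage to the coefficients. This is a generic Stein estimator under a known noise level, not the paper's exact variant.

      # Toy sketch: recover a curve by projecting noisy evaluations onto a
      # cosine basis and shrinking the coefficients Stein-style.
      n <- 256; x <- (1:n - 0.5) / n
      f <- sin(2 * pi * x) + 0.5 * cos(6 * pi * x)   # true curve
      y <- f + rnorm(n, sd = 0.5)                    # one noisy realization
      K <- 20
      B <- sqrt(2 / n) * cos(outer(x, pi * (1:K)))   # orthonormal cosine design
      theta <- drop(t(B) %*% y)                      # empirical coefficients
      sigma2 <- 0.25                                 # noise variance (known here)
      shrink <- max(0, 1 - (K - 2) * sigma2 / sum(theta^2))  # positive-part James-Stein
      f_hat <- drop(B %*% (shrink * theta))          # shrunken reconstruction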

  • Functional data analysis and related topics

    Date: 2015-01-15

    Time: 16:00-17:00

    Location: CRM 1360 (U. de Montréal)

    Abstract:

    Functional data analysis (FDA) has received substantial attention, with applications arising in various disciplines, such as engineering, public health, and finance. In general, FDA approaches focus on nonparametric underlying models that assume the data are observed as realizations of stochastic processes satisfying some regularity conditions, e.g., smoothness constraints. Unlike in parametric models, the estimation and inference procedures do not depend on merely a finite number of parameters; instead, they exploit techniques, such as smoothing methods and dimension reduction, that allow the data to speak for themselves. In this talk, I will give an overview of FDA methods and related topics developed in recent years.
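
    As a small taste of the smoothing step (a generic base-R example, not tied to the talk), consider:

      # Roughness-penalized fit with the smoothing level chosen by GCV,
      # letting a single noisy functional observation speak for itself.
      x <- seq(0, 1, length.out = 200)
      y <- sin(4 * pi * x) + rnorm(200, sd = 0.3)
      fit <- smooth.spline(x, y)            # smoothing parameter chosen by GCV
      plot(x, y); lines(fit, col = "red")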

  • Mixtures of coalesced generalized hyperbolic distributions

    Date: 2015-01-13

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    A mixture of coalesced generalized hyperbolic distributions is developed by joining a finite mixture of generalized hyperbolic distributions with a mixture of multiple scaled generalized hyperbolic distributions. The result is a mixture of mixtures with shared model parameters and a common mode. We begin by discussing the generalized hyperbolic distribution, which has the t, Gaussian, and other distributions as special cases. The generalized hyperbolic distribution can be represented as a normal mean-variance mixture with a generalized inverse Gaussian mixing distribution; this representation makes it a suitable candidate for the expectation-maximization algorithm. Secondly, we discuss the multiple scaled generalized hyperbolic distribution, which arises via implementation of a multi-dimensional weight function. A parameter estimation scheme is developed using the ever-expanding class of MM algorithms, and the Bayesian information criterion is used for model selection. Special consideration is given to the contour shape. We use the coalesced distributions for clustering and compare them to finite mixtures of skew-t distributions using simulated and real data sets. Finally, the role of generalized hyperbolic mixtures within the wider model-based clustering, classification, and density estimation literature is discussed.
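
    The mixture representation mentioned above can be sketched directly: draw the mixing variable from a generalized inverse Gaussian (GIG) law and plug it into the normal mean-variance mixture. The GIGrvg package and all parameter values below are illustrative assumptions.

      # Sketch: X = mu + gamma * W + sqrt(W) * Z with W ~ GIG(lambda, chi, psi)
      # and Z ~ N(0, 1) yields a univariate generalized hyperbolic draw.
      library(GIGrvg)                               # assumed; provides rgig()
      n <- 1e4
      w <- rgig(n, lambda = 1, chi = 1, psi = 1)    # GIG mixing variable
      x <- 0 + 0.5 * w + sqrt(w) * rnorm(n)         # mu = 0, gamma = 0.5 (skewness)
      hist(x, breaks = 100, freq = FALSE)           # heavy-tailed, skewed density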

  • Space-time data analysis: Out of the Hilbert box

    Date: 2015-01-09

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Given the discouraging state of current efforts to curb global warming, we can imagine that we will soon turn our attention to mitigation. On a global scale, distressed populations will turn to national and international organizations for solutions to dramatic problems caused by climate change. These institutions in turn will mandate the collection of data on a scale and resolution that will present extraordinary statistical and computational challenges to those of us viewed as having the appropriate expertise. A review of the current state of our space-time data analysis machinery suggests that we have much to do. Most current spatial modelling methodology is based on concepts translated from time series analysis, is heavily dependent on various kinds of stationarity assumptions, uses the Gaussian distribution to model data, and depends on a priori coordinate systems that do not exist in nature. A way forward from this restrictive framework is proposed by modelling data over textured domains using layered coordinate systems.