2015 Winter - McGill Statistics Seminars
  • Some new classes of bivariate distributions based on conditional specification

    Date: 2015-05-14

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    A bivariate distribution can sometimes be characterized completely by properties of its conditional distributions. In this talk, we will discuss models of bivariate distributions whose conditionals are members of prescribed parametric families of distributions. Some relevant models with specified conditionals will be discussed, including the normal and lognormal cases, the skew-normal and other families of distributions. Finally, some conditionally specified densities will be shown to provide convenient flexible conjugate prior families in certain multiparameter Bayesian settings.
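
To make the idea of conditional specification concrete, here is a minimal sketch that is not taken from the talk. It assumes the textbook-style joint density f(x, y) ∝ exp(-(x² + y² + c·x²·y²)/2) with c ≥ 0, whose conditionals are both normal, X | Y = y ~ N(0, 1/(1 + c·y²)) and symmetrically for Y | X = x, even though the joint is not bivariate normal when c > 0. Because only the conditionals are needed, a Gibbs sampler is a natural way to draw from it. The function name and defaults are hypothetical.

```python
import math
import random


def gibbs_normal_conditionals(c, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for f(x, y) ∝ exp(-(x^2 + y^2 + c x^2 y^2) / 2), c >= 0.

    Assumed illustrative model: each conditional is normal with mean 0 and
    variance 1 / (1 + c * other^2).
    """
    if c < 0:
        raise ValueError("c must be non-negative for the density to be proper")
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    draws = []
    for step in range(burn_in + n_samples):
        # Draw X | Y = y, then Y | X = x, each from its normal conditional.
        x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 + c * y * y))
        y = rng.gauss(0.0, 1.0 / math.sqrt(1.0 + c * x * x))
        if step >= burn_in:
            draws.append((x, y))
    return draws


if __name__ == "__main__":
    sample = gibbs_normal_conditionals(c=1.0, n_samples=5000)
    second_moment = sum(x * x for x, _ in sample) / len(sample)
    print(f"estimated E[X^2] with c = 1: {second_moment:.3f}")
```

With c = 0 the two coordinates reduce to independent standard normals; a positive c shrinks the conditional variance of each coordinate when the other is large in magnitude.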

  • A statistical view of some recent climate controversies

    Date: 2015-05-07

    Time: 15:30-16:30

    Location: Université de Sherbrooke

    Abstract:

    This talk looks at some recent climate controversies from a statistical standpoint. The issues are motivated via changepoints and their detection. Changepoints are ubiquitous features in climatic time series, occurring whenever stations relocate or gauges are changed. Ignoring changepoints can produce spurious trend conclusions. Changepoint tests involving cumulative sums, likelihood ratio, and maximums of F-statistics are introduced; the asymptotic distributions of these statistics are quantified under the changepoint-free null hypothesis. The case of multiple changepoints is considered. The methods are used to study several controversies, including extreme temperature trends in the United States and Atlantic Basin tropical cyclone counts and strengths.
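
As a rough illustration of the cumulative-sum idea mentioned in the abstract, the sketch below computes a standard CUSUM statistic for a single shift in mean. It is not the speaker's procedure: it assumes independent observations with constant variance, which climate series often violate because of autocorrelation, and the function name is hypothetical. Under those assumptions and no changepoint, the statistic behaves asymptotically like the supremum of the absolute value of a Brownian bridge, whose 5% critical value is approximately 1.358.

```python
import math


def cusum_changepoint(series):
    """Return (k_hat, statistic) for a single mean-shift CUSUM test.

    k_hat is the number of observations before the estimated changepoint.
    Assumes independent, equal-variance errors (an illustrative assumption).
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / (n - 1)
    if variance == 0:
        raise ValueError("series is constant")
    scale = math.sqrt(variance * n)

    running = 0.0  # partial sum of deviations from the overall mean
    best_k, best_stat = 0, 0.0
    for k, x in enumerate(series[:-1], start=1):
        running += x - mean
        stat = abs(running) / scale
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


if __name__ == "__main__":
    # Toy series: the mean jumps from 0 to 5 after the tenth observation.
    toy = [0.0] * 10 + [5.0] * 10
    k_hat, stat = cusum_changepoint(toy)
    print(k_hat, round(stat, 3), "reject at 5%" if stat > 1.358 else "no evidence")
```

The sample variance here is computed from the whole series, so it is inflated when a shift is present; practical implementations typically use a more careful variance estimate.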

  • Testing for network community structure

    Date: 2015-03-20

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Networks provide a useful means to summarize sparse yet structured massive datasets, and so are an important aspect of the theory of big data. A key question in this setting is to test for the significance of community structure or what in social networks is termed homophily, the tendency of nodes to be connected based on similar characteristics. Network models where a single parameter per node governs the propensity of connection are popular in practice, because they are simple to understand and analyze. They frequently arise as null models to indicate a lack of community structure, since they cannot readily describe the division of a network into groups of nodes whose aggregate links behave in a block-like manner. Here we discuss asymptotic regimes under families of such models, and show their potential for enabling hypothesis tests in this setting. As an important special case, we treat network modularity, which summarizes the difference between observed and expected within-community edges under such null models, and which has seen much success in practical applications of large-scale network analysis. Our focus here is on statistical rather than algorithmic properties, however, in order to yield new insights into the canonical problem of testing for network community structure.
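
For readers unfamiliar with modularity, the following sketch computes it in the common Newman-Girvan form, Q = Σ_c [e_c/m − (d_c/(2m))²], where m is the number of edges, e_c the number of edges inside community c, and d_c the total degree of its nodes. The degree-based null model is an assumption made for illustration; the talk may work with a different family of null models. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of an undirected simple graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    within = defaultdict(int)   # e_c: edges with both ends in community c
    degree = defaultdict(int)   # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            within[cu] += 1

    # Observed within-community edge fraction minus its expectation
    # under the degree-preserving null model.
    return sum(within[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge.
    toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(toy_edges, labels))
```

Placing all nodes in a single community gives Q = 0, which is why positive values are read as more within-community edges than the null model expects.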

  • Bayesian approaches to causal inference: A lack-of-success story

    Date: 2015-03-13

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Despite almost universal acceptance across most fields of statistics, Bayesian inferential methods have yet to break through to widespread use in causal inference, even though Bayesian arguments were a core component of early developments in the field. Some quasi-Bayesian procedures have been proposed, but often these approaches rely on heuristic, sometimes flawed, arguments. In this talk I will discuss some formulations of classical causal inference problems from the perspective of standard Bayesian representations, and propose some inferential solutions. This is joint work with Olli Saarela, Dalla Lana School of Public Health, University of Toronto, Erica Moodie, Department of Epidemiology, Biostatistics and Occupational Health, McGill University, and Marina Klein, Division of Infectious Diseases, Faculty of Medicine, McGill University.

  • A novel statistical framework to characterize antigen-specific T-cell functional diversity in single-cell expression data

    Date: 2015-02-27

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    I will talk about COMPASS, a new Bayesian hierarchical framework for characterizing functional differences in antigen-specific T cells by leveraging high-throughput, single-cell flow cytometry data. In particular, I will illustrate, using a variety of data sets, how COMPASS can reveal subtle and complex changes in antigen-specific T-cell activation profiles that correlate with biological endpoints. When applied to data from the RV144 (“the Thai trial”) HIV clinical trial, COMPASS identified novel T-cell subsets that were inverse correlates of HIV infection risk. I also developed intuitive metrics for summarizing multivariate antigen-specific T-cell activation profiles for endpoint analysis. In addition, COMPASS identified correlates of latent infection in an immune study of tuberculosis among South African adolescents. COMPASS is available as an R package and is sufficiently general that it can be adapted to new high-throughput data types, such as mass cytometry (CyTOF) and single-cell gene expression, enabling interdisciplinary collaboration, which I will also highlight in my talk.

  • Comparison and assessment of particle diffusion models in biological fluids

    Date: 2015-02-20

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Rapidly progressing particle tracking techniques have revealed that foreign particles in biological fluids exhibit rich and at times unexpected behavior, with important consequences for disease diagnosis and drug delivery. Yet, there remains a frustrating lack of coherence in the description of these particles’ motion. This is largely due to a reliance on functional statistics (e.g., mean-squared displacement) to perform model selection and assess goodness-of-fit. However, not only are such functional characteristics typically estimated with substantial variability, but they may also fail to distinguish between a number of stochastic processes, each making fundamentally different predictions for relevant quantities of scientific interest. In this talk, I will describe a detailed Bayesian analysis of leading candidate models for subdiffusive particle trajectories in human pulmonary mucus. Efficient and scalable computational strategies will be proposed. Model selection will be achieved by way of intrinsic Bayes factors, which avoid both non-informative priors and “using the data twice”. Goodness-of-fit will be evaluated via second-order criteria along with exact model residuals. Our findings suggest that a simple model of fractional Brownian motion describes the data just as well as a first-principles physical model of visco-elastic subdiffusion.

  • Tuning parameters in high-dimensional statistics

    Date: 2015-02-13

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    High-dimensional statistics is the basis for analyzing large and complex data sets that are generated by cutting-edge technologies in genetics, neuroscience, astronomy, and many other fields. However, Lasso, Ridge Regression, Graphical Lasso, and other standard methods in high-dimensional statistics depend on tuning parameters that are difficult to calibrate in practice. In this talk, I present two novel approaches to overcome this difficulty. My first approach is based on a novel testing scheme that is inspired by Lepski’s idea for bandwidth selection in non-parametric statistics. This approach provides tuning parameter calibration for estimation and prediction with the Lasso and other standard methods and is to date the only way to ensure high performance, fast computations, and optimal finite sample guarantees. My second approach is based on the minimization of an objective function that avoids tuning parameters altogether. This approach provides accurate variable selection in regression settings and, additionally, opens up new possibilities for the estimation of gene regulation networks, microbial ecosystems, and many other network structures.

  • A fast unified algorithm for solving group Lasso penalized learning problems

    Date: 2015-02-05

    Time: 15:30-16:30

    Location: BURN 1B39

    Abstract:

    We consider a class of group-lasso learning problems where the objective function is the sum of an empirical loss and the group-lasso penalty. For a class of loss functions satisfying a quadratic majorization condition, we derive a unified algorithm called groupwise-majorization-descent (GMD) for efficiently computing the solution paths of the corresponding group-lasso penalized learning problem. GMD allows for general design matrices, without requiring the predictors to be group-wise orthonormal. As illustrative examples, we develop concrete algorithms for solving the group-lasso penalized least squares and several group-lasso penalized large margin classifiers. These group-lasso models have been implemented in the R package gglasso, publicly available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/gglasso. On simulated and real data, gglasso consistently outperforms the existing software for computing the group-lasso that implements either the classical groupwise descent algorithm or Nesterov’s method. An application in risk segmentation of insurance business is illustrated by analysis of an auto insurance claim dataset.
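
The building block shared by many group-lasso solvers is block soft-thresholding, which shrinks a whole group of coefficients toward zero and sets the group exactly to zero when its norm falls below a threshold. The sketch below shows that operator and one generic majorized update for a single group. It is an illustration under stated assumptions, not the gglasso implementation: `gamma` stands for an assumed upper bound on the curvature of the loss in that group, and all function names are hypothetical.

```python
import math


def group_soft_threshold(v, threshold):
    """Block soft-thresholding: scale v by max(0, 1 - threshold / ||v||_2)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= threshold:
        # The whole group is zeroed out together (also covers norm == 0).
        return [0.0 for _ in v]
    shrink = 1.0 - threshold / norm
    return [shrink * x for x in v]


def majorized_group_update(beta_k, grad_k, gamma, lam):
    """One generic majorization step for a single coefficient group.

    beta_k: current coefficients of the group.
    grad_k: gradient of the loss with respect to those coefficients.
    gamma:  assumed curvature bound for the group (must be positive).
    lam:    group-lasso penalty level for the group.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Gradient step on the quadratic majorizer, then group-wise shrinkage.
    z = [b - g / gamma for b, g in zip(beta_k, grad_k)]
    return group_soft_threshold(z, lam / gamma)


if __name__ == "__main__":
    print(group_soft_threshold([3.0, 4.0], 2.5))
    print(majorized_group_update([1.0, 0.0], [-2.0, -4.0], gamma=1.0, lam=2.5))
```

Cycling such an update over the groups until the coefficients stabilize gives a basic groupwise descent scheme; the choice of `gamma` is what lets this kind of update handle groups whose predictors are not orthonormal.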

  • Joint analysis of multiple multi-state processes via copulas

    Date: 2015-02-02

    Time: 15:30-16:30

    Location: BURN 1214

    Abstract:

    A copula-based model is described which enables joint analysis of multiple progressive multi-state processes. Unlike intensity-based or frailty-based approaches to joint modeling, the copula formulation proposed herein ensures that a wide range of marginal multi-state processes can be specified and the joint model will retain these marginal features. The copula formulation also facilitates a variety of approaches to estimation and inference including composite likelihood and two-stage estimation procedures. We consider processes with Markov margins in detail, which are often suitable when chronic diseases are progressive in nature. We give special attention to the setting in which individuals are examined intermittently and transition times are consequently interval-censored. Simulation studies give empirical insight into the different methods of analysis and an application involving progression in joint damage in psoriatic arthritis provides further illustration.

  • Distributed estimation and inference for sparse regression

    Date: 2015-01-30

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    We address two outstanding challenges in sparse regression: (i) computationally efficient estimation in distributed settings; (ii) valid inference for the selected coefficients. The main computational challenge in a distributed setting is harnessing the computational capabilities of all the machines while keeping communication costs low. We devise an approach that requires only a single round of communication among the machines. We show the approach recovers the convergence rate of the (centralized) lasso as long as each machine has access to an adequate number of samples. Turning to the second challenge, we devise an approach to post-selection inference by conditioning on the selected model. In a nutshell, our approach gives inferences with the same frequency interpretation as those given by data/sample splitting, but it is more broadly applicable and more powerful. The validity of our approach also does not depend on the correctness of the selected model, i.e., it gives valid inferences even when the selected model is incorrect.