/categories/mcgill-statistics-seminar/index.xml McGill Statistics Seminar - McGill Statistics Seminars
  • Estimating a variance-covariance surface for functional and longitudinal data

    Date: 2012-03-02

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    In functional data analysis, as in its multivariate counterpart, estimates of the bivariate covariance kernel σ(s,t ) and its inverse are useful for many things, and we need the inverse of a covariance matrix or kernel especially often. However, the dimensionality of functional observations often exceeds the sample size available to estimate σ(s,t, and then the analogue S of the multivariate sample estimate is singular and non-invertible. Even when this is not the case, the high dimensionality S often implies unacceptable sample variability and loss of degrees of freedom for model fitting. The common practice of employing low-dimensional principal component approximations to σ(s,t) to achieve invertibility also raises serious issues.

  • McGillivray: A penalized quasi-likelihood approach for estimating the number of states in a hidden Markov model | Best: Risk-set sampling and left truncation in survival analysis

    Date: 2012-02-17

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    McGillivray: In statistical applications of hidden Markov models (HMMs), one may have no knowledge of the number of hidden states (or order) of the model needed to be able to accurately represent the underlying process of the data. The problem of estimating the number of hidden states of the HMM is thus brought to the forefront. In this talk, we present a penalized quasi-likelihood approach for order estimation in HMMs which makes use of the fact that the marginal distribution of the observations from a HMM is a finite mixture model. The method starts with a HMM with a large number of states and obtains a model of lower order by clustering and combining similar states of the model through two penalty functions. We assess the performance of the new method via extensive simulation studies for Normal and Poisson HMMs.

  • Du: Simultaneous fixed and random effects selection in finite mixtures of linear mixed-effects models | Harel: Measuring fatigue in systemic sclerosis: a comparison of the SF-36 vitality subscale and FACIT fatigue scale using item response theory

    Date: 2012-02-03

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Du: Linear mixed-effects (LME) models are frequently used for modeling longitudinal data. One complicating factor in the analysis of such data is that samples are sometimes obtained from a population with significant underlying heterogeneity, which would be hard to capture by a single LME model. Such problems may be addressed by a finite mixture of linear mixed-effects (FMLME) models, which segments the population into subpopulations and models each subpopulation by a distinct LME model. Often in the initial stage of a study, a large number of predictors are introduced. However, their associations to the response variable vary from one component to another of the FMLME model. To enhance predictability and to obtain a parsimonious model, it is of great practical interest to identify the important effects, both fixed and random, in the model. Traditional variable selection techniques such as stepwise deletion and subset selection are computationally expensive as the number of covariates and components in the mixture model increases. In this talk, we introduce a penalized likelihood approach and propose a nested EM algorithm for efficient numerical computations. Our estimators are shown to possess desirable properties such as consistency, sparsity and asymptotic normality. We illustrate the performance of our method through simulations and a systemic sclerosis data example.

  • Applying Kalman filtering to problems in causal inference

    Date: 2012-01-27

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    A common problem in observational studies is estimating the causal effect of time-varying treatment in the presence of a time varying confounder. When random assignment of subjects to comparison groups is not possible, time-varying confounders can cause bias in estimating causal effects even after standard regression adjustment if past treatment history is a predictor of future confounders. To eliminate the bias of standard methods for estimating the causal effect of time varying treatment, Robins developed a number of innovative methods for discrete treatment levels, including G-computation, G-estimation, and marginal structural models (MSMs). However, there does not currently exist straight-forward applications of G-Estimation and MSMs for continuous treatment. In this talk, I will introduce an alternative approach to previous methods which utilize the Kalman filter. The key advantage to the Kalman filter approach is that the model easily accommodates continuous levels of treatment.

  • A concave regularization technique for sparse mixture models

    Date: 2012-01-20

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Latent variable mixture models are a powerful tool for exploring the structure in large datasets. A common challenge for interpreting such models is a desire to impose sparsity, the natural assumption that each data point only contains few latent features. Since mixture distributions are constrained in their L1 norm, typical sparsity techniques based on L1 regularization become toothless, and concave regularization becomes necessary. Unfortunately concave regularization typically results in EM algorithms that must perform problematic non-convex M-step optimization. In this work, we introduce a technique for circumventing this difficulty, using the so-called Mountain Pass Theorem to provide easily verifiable conditions under which the M-step is well-behaved despite the lacking convexity. We also develop a correspondence between logarithmic regularization and what we term the pseudo-Dirichlet distribution, a generalization of the ordinary Dirichlet distribution well-suited for inducing sparsity. We demonstrate our approach on a text corpus, inferring a sparse topic mixture model for 2,406 weblogs.

  • Path-dependent estimation of a distribution under generalized censoring

    Date: 2011-12-02

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    This talk focuses on the problem of the estimation of a distribution on an arbitrary complete separable metric space when the data points are subject to censoring by a general class of random sets. A path-dependent estimator for the distribution is proposed; among other properties, the estimator is sequential in the sense that it only uses data preceding any fixed point at which it is evaluated. If the censoring mechanism is totally ordered, the paths may be chosen in such a way that the estimate of the distribution defines a measure. In this case, we can prove a functional central limit theorem for the estimator when the underlying space is Euclidean. This is joint work with Gail Ivanoff (University of Ottawa)

  • Estimation of the risk of a collision when using a cell phone while driving

    Date: 2011-11-25

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    The use of cell phone while driving raises the question of whether it is associated with an increased collision risk and if so, what is its magnitude. For policy decision making, it is important to rely on an accurate estimate of the real crash risk of cell phone use while driving. Three important epidemiological studies were published on the subject, two using the case-crossover approach and one using a more conventional longitudinal cohort design. The methodology and results of these studies will be presented and discussed.

  • Construction of bivariate distributions via principal components

    Date: 2011-11-18

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    The diagonal expansion of a bivariate distribution (Lancaster, 1958) has been used as a tool to construct bivariate distributions; this method has been generalized using principal dimensions of random variables (Cuadras 2002). Sufficient and necessary conditions are given for uniform, exponential, logistic and Pareto marginals in the one and two-dimensional case. The corresponding copulas are obtained.

    Speaker

    Amparo Casanova is an Assistant Professor at the Dalla Lana School of Public Health, Division of Biostatistics, University of Toronto.

  • A Bayesian method of parametric inference for diffusion processes

    Date: 2011-11-04

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Diffusion processes have been used to model a multitude of continuous-time phenomena in Engineering and the Natural Sciences, and as in this case, the volatility of financial assets. However, parametric inference has long been complicated by an intractable likelihood function. For many models the most effective solution involves a large amount of missing data for which the typical Gibbs sampler can be arbitrarily slow. On the other hand, joint parameter and missing data proposals can lead to a radical improvement, but their acceptance rate tends to scale exponentially with the number of observations.

  • Maximum likelihood estimation in network models

    Date: 2011-11-03

    Time: 16:00-17:00

    Location: BURN 1205

    Abstract:

    This talk is concerned with maximum likelihood estimation (MLE) in exponential statistical models for networks (random graphs) and, in particular, with the beta model, a simple model for undirected graphs in which the degree sequence is the minimal sufficient statistic. The speaker will present necessary and sufficient conditions for the existence of the MLE of the beta model parameters that are based on a geometric object known as the polytope of degree sequences. Using this result, it is possible to characterize in a combinatorial fashion sample points leading to a non-existent MLE and non-estimability of the probability parameters under a non-existent MLE. The speaker will further indicate some conditions guaranteeing that the MLE exists with probability tending to 1 as the number of nodes increases. Much of this analysis applies also to other well-known models for networks, such as the Rasch model, the Bradley-Terry model and the more general p1 model of Holland and Leinhardt. These results are in fact instantiations of rather general geometric properties of exponential families with polyhedral support that will be illustrated with a simple exponential random graph model.