/categories/mcgill-statistics-seminar/index.xml McGill Statistics Seminar - McGill Statistics Seminars
  • Markov switching regular vine copulas

    Date: 2012-10-05

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Using only bivariate copulas as building blocks, regular vines(R-vines) constitute a flexible class of high-dimensional dependence models. In this talk we introduce a Markov switching R-vine copula model, combining the flexibility of general R-vine copulas with the possibility for dependence structures to change over time. Frequentist as well as Bayesian parameter estimation is discussed. Further, we apply the newly proposed model to examine the dependence of exchange rates as well as stock and stock index returns. We show that changes in dependence are usually closely interrelated with periods of market stress. In such times the Value at Risk of an asset portfolio is significantly underestimated when changes in the dependence structure are ignored.

  • The current state of Q-learning for personalized medicine

    Date: 2012-09-28

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    In this talk, I will provide an introduction to DTRs and an overview the state of the art (and science) of Q-learning, a popular tool in reinforcement learning. The use of Q-learning and its variance in randomized and non-randomized studies will be discussed, as well as issues concerning inference as the resulting estimators are not always regular. Current and future directions of interest will also be considered.

  • Hypothesis testing in finite mixture models: from the likelihood ratio test to EM-test

    Date: 2012-04-05

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    In the presence of heterogeneity, a mixture model is most natural to characterize the random behavior of the samples taken from such populations. Such strategy has been widely employed in applications ranging from genetics, information technology, marketing, to finance. Studying the mixing structure behind a random sample from the population allows us to infer the degree of heterogeneity with important implications in applications such as the presence of disease subgroups in genetics. The statistical problem is to test the hypotheses on the order of the finite mixture models. There has been continued interest in the limiting behavior of the likelihood ratio tests. The non-regularity of the finite mixture models has provided statisticians ample examples of unusual limiting distributions. Yet many of such results are not convenient for conducting hypothesis tests. Motivated at overcoming such difficulties, we have developed a number of strategies to obtain tests with high efficiency yet easy to use limiting distributions. The latest development is a class of EM-tests which are advantageous in many respects. Their limiting distributions are easier to derive mathematically, simple for implementation in data analysis and valid for more general class of mixture models without restrictions on the space of the mixing distribution. The simulation indicates the limiting distributions have good precision at approximating the finite sample distributions in the examples investigated.

  • A matching-based approach to assessing the surrogate value of a biomarker

    Date: 2012-03-30

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Statisticians have developed a number of frameworks which can be used to assess the surrogate value of a biomarker, i.e. establish whether treatment effects on a biological quantity measured shortly after administration of treatment predict treatment effects on the clinical endpoint of interest. The most commonly applied of these frameworks is due to Prentice (1989), who proposed a set of criteria which a surrogate marker should satisfy. However, verifying these criteria using observed data can be challenging due to the presence of unmeasured simultaneous predictors (i.e. confounders) which influence both the potential surrogate and the outcome. In this work, we adapt a technique proposed by Rosenbaum (2002) for observational studies, in which observations are matched and the odds of treatment within each matched pair is bounded. This yields a straightforward and interpretable sensitivity analysis which can be performed particularly efficiently for certain types of test statistics. In this talk, I will introduce the surrogate endpoint problem, discuss the details of my proposed technique for assessing surrogate value, and illustrate with some simulated examples inspired by the problem of identifying immune surrogates in HIV vaccine trials.

  • Model selection principles in misspecified models

    Date: 2012-03-23

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Model selection is of fundamental importance to high-dimensional modeling featured in many contemporary applications. Classical principles of model selection include the Bayesian principle and the Kullback-Leibler divergence principle, which lead to the Bayesian information criterion and Akaike information criterion, respectively, when models are correctly specified. Yet model misspecification is unavoidable in practice. We derive novel asymptotic expansions of the two well-known principles in misspecified generalized linear models, which give the generalized BIC (GBIC) and generalized AIC. A specific form of prior probabilities motivated by the Kullback-Leibler divergence principle leads to the generalized BIC with prior probability ($\mbox{GBIC}_p$), which can be naturally decomposed as the sum of the negative maximum quasi-log-likelihood, and a penalty on model dimensionality, and a penalty on model misspecification directly. Numerical studies demonstrate the advantage of the new methods for model selection in both correctly specified and misspecified models.

  • Variable selection in longitudinal data with a change-point

    Date: 2012-03-16

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Follow-up studies are frequently carried out to investigate the evolution of measurements through time, taken on a set of subjects. These measurements (responses) are bound to be influenced by subject specific covariates and if a regression model is used the data analyst is faced with the problem of selecting those covariates that “best explain” the data. For example, in a clinical trial, subjects may be monitored for a response following the administration of a treatment with a view of selecting the covariates that are best predictive of a treatment response. This variable selection setting is standard. However, more realistically, there will often be an unknown delay from the administration of a treatment before it has a measurable effect. This delay will not be directly observable since it is a property of the distribution of responses rather than of any particular trajectory of responses. Briefly, each subject will have an unobservable change-point. With a change-point component added, the variable selection problem necessitates the use of penalized likelihood methods. This is because the number of putative covariates for the responses, as well as the change-point distribution, could be large relative to the follow-up time and/or the number of subjects; variable selection in a change-point setting does not appear to have been studied in the literature. In this talk I will briefly introduce the multi-path change-point problem. I will show how variable selection for the covariates before the change, after the change, as well as for the change-point distribution, reduces to variable selection for a finite mixture of multivariate distributions. I will discuss the performance of my model selection methods using an example on cognitive decline in subjects with Alzheimer’s disease and through simulations.

  • Estimating a variance-covariance surface for functional and longitudinal data

    Date: 2012-03-02

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    In functional data analysis, as in its multivariate counterpart, estimates of the bivariate covariance kernel σ(s,t ) and its inverse are useful for many things, and we need the inverse of a covariance matrix or kernel especially often. However, the dimensionality of functional observations often exceeds the sample size available to estimate σ(s,t, and then the analogue S of the multivariate sample estimate is singular and non-invertible. Even when this is not the case, the high dimensionality S often implies unacceptable sample variability and loss of degrees of freedom for model fitting. The common practice of employing low-dimensional principal component approximations to σ(s,t) to achieve invertibility also raises serious issues.

  • McGillivray: A penalized quasi-likelihood approach for estimating the number of states in a hidden Markov model | Best: Risk-set sampling and left truncation in survival analysis

    Date: 2012-02-17

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    McGillivray: In statistical applications of hidden Markov models (HMMs), one may have no knowledge of the number of hidden states (or order) of the model needed to be able to accurately represent the underlying process of the data. The problem of estimating the number of hidden states of the HMM is thus brought to the forefront. In this talk, we present a penalized quasi-likelihood approach for order estimation in HMMs which makes use of the fact that the marginal distribution of the observations from a HMM is a finite mixture model. The method starts with a HMM with a large number of states and obtains a model of lower order by clustering and combining similar states of the model through two penalty functions. We assess the performance of the new method via extensive simulation studies for Normal and Poisson HMMs.

  • Du: Simultaneous fixed and random effects selection in finite mixtures of linear mixed-effects models | Harel: Measuring fatigue in systemic sclerosis: a comparison of the SF-36 vitality subscale and FACIT fatigue scale using item response theory

    Date: 2012-02-03

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Du: Linear mixed-effects (LME) models are frequently used for modeling longitudinal data. One complicating factor in the analysis of such data is that samples are sometimes obtained from a population with significant underlying heterogeneity, which would be hard to capture by a single LME model. Such problems may be addressed by a finite mixture of linear mixed-effects (FMLME) models, which segments the population into subpopulations and models each subpopulation by a distinct LME model. Often in the initial stage of a study, a large number of predictors are introduced. However, their associations to the response variable vary from one component to another of the FMLME model. To enhance predictability and to obtain a parsimonious model, it is of great practical interest to identify the important effects, both fixed and random, in the model. Traditional variable selection techniques such as stepwise deletion and subset selection are computationally expensive as the number of covariates and components in the mixture model increases. In this talk, we introduce a penalized likelihood approach and propose a nested EM algorithm for efficient numerical computations. Our estimators are shown to possess desirable properties such as consistency, sparsity and asymptotic normality. We illustrate the performance of our method through simulations and a systemic sclerosis data example.

  • Applying Kalman filtering to problems in causal inference

    Date: 2012-01-27

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    A common problem in observational studies is estimating the causal effect of time-varying treatment in the presence of a time varying confounder. When random assignment of subjects to comparison groups is not possible, time-varying confounders can cause bias in estimating causal effects even after standard regression adjustment if past treatment history is a predictor of future confounders. To eliminate the bias of standard methods for estimating the causal effect of time varying treatment, Robins developed a number of innovative methods for discrete treatment levels, including G-computation, G-estimation, and marginal structural models (MSMs). However, there does not currently exist straight-forward applications of G-Estimation and MSMs for continuous treatment. In this talk, I will introduce an alternative approach to previous methods which utilize the Kalman filter. The key advantage to the Kalman filter approach is that the model easily accommodates continuous levels of treatment.