McGill Statistics Seminar - McGill Statistics Seminars
  • Selective inference for dynamic treatment regimes via the LASSO

    Date: 2018-09-28

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Constructing an optimal dynamic treatment regime becomes complex when there is a large number of prognostic factors, such as a patient’s genetic information, demographic characteristics, and medical history over time. Existing methods focus only on selecting the important variables for the decision-making process and fall short in providing inference for the selected model. We fill this gap by leveraging the conditional selective inference methodology. We show that the proposed method is asymptotically valid given certain rate assumptions in semiparametric regression.

  • Possession Sketches: Mapping NBA Strategies

    Date: 2018-09-21

    Time: 09:30-10:15

    Location: Bronfman Building 001

    Abstract:

    We present Possession Sketches, a new machine learning method for organizing and exploring a database of basketball player-tracks. Our method organizes basketball possessions by offensive structure. We first develop a model for populating a dictionary of short, repeated, and spatially registered actions. Each action corresponds to an interpretable type of player movement. We examine statistical patterns in these actions, and show how they can be used to describe individual player behavior. Leveraging this vocabulary of actions, we develop a hierarchical model that describes interactions between players. Our approach draws on the topic-modeling literature, extending Latent Dirichlet Allocation (LDA) through a novel representation of player movement data which uses techniques common in animation and video game design. We show that our model is able to group together possessions with similar offensive structure, allowing for efficient search and exploration of the entire database of player-tracking data. We show that our model finds repeated offensive structure in teams (e.g. strategy), providing a much more sophisticated, yet interpretable lens into basketball player-tracking data. This is joint work with Andrew Miller.

  • Quantile LASSO in Nonparametric Models with Changepoints Under Optional Shape Constraints

    Date: 2018-09-14

    Time: 15:30-16:30

    Location: BURN 1104

    Abstract:

    Nonparametric models are popular modeling tools because of their natural overall flexibility. In our approach, we apply nonparametric techniques to panel data structures with changepoints and optional shape constraints, and the estimation is performed in a fully data-driven manner by utilizing atomic pursuit methods, in particular LASSO regularization techniques. However, in order to obtain robust estimates and to gain a more complex insight into the underlying data structure, we target conditional quantiles rather than the conditional mean only. The whole estimation process and the subsequent inference both become more challenging, but the results are more useful in practical applications. The underlying model is first introduced and some theoretical results are presented. The proposed methodology is applied to a real data scenario and some finite sample properties are investigated via an extensive simulation study. This is joint work with Ivan Mizera, University of Alberta, and Gabriela Ciuperca, University of Lyon.

  • Association Measures for Clustered Competing Risks Data

    Date: 2018-09-07

    Time: 15:30-16:30

    Location: BURN 1104

    Abstract:

    In this work, we propose a semiparametric model for multivariate clustered competing risks data when the cause-specific failure times and the occurrence of competing risk events among subjects within the same cluster are of interest. The cause-specific hazard functions are assumed to follow Cox proportional hazards models. The associations between failure times given the same or different cause events, and the associations between occurrences of competing risk events within the same cluster, are investigated through copula models. A cross-odds ratio measure is explored under our proposed models. A two-stage estimation procedure is proposed in which the marginal models are estimated in the first stage, and the dependence parameters are estimated via an Expectation-Maximization algorithm in the second stage. The proposed estimators are shown to be consistent and asymptotically normal under mild regularity conditions. Simulation studies are conducted to assess the finite sample performance of the proposed method. The proposed technique is demonstrated through an application to a multicenter bone marrow transplantation dataset.

  • Methodological challenges in using point-prevalence versus cohort data in risk factor analyses of hospital-acquired infections

    Date: 2018-04-27

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    We explore the impact of length-biased sampling on the evaluation of risk factors of nosocomial infections in point-prevalence studies. We used cohort data with full information, including the exact date of the nosocomial infection, and mimicked an artificial one-day prevalence study by picking a sample from this cohort study. Based on the cohort data, we studied the underlying multi-state model, which accounts for nosocomial infection as an intermediate event and discharge/death as competing events. Simple formulas are derived to display relationships between risk odds ratios, hazard ratios, and prevalence odds ratios. Due to length-biased sampling, long-stay and thus sicker patients are more likely to be sampled. In addition, patients with nosocomial infections usually stay longer in hospital. We explored mechanisms which are, due to the design, hidden in prevalence data. In our example, we showed that prevalence odds ratios were usually less pronounced than risk odds ratios but more pronounced than hazard ratios. Thus, to avoid misinterpretation, knowledge of the mechanisms from the underlying multi-state model is essential for the interpretation of risk factors derived from point-prevalence data.
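The length-bias mechanism described in this abstract can be illustrated with a small simulation. This is only a sketch under assumptions that are not from the talk: admissions are uniform over a study window, lengths of stay are exponential, and the function name `simulate_point_prevalence` and all parameter values are hypothetical. It compares the mean length of stay in the full cohort with the mean among patients who happen to be in hospital on a single survey day.

```python
import random


def simulate_point_prevalence(n_patients=200_000, horizon=1000.0,
                              mean_stay=5.0, survey_day=500.0, seed=0):
    """Return (cohort mean stay, mean stay among patients present on survey_day).

    Assumptions (illustrative only): uniform admission times on [0, horizon]
    and exponentially distributed lengths of stay with mean `mean_stay`.
    """
    rng = random.Random(seed)
    cohort_stays = []
    prevalent_stays = []
    for _ in range(n_patients):
        admission = rng.uniform(0.0, horizon)
        stay = rng.expovariate(1.0 / mean_stay)
        cohort_stays.append(stay)
        # A one-day prevalence survey only sees patients in hospital that day.
        if admission <= survey_day < admission + stay:
            prevalent_stays.append(stay)
    cohort_mean = sum(cohort_stays) / len(cohort_stays)
    prevalent_mean = sum(prevalent_stays) / len(prevalent_stays)
    return cohort_mean, prevalent_mean


cohort_mean, prevalent_mean = simulate_point_prevalence()
print(f"cohort mean stay:    {cohort_mean:.2f} days")
print(f"prevalent mean stay: {prevalent_mean:.2f} days")
```

Under the exponential assumption, the length-biased mean is in theory twice the cohort mean, so the surveyed patients should show markedly longer stays even though no risk factor is involved.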

  • Kernel Nonparametric Overlap-based Syncytial Clustering

    Date: 2018-04-20

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Standard clustering algorithms can find regular-structured clusters such as ellipsoidally- or spherically-dispersed groups, but are more challenged with groups lacking formal structure or definition. Syncytial clustering is the name that we introduce for methods that merge groups obtained from standard clustering algorithms in order to reveal complex group structure in the data. Here, we develop a distribution-free fully-automated syncytial algorithm that can be used with the computationally efficient k-means or other algorithms. Our approach computes the cumulative distribution function of the normed residuals from an appropriately fit k-groups model and calculates the nonparametric overlap between all pairs of groups. Groups with high pairwise overlap are merged as long as the generalized overlap decreases. Our methodology is always a top performer in identifying groups with regular and irregular structures in many datasets. We use our method to identify the distinct kinds of activation in a functional Magnetic Resonance Imaging study.

  • Empirical likelihood and robust regression in diffusion tensor imaging data analysis

    Date: 2018-04-06

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    With modern technology development, functional responses are observed frequently in various scientific fields, including neuroimaging data analysis. Empirical likelihood, as a nonparametric data-driven technique, has become an important statistical inference methodology. In this paper, motivated by diffusion tensor imaging (DTI) data, we propose three generalized empirical likelihood-based methods that accommodate within-curve dependence in the varying coefficient model with functional responses and embed a robust regression idea. To avoid the loss of efficiency in statistical inference, we take the within-curve variance-covariance matrix into consideration in the subjectwise and elementwise empirical likelihood methods. We develop several statistical inference procedures for maximum empirical likelihood estimators (MELEs) and empirical log likelihood (ELL) ratio functions, and systematically study their asymptotic properties. We first establish the weak convergence of the MELEs and the ELL ratio processes, and derive a nonparametric version of the Wilks theorem for the limiting distributions of the ELLs at any designed point. We propose a global test for linear hypotheses of varying coefficient functions, construct simultaneous confidence bands for each individual effect curve based on MELEs, and construct simultaneous confidence regions for varying coefficient functions based on ELL ratios. A Monte Carlo simulation is conducted to examine the finite-sample performance of the proposed procedures. Finally, we illustrate the MELE-based estimation and inference procedures for the varying coefficient model on diffusion tensor imaging data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. Joint work with Xingcai Zhou (Nanjing Audit University), Rohana Karunamuni and Adam Kashlak (University of Alberta).

  • Some development on dynamic computer experiments

    Date: 2018-03-23

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Computer experiments refer to the study of real systems using complex simulation models. They have been widely used as efficient, economical alternatives to physical experiments. Computer experiments with time series outputs are called dynamic computer experiments. In this talk, we consider two problems for such experiments: emulation of large-scale dynamic computer experiments and the inverse problem. For the first problem, we propose a computationally efficient modelling approach which sequentially finds a set of local design points based on a new criterion specifically designed for emulating dynamic computer simulators. Singular value decomposition based Gaussian process models are built with the sequentially chosen local data. To update the models efficiently, an empirical Bayesian approach is introduced. The second problem aims to extract an optimal input of a dynamic computer simulator whose response matches a field observation as closely as possible. A sequential design approach is employed and a novel expected improvement criterion is proposed. A real application is discussed to support the efficiency of the proposed approaches.
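For readers unfamiliar with expected improvement, the sketch below shows the classical criterion for minimization under a Gaussian predictive distribution. It is not the novel criterion proposed in the talk, which the abstract does not specify. Here `mu` and `sigma` are assumed to be an emulator's predictive mean and standard deviation of the discrepancy between simulator output and field observation at a candidate input, and `best` is the smallest discrepancy observed so far; all three names are hypothetical.

```python
import math


def expected_improvement(mu, sigma, best):
    """Classical expected improvement for minimization.

    EI = (best - mu) * Phi(z) + sigma * phi(z), with z = (best - mu) / sigma,
    where Phi and phi are the standard normal CDF and PDF.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


# Hypothetical candidates as (predictive mean, predictive sd) pairs.
candidates = [(1.2, 0.1), (1.0, 0.5), (0.9, 0.05)]
best_so_far = 1.0
scores = [expected_improvement(m, s, best_so_far) for m, s in candidates]
next_index = max(range(len(candidates)), key=lambda i: scores[i])
print(scores, next_index)
```

In a sequential design, the candidate with the largest score would be run on the simulator next, which trades off a low predicted discrepancy against high predictive uncertainty.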

  • Statistical Genomics for Understanding Complex Traits

    Date: 2018-03-16

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Over the last decade, advances in measurement technologies have enabled researchers to generate multiple types of high-dimensional “omics” datasets for large cohorts. These data provide an opportunity to derive a mechanistic understanding of human complex traits. However, inferring meaningful biological relationships from these data is challenging due to high dimensionality, noise, and an abundance of confounding factors. In this talk, I’ll describe statistical approaches for robust analysis of genomic data from large population studies, with a focus on 1) understanding the nature of confounding and approaches for addressing it and 2) understanding the genomic correlates of aging and dementia.

  • Sparse Penalized Quantile Regression: Method, Theory, and Algorithm

    Date: 2018-02-23

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Sparse penalized quantile regression is a useful tool for variable selection, robust estimation, and heteroscedasticity detection in high-dimensional data analysis. We discuss the variable selection and estimation properties of the lasso and folded concave penalized quantile regression via non-asymptotic arguments. We also consider consistent parameter tuning therein. The computation of sparse penalized quantile regression has not yet been fully resolved in the literature, due to the non-smoothness of the quantile regression loss function. We introduce fast alternating direction method of multipliers (ADMM) algorithms for computing sparse penalized quantile regression. Numerical examples demonstrate the competitive performance of our algorithm: it significantly outperforms several other fast solvers for high-dimensional penalized quantile regression.
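The non-smooth loss mentioned in this abstract is the quantile (check) loss, which has a kink at zero. The sketch below writes out that loss and a lasso-penalized objective in plain Python. It only evaluates the objective and is not the ADMM algorithm from the talk; the function names, the scaling of the penalty, and the toy data are assumptions made for illustration.

```python
def check_loss(residuals, tau):
    """Sum of rho_tau(r) = r * (tau - 1{r < 0}) over the residuals."""
    return sum(r * (tau - (1.0 if r < 0 else 0.0)) for r in residuals)


def lasso_quantile_objective(beta, X, y, tau, lam):
    """Average check loss plus an L1 penalty lam * ||beta||_1.

    X is a list of rows, y a list of responses, beta a list of coefficients.
    """
    residuals = [
        yi - sum(b * x for b, x in zip(beta, row))
        for row, yi in zip(X, y)
    ]
    penalty = lam * sum(abs(b) for b in beta)
    return check_loss(residuals, tau) / len(y) + penalty


# Minimizing the check loss over a constant recovers a sample quantile:
# with tau = 0.5 the minimizer is the median, which ignores the outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
losses = [check_loss([yi - c for yi in y], 0.5) for c in y]
best_constant = y[losses.index(min(losses))]
print(best_constant)
```

Because both the check loss and the L1 penalty are piecewise linear, gradient-based solvers do not apply directly, which is the setting where splitting methods such as ADMM are commonly used.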