McGill Statistics Seminar - McGill Statistics Seminars
  • Simultaneous white noise models and shrinkage recovery of functional data

    Date: 2015-01-16

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    We consider the white noise representation of functional data taken as i.i.d. realizations of a Gaussian process. The main idea is to establish an asymptotic equivalence in Le Cam’s sense between an experiment which simultaneously describes these realizations and a collection of white noise models. In this context, we project onto an arbitrary basis and apply a novel variant of Stein-type estimation for optimal recovery of the realized trajectories. A key inequality is derived showing that the corresponding risks, conditioned on the underlying curves, are minimax optimal and can be made arbitrarily close to those that an oracle with knowledge of the process would attain. Empirical performance is illustrated through simulated and real data examples.
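As a rough illustration of Stein-type shrinkage applied to projected basis coefficients, the sketch below uses the classical positive-part James-Stein estimator. It is not the novel variant described in the abstract; the function name, the assumption of a known common noise variance `sigma2`, and shrinkage toward zero are all illustrative choices.

```python
def james_stein_positive_part(y, sigma2):
    """Shrink noisy basis coefficients toward zero (positive-part James-Stein).

    y      : observed coefficients, assumed y_k = theta_k + noise
    sigma2 : noise variance of each coefficient (assumed known and common)
    """
    d = len(y)
    if d < 3:
        # The James-Stein estimator only improves on the raw estimate for d >= 3.
        return list(y)
    norm2 = sum(v * v for v in y)
    if norm2 == 0.0:
        return [0.0] * d
    # The shrinkage factor is truncated at zero so the sign is never flipped.
    shrink = max(0.0, 1.0 - (d - 2) * sigma2 / norm2)
    return [shrink * v for v in y]
```

When the observed coefficients are large relative to the noise, the factor is close to 1 and the estimate is nearly unchanged; when they are small, the estimate collapses to zero.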

  • Mixtures of coalesced generalized hyperbolic distributions

    Date: 2015-01-13

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    A mixture of coalesced generalized hyperbolic distributions is developed by joining a finite mixture of generalized hyperbolic distributions with a mixture of multiple scaled generalized hyperbolic distributions. The result is a mixture of mixtures with shared model parameters and a common mode. We begin by discussing the generalized hyperbolic distribution, which has the t, Gaussian and others as special cases. The generalized hyperbolic distribution can be represented as a normal-variance mixture using a generalized inverse Gaussian distribution. This representation makes it a suitable candidate for the expectation-maximization algorithm. Secondly, we discuss the multiple scaled generalized hyperbolic distribution, which arises via implementation of a multi-dimensional weight function. A parameter estimation scheme is developed using the ever-expanding class of MM algorithms, and the Bayesian information criterion is used for model selection. Special consideration is given to the contour shape. We use the coalesced distribution for clustering and compare it to finite mixtures of skew-t distributions using simulated and real data sets. Finally, the role of generalized hyperbolic mixtures within the wider model-based clustering, classification, and density estimation literature is discussed.
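The normal-variance mixture idea can be sketched with the simplest special case named in the abstract, the t distribution. Here the latent weight W is inverse-gamma, a limiting case of the generalized inverse Gaussian, and there is no skewness term. This is a univariate sketch under those assumptions rather than the full generalized hyperbolic construction, and the function name is hypothetical.

```python
import math
import random


def sample_t_via_variance_mixture(nu, mu=0.0, sigma=1.0, rng=random):
    """Draw X = mu + sqrt(W) * sigma * Z, with W ~ inverse-gamma(nu/2, nu/2).

    Marginally, X follows a (location-scale) t distribution with nu degrees
    of freedom; conditionally on W it is Gaussian, which is what makes the
    representation convenient for EM-type algorithms.
    """
    # A chi-square variable with nu degrees of freedom is Gamma(shape=nu/2, scale=2).
    v = rng.gammavariate(nu / 2.0, 2.0)
    w = nu / v                 # latent mixing weight W
    z = rng.gauss(0.0, 1.0)    # standard normal draw
    return mu + math.sqrt(w) * sigma * z
```

In an EM algorithm the weights W would be treated as missing data; the sketch only shows the generative direction.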

  • Space-time data analysis: Out of the Hilbert box

    Date: 2015-01-09

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Given the discouraging state of current efforts to curb global warming, we can imagine that we will soon turn our attention to mitigation. On a global scale, distressed populations will turn to national and international organizations for solutions to dramatic problems caused by climate change. These institutions in turn will mandate the collection of data on a scale and resolution that will present extraordinary statistical and computational challenges to those of us viewed as having the appropriate expertise. A review of the current state of our space-time data analysis machinery suggests that we have much to do. Most current spatial modelling methodology is based on concepts translated from time series analysis, is heavily dependent on various kinds of stationarity assumptions, uses the Gaussian distribution to model data, and depends on a priori coordinate systems that do not exist in nature. A way forward from this restrictive framework is proposed by modelling data over textured domains using layered coordinate systems.

  • Testing for structured Normal means

    Date: 2014-12-12

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    We will discuss the detection of patterns in images and graphs from a high-dimensional Gaussian measurement. This problem is relevant to many applications including detecting anomalies in sensor and computer networks, large-scale surveillance, co-expressions in gene networks, disease outbreaks, etc. Beyond its wide applicability, structured Normal means detection serves as a case study in the difficulty of balancing computational complexity with statistical power. We will begin by discussing the detection of active rectangles in images and sensor grids. We will develop an adaptive scan test and determine its asymptotic distribution. We propose an approximate algorithm that runs in nearly linear time but achieves the same asymptotic distribution as the naive, quadratic run-time algorithm. We will move on to the more general problem of detecting a well-connected active subgraph within a graph in the Normal means context. Because the generalized likelihood ratio test is computationally infeasible, we propose approximate algorithms and study their statistical efficiency. One such algorithm that we develop is the graph Fourier scan statistic, whose statistical performance is characterized by the spectrum of the graph Laplacian. Another relaxation that we have developed is the Lovász extended scan statistic (LESS), which is based on submodular optimization and whose performance is described using electrical network theory. We also introduce the spanning tree wavelet basis over graphs, a localized basis that reflects the topology of the graph. For each of these tests we compare its statistical guarantees to an information theoretic lower bound.
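To make the "naive, quadratic run-time" scan concrete, here is a minimal one-dimensional analogue: it scans every interval of a sequence and standardizes each interval sum by the square root of its length. It assumes unit-variance Gaussian noise and is neither the adaptive scan test nor the nearly linear-time approximation from the talk; the function name is hypothetical.

```python
import math


def interval_scan_statistic(y):
    """Return (max statistic, (start, end)) over all intervals y[start:end].

    The statistic for an interval is sum(y[start:end]) / sqrt(end - start),
    which is standard normal under the null when the noise has unit variance.
    """
    n = len(y)
    # Prefix sums let each interval sum be computed in constant time.
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
    best, best_interval = -math.inf, None
    for i in range(n):                    # O(n^2) intervals in total
        for j in range(i + 1, n + 1):
            stat = (prefix[j] - prefix[i]) / math.sqrt(j - i)
            if stat > best:
                best, best_interval = stat, (i, j)
    return best, best_interval
```

For rectangles in an image the same idea applies with two-dimensional prefix sums, at a correspondingly higher cost, which is what motivates faster approximate scans.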

  • Copula model selection: A statistical approach

    Date: 2014-12-05

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Copula model selection is an important problem because similar but differing copula models can offer different conclusions surrounding the dependence structure of random variables. Chen & Fan (2005) proposed a model selection method involving a statistical hypothesis test. The hypothesis test attempts to take into account the randomness of the AIC and other likelihood-based model selection methods for finite samples. Performance of the test compared to the more common approach of AIC is illustrated in a series of simulations.

  • Model-based methods of classification with applications

    Date: 2014-11-28

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Model-based clustering via finite mixture models is a popular clustering method for finding hidden structures in data. The model is often assumed to be a finite mixture of multivariate normal distributions; however, flexible extensions have been developed over recent years. This talk demonstrates some methods employed in unsupervised, semi-supervised, and supervised classification that include skew-normal and skew-t mixture models. Both real and simulated data sets are used to demonstrate the efficacy of these techniques.

  • Estimating by solving nonconvex programs: Statistical and computational guarantees

    Date: 2014-11-21

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Many statistical estimators are based on solving nonconvex programs. Although the practical performance of such methods is often excellent, the associated theory is frequently incomplete, due to the potential gaps between global and local optima. In this talk, we present theoretical results that apply to all local optima of various regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models; regression in generalized linear models using nonconvex regularizers such as SCAD and MCP; and graph and inverse covariance matrix estimation. Under suitable regularity conditions, our theory guarantees that any local optimum of the composite objective function lies within statistical precision of the true parameter vector. This result closes the gap between theory and practice for these methods.
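For readers unfamiliar with the two nonconvex regularizers named above, the sketch below writes out the usual textbook forms of the SCAD and MCP penalties for a single coefficient. The default values `a=3.7` and `gamma=3.0` are conventional illustrative choices, not values taken from the talk.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty for one coefficient t (requires lam >= 0 and a > 2)."""
    t = abs(t)
    if t <= lam:
        return lam * t                      # behaves like the Lasso near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2          # constant: no extra shrinkage


def mcp_penalty(t, lam, gamma=3.0):
    """MCP penalty for one coefficient t (requires lam >= 0 and gamma > 1)."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2            # constant beyond gamma * lam
```

Both penalties flatten out for large coefficients, which reduces the bias of the Lasso but makes the objective nonconvex, hence the interest in guarantees for local optima.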

  • Bridging the gap: A likelihood function approach for the analysis of ranking data

    Date: 2014-11-14

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    In the parametric setting, the notion of a likelihood function forms the basis for the development of tests of hypotheses and estimation of parameters. Tests in connection with the analysis of variance stem entirely from considerations of the likelihood function. On the other hand, nonparametric procedures have generally been derived without any formal mechanism and are often the result of clever intuition. In this talk, we propose a more formal approach for deriving tests involving the use of ranks. Specifically, we define a likelihood function motivated by characteristics of the ranks of the data and demonstrate that this leads to well-known tests of hypotheses. We also point to various areas of further exploration.

  • Bayesian regression with B-splines under combinations of shape constraints and smoothness properties

    Date: 2014-11-07

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    We approach the problem of shape constrained regression from a Bayesian perspective. A B-spline basis is used to model the regression function. The smoothness of the regression function is controlled by the order of the B-splines, and the shape is controlled by the shape of an associated control polygon. Controlling the shape of the control polygon reduces to some inequality constraints on the spline coefficients. Our approach enables us to take into account combinations of shape constraints and to localize each shape constraint on a given interval. The performance of our method is investigated through a simulation study. Applications to real data sets from the food industry and global warming are provided.
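The link between control polygon shape and inequality constraints on the coefficients can be sketched as follows. The control polygon is taken to join the points (Greville abscissa, coefficient); a nondecreasing or convex polygon is a standard sufficient condition for the spline itself to be nondecreasing or convex. The helper names and the specific checks are illustrative assumptions, not the constraint set used in the talk.

```python
def greville_abscissae(knots, degree):
    """Abscissae of the control polygon vertices for a B-spline basis."""
    n = len(knots) - degree - 1   # number of basis functions / coefficients
    return [sum(knots[j + 1 : j + degree + 1]) / degree for j in range(n)]


def is_nondecreasing(coefs, tol=0.0):
    """Monotonicity constraint: consecutive coefficient differences >= 0."""
    return all(b - a >= -tol for a, b in zip(coefs, coefs[1:]))


def is_convex_polygon(xs, coefs, tol=0.0):
    """Convexity constraint: slopes of the polygon segments are nondecreasing.

    Assumes the abscissae xs are strictly increasing.
    """
    slopes = [
        (c1 - c0) / (x1 - x0)
        for x0, x1, c0, c1 in zip(xs, xs[1:], coefs, coefs[1:])
    ]
    return all(s1 - s0 >= -tol for s0, s1 in zip(slopes, slopes[1:]))
```

In a Bayesian fit, such inequalities would typically restrict the support of the prior on the coefficients; restricting the checks to a subset of coefficients gives a constraint localized to an interval.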

  • A copula-based model for risk aggregation

    Date: 2014-10-31

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    A flexible approach is proposed for risk aggregation. The model consists of a tree structure, bivariate copulas, and marginal distributions. The construction relies on a conditional independence assumption whose implications are studied. Selection of the tree structure, estimation, and model validation are illustrated using data from a Canadian property and casualty insurance company.

    Speaker

    Marie-Pier Côté is a PhD student in the Department of Mathematics and Statistics at McGill University.