CRM-Colloquium - McGill Statistics Seminars
  • High-dimensional changepoint estimation via sparse projection

    Date: 2016-12-01

    Time: 15:30-16:30

    Location: BURN 708

    Abstract:

    Changepoints are a very common feature of Big Data that arrive in the form of a data stream. We study high-dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the coordinates. The challenge is to borrow strength across the coordinates in order to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called ‘inspect’ for estimation of the changepoints: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimisation problem derived from the CUSUM transformation of the time series. We then apply an existing univariate changepoint detection algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated changepoints and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data generating mechanisms.
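
    A minimal sketch of the two-stage idea follows (a simplification under my own assumptions: the projection direction is taken as the leading left singular vector of the CUSUM matrix itself, rather than of the matrix solving the convex relaxation, and the univariate step is a plain CUSUM maximizer):

      # Sketch of projection-based estimation of a single changepoint in a
      # p-dimensional series.  Simplified: the projection direction is the
      # leading left singular vector of the CUSUM matrix, not the solution of
      # the convex relaxation used by 'inspect'.
      import numpy as np

      def cusum_transform(X):
          """CUSUM transformation of a (p x n) data matrix: entry (j, t)
          contrasts the mean of coordinate j before and after time t."""
          p, n = X.shape
          csum = np.cumsum(X, axis=1)
          total = csum[:, -1][:, None]
          t = np.arange(1, n)
          return np.sqrt(t * (n - t) / n) * (csum[:, :-1] / t - (total - csum[:, :-1]) / (n - t))

      def single_changepoint(X):
          """Estimate one changepoint by projecting onto the leading singular vector."""
          T = cusum_transform(X)
          U, _, _ = np.linalg.svd(T, full_matrices=False)
          v = U[:, 0]                                 # estimated projection direction
          proj = (v @ X)[None, :]                     # projected univariate series
          return int(np.argmax(np.abs(cusum_transform(proj)[0]))) + 1

      # Toy example: a mean shift at time 200 in 10 of 500 coordinates.
      rng = np.random.default_rng(0)
      X = rng.standard_normal((500, 400))
      X[:10, 200:] += 0.8
      print(single_changepoint(X))                    # should be close to 200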

  • Efficient tests of covariate effects in two-phase failure time studies

    Date: 2016-10-28

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Two-phase studies are frequently used when observations on certain variables are expensive or difficult to obtain. One such situation is when a cohort exists for which certain variables have been measured (phase 1 data); then, a sub-sample of individuals is selected, and additional data are collected on them (phase 2). Efficiency for tests and estimators can be increased by basing the selection of phase 2 individuals on data collected at phase 1. For example, in large cohorts, expensive genomic measurements are often collected at phase 2, with oversampling of persons with “extreme” phenotypic responses. A second example is case-cohort or nested case-control studies involving times to rare events, where phase 2 oversamples persons who have experienced the event by a certain time. In this talk I will describe two-phase studies of failure times and present efficient methods for testing covariate effects. Some extensions to more complex outcomes and areas needing further development will be discussed.
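
    A toy sketch of the sampling design (the names and cut-offs below are my own illustrative assumptions, not from the talk): phase 1 records a cheap phenotype on the whole cohort, and phase 2 draws a subsample that oversamples extreme responders before collecting the expensive covariate.

      # Two-phase sampling with oversampling of "extreme" phase-1 responses.
      import numpy as np

      rng = np.random.default_rng(1)
      n = 10_000
      phenotype = rng.normal(size=n)                  # phase 1: measured on everyone

      # Phase 2: draw 500 subjects, giving extreme responders 10x the sampling weight.
      extreme = np.abs(phenotype) > np.quantile(np.abs(phenotype), 0.90)
      weights = np.where(extreme, 10.0, 1.0)
      phase2_ids = rng.choice(n, size=500, replace=False, p=weights / weights.sum())

      # The expensive covariate (e.g. a genomic measurement) would be collected
      # only for phase2_ids; analyses must then account for the biased selection.
      print(f"{extreme[phase2_ids].mean():.0%} of the phase-2 sample is extreme, "
            f"versus 10% of the cohort")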

  • Statistical inference for fractional diffusion processes

    Date: 2016-09-16

    Time: 16:00-17:00

    Location: LB-921.04, Library Building, Concordia Univ.

    Abstract:

    Some time series exhibit long-range dependence, as noticed by Hurst in his investigations of water levels along the Nile river. Long-range dependence is connected with the concept of self-similarity in that increments of a self-similar process with stationary increments exhibit long-range dependence under some conditions. Fractional Brownian motion is an example of such a process. We discuss statistical inference for stochastic processes modeled by stochastic differential equations driven by a fractional Brownian motion. These processes are termed fractional diffusion processes. Since fractional Brownian motion is not a semimartingale, it is not possible to extend the notion of a stochastic integral with respect to a fractional Brownian motion following the ideas of Itô integration. There are other methods of extending integration with respect to a fractional Brownian motion. Suppose a complete path of a fractional diffusion process is observed over a finite time interval. We will present some results on inference problems for such processes.
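
    A minimal simulation sketch (my own illustration, not material from the talk): fractional Gaussian noise is drawn from its exact covariance via a Cholesky factor and cumulated into fractional Brownian motion; with Hurst parameter H > 1/2 the increments show the slowly decaying correlations characteristic of long-range dependence.

      # Simulate fractional Brownian motion with Hurst parameter H (Cholesky method).
      import numpy as np

      def fgn_cov(n, H):
          """Autocovariance gamma(k) of fractional Gaussian noise, lags 0..n-1."""
          k = np.arange(n)
          return 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H)
                        + np.abs(k - 1) ** (2 * H))

      def simulate_fbm(n, H, rng):
          """Length-n fBm path on {1, ..., n}, built from exactly distributed increments."""
          gamma = fgn_cov(n, H)
          cov = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
          L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))
          return np.cumsum(L @ rng.standard_normal(n))

      rng = np.random.default_rng(0)
      increments = np.diff(simulate_fbm(1000, H=0.8, rng=rng))
      for lag in (1, 10, 100):                        # long memory: correlation decays slowly
          print(lag, round(np.corrcoef(increments[:-lag], increments[lag:])[0, 1], 3))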

  • Ridges and valleys in the high excursion sets of Gaussian random fields

    Date: 2016-03-10

    Time: 15:30-16:30

    Location: MAASS 217, McGill

    Abstract:

    It is well known that normal random variables do not like taking large values. Therefore, a continuous Gaussian random field on a compact set does not like exceeding a large level. If it does exceed a large level at some point, it tends to go back below the level a short distance away from that point. One therefore does not expect the excursion set above a high level for such a field to possess any interesting structure. Nonetheless, if we want to know how likely two points in such an excursion set are to be connected by a path (“a ridge”) in the excursion set, how do we figure that out? If we know that a ridge in the excursion set exists (e.g. the field is above a high level on the surface of a sphere), how likely is it that there is also a valley (e.g. the field dropping below a fraction of the level somewhere inside that sphere)?
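
    A simulation sketch along these lines (purely illustrative, not the methodology of the talk): generate a smooth Gaussian field on a grid, threshold it at a high level, and examine the connected components of the resulting excursion set.

      # Connectivity of the excursion set of a smooth Gaussian field above a high level.
      import numpy as np
      from scipy.ndimage import gaussian_filter, label

      rng = np.random.default_rng(2)
      field = gaussian_filter(rng.standard_normal((400, 400)), sigma=10)
      field /= field.std()                            # roughly unit-variance smooth field

      level = 1.5
      excursion = field > level                       # excursion set above the level
      components, n_comp = label(excursion)           # connected components ("ridges")
      sizes = np.bincount(components.ravel())[1:]     # pixel counts per component

      print(f"{n_comp} components above level {level}; "
            f"largest covers {sizes.max() if n_comp else 0} pixels")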

  • Causal discovery with confidence using invariance principles

    Date: 2015-12-10

    Time: 15:30-16:30

    Location: UdeM, Pav. Roger-Gaudry, salle S-116

    Abstract:

    What is interesting about causal inference? One of the most compelling aspects is that any prediction under a causal model is valid in environments that are possibly very different to the environment used for inference. For example, variables can be actively changed and predictions will still be valid and useful. This invariance is very useful but still leaves open the difficult question of inference. We propose to turn this invariance principle around and exploit the invariance for inference. If we observe a system in different environments (or under different but possibly not well specified interventions) we can identify all models that are invariant. We know that any causal model has to be in this subset of invariant models. This allows causal inference with valid confidence intervals. We propose different estimators, depending on the nature of the interventions and depending on whether hidden variables and feedbacks are present. Some empirical examples demonstrate the power and possible pitfalls of this approach.
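
    A crude sketch of the invariance idea (an approximation under my own assumptions, not the estimators or tests proposed in the talk): for each candidate set of predictors, fit a pooled regression and keep the set if the residuals look alike across environments; the intersection of accepted sets should contain only causal parents.

      # Invariance-based variable screening across environments (illustrative).
      from itertools import combinations
      import numpy as np
      from scipy.stats import f_oneway, levene

      def invariant_sets(X, y, env, alpha=0.05):
          n, p = X.shape
          accepted = []
          for size in range(p + 1):
              for S in combinations(range(p), size):
                  Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in S])
                  beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
                  resid = y - Xs @ beta
                  groups = [resid[env == e] for e in np.unique(env)]
                  # Heuristic invariance check: residual means and spreads agree.
                  if f_oneway(*groups).pvalue > alpha and levene(*groups).pvalue > alpha:
                      accepted.append(set(S))
          return set.intersection(*accepted) if accepted else set()

      # Toy system: x0 -> y -> x1, with an intervention shifting x0 in environment 1.
      rng = np.random.default_rng(3)
      env = np.repeat([0, 1], 500)
      x0 = rng.normal(size=1000) + 2.0 * env
      y = 1.5 * x0 + rng.normal(size=1000)
      x1 = 0.8 * y + rng.normal(size=1000)
      print(invariant_sets(np.column_stack([x0, x1]), y, env))   # ideally {0}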

  • Inference regarding within-family association in disease onset times under biased sampling schemes

    Date: 2015-11-26

    Time: 15:30-16:30

    Location: BURN 306

    Abstract:

    In preliminary studies of the genetic basis for chronic conditions, interest routinely lies in the within-family dependence in disease status. When probands are selected from disease registries and their respective families are recruited, a variety of ascertainment bias-corrected methods of inference are available which are typically based on models for correlated binary data. This approach ignores the ages of family members at the time of assessment. We consider copula-based models for assessing the within-family dependence in the disease onset time and disease progression, based on right-censored and current status observations of the non-probands. Inferences based on likelihood, composite likelihood and estimating functions are each discussed and compared in terms of asymptotic and empirical relative efficiency. This is joint work with Yujie Zhong.
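
    An illustrative sketch (assumptions of my own, not the talk's models): paired onset times whose within-family association is induced by a Clayton copula, with administrative right censoring of the relative's onset time.

      # Within-family dependence in onset times via a Clayton copula (illustrative).
      import numpy as np
      from scipy.stats import kendalltau

      rng = np.random.default_rng(4)
      n, theta = 2000, 2.0                  # Clayton parameter; Kendall's tau = theta/(theta+2)

      # Clayton copula sample via the gamma-frailty (Marshall-Olkin) construction.
      frailty = rng.gamma(shape=1 / theta, scale=1.0, size=n)
      u = (1 - np.log(rng.uniform(size=n)) / frailty) ** (-1 / theta)
      v = (1 - np.log(rng.uniform(size=n)) / frailty) ** (-1 / theta)

      # Exponential onset times for proband and relative, linked by the copula.
      t_proband = -np.log(1 - u) / 0.02
      t_relative = -np.log(1 - v) / 0.02

      censor = 60.0                         # administrative censoring of the relative
      observed = np.minimum(t_relative, censor)
      status = (t_relative <= censor).astype(int)      # 1 = onset seen, 0 = censored

      print("Kendall's tau:", round(kendalltau(t_proband, t_relative)[0], 2),
            "(theoretical:", round(theta / (theta + 2), 2), ")")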

  • A knockoff filter for controlling the false discovery rate

    Date: 2015-10-30

    Time: 16:00-17:00

    Location: Salle 1360, Pavillon André-Aisenstadt, Université de Montréal

    Abstract:

    The big data era has created a new scientific paradigm: collect data first, ask questions later. Imagine that we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR) - the expected fraction of false discoveries among all discoveries - is not too high, in order to assure the scientist that most of the discoveries are indeed true and replicable. We introduce the knockoff filter, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables. This method works by constructing fake variables, knockoffs, which can then be used as controls for the true variables; the method achieves exact FDR control in finite-sample settings no matter the design or covariates, the number of variables in the model, and the amplitudes of the unknown regression coefficients, and does not require any knowledge of the noise level. This is joint work with Rina Foygel Barber.
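
    A sketch of the selection step only, under assumptions: given knockoff statistics W_j in which large positive values favour a genuine variable over its knockoff, the data-dependent threshold below delivers the FDR guarantee at a nominal level q. The construction of the knockoff variables themselves is not shown here.

      # Knockoff filter selection step: data-dependent threshold for level q.
      import numpy as np

      def knockoff_threshold(W, q=0.10, plus=True):
          """Smallest threshold t whose estimated false discovery proportion is <= q."""
          offset = 1.0 if plus else 0.0               # 'knockoff+' adds 1 to the numerator
          for t in np.sort(np.abs(W[W != 0])):
              fdp_hat = (offset + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
              if fdp_hat <= q:
                  return t
          return np.inf                               # nothing can be selected

      # Toy statistics: 20 signals with large positive W, 200 nulls symmetric about 0.
      rng = np.random.default_rng(5)
      W = np.concatenate([rng.normal(4, 1, 20), rng.normal(0, 1, 200)])
      t = knockoff_threshold(W, q=0.10)
      print("selected:", int(np.sum(W >= t)), "variables at threshold", round(t, 2))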

  • A statistical view of some recent climate controversies

    Date: 2015-05-07

    Time: 15:30-16:30

    Location: Université de Sherbrooke

    Abstract:

    This talk looks at some recent climate controversies from a statistical standpoint. The issues are motivated via changepoints and their detection. Changepoints are ubiquitous features in climatic time series, occurring whenever stations relocate or gauges are changed. Ignoring changepoints can produce spurious trend conclusions. Changepoint tests involving cumulative sums, likelihood ratio, and maximums of F-statistics are introduced; the asymptotic distributions of these statistics are quantified under the changepoint-free null hypothesis. The case of multiple changepoints is considered. The methods are used to study several controversies, including extreme temperature trends in the United States and Atlantic Basin tropical cyclone counts and strengths.
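
    A small sketch of the cumulative-sum test mentioned above (the series and cut-offs are my own illustrative assumptions): the maximum of the standardized CUSUM process is compared with a critical value obtained here by simulating the changepoint-free null.

      # CUSUM changepoint test for a single series (illustrative).
      import numpy as np

      def max_cusum(x):
          """Maximum absolute standardized cumulative sum of deviations from the mean."""
          x = np.asarray(x, dtype=float)
          s = np.cumsum(x - x.mean())
          return np.max(np.abs(s)) / (x.std(ddof=1) * np.sqrt(len(x)))

      def null_critical_value(n, alpha=0.05, reps=2000, seed=0):
          """Approximate the null critical value by simulating changepoint-free series."""
          rng = np.random.default_rng(seed)
          sims = [max_cusum(rng.standard_normal(n)) for _ in range(reps)]
          return float(np.quantile(sims, 1 - alpha))

      rng = np.random.default_rng(6)
      series = rng.standard_normal(200)
      series[120:] += 0.8                             # e.g. a station relocation at time 120
      print(f"statistic {max_cusum(series):.2f} vs 5% critical value "
            f"{null_critical_value(200):.2f}")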

  • High-dimensional phenomena in mathematical statistics and convex analysis

    Date: 2014-11-20

    Time: 16:00-17:00

    Location: CRM 1360 (U. de Montréal)

    Abstract:

    Statistical models in which the ambient dimension is of the same order or larger than the sample size arise frequently in different areas of science and engineering. Although high-dimensional models of this type date back to the work of Kolmogorov, they have been the subject of intensive study over the past decade, and have interesting connections to many branches of mathematics (including concentration of measure, random matrix theory, convex geometry, and information theory). In this talk, we provide a broad overview of the general area, including vignettes on phase transitions in high-dimensional graph recovery, and randomized approximations of convex programs.

  • Adaptive piecewise polynomial estimation via trend filtering

    Date: 2014-04-11

    Time: 15:30-16:30

    Location: Salle KPMG, 1er étage HEC Montréal

    Abstract:

    We will discuss trend filtering, a recently proposed tool of Kim et al. (2009) for nonparametric regression. The trend filtering estimate is defined as the minimizer of a penalized least squares criterion, in which the penalty term sums the absolute kth order discrete derivatives over the input points. Perhaps not surprisingly, trend filtering estimates appear to have the structure of kth degree spline functions, with adaptively chosen knot points (we say “appear” here as trend filtering estimates are not really functions over continuous domains, and are only defined over the discrete set of inputs). This brings to mind comparisons to other nonparametric regression tools that also produce adaptive splines; in particular, we will compare trend filtering to smoothing splines, which penalize the sum of squared derivatives across input points, and to locally adaptive regression splines (Mammen & van de Geer 1997), which penalize the total variation of the kth derivative.
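
    A minimal sketch of the criterion (my own illustration, written with a generic convex solver for clarity rather than the specialized algorithms used in practice): the fit minimizes squared error plus an l1 penalty on discrete derivatives, which produces a degree-k piecewise polynomial with adaptively placed knots.

      # Trend filtering via penalized least squares (illustrative, using cvxpy).
      import numpy as np
      import cvxpy as cp

      def trend_filter(y, k=1, lam=5.0):
          """kth-order trend filtering fit on evenly spaced inputs."""
          n = len(y)
          D = np.diff(np.eye(n), n=k + 1, axis=0)     # (k+1)th-order difference matrix
          beta = cp.Variable(n)
          objective = 0.5 * cp.sum_squares(beta - y) + lam * cp.norm1(D @ beta)
          cp.Problem(cp.Minimize(objective)).solve()
          return beta.value

      # Toy data: a piecewise linear trend plus noise; k = 1 recovers a piecewise
      # linear fit whose kink locations are chosen adaptively by the l1 penalty.
      rng = np.random.default_rng(7)
      x = np.linspace(0, 1, 200)
      truth = np.where(x < 0.5, 2 * x, 2 - 2 * x)
      y = truth + 0.1 * rng.standard_normal(200)
      fit = trend_filter(y, k=1, lam=5.0)
      print(round(float(np.abs(fit - truth).mean()), 3))   # small average error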