Past Seminar Series - McGill Statistics Seminars

- Oct 27, 2017
- post
Penalized robust regression estimation with applications to proteomics

Gabriela V. Cohen Freue · Oct 27, 2017
Date: 2017-10-27

Time: 15:30-16:30

Location: BURN 1205

Abstract:

In many current applications, scientists can easily measure a very large number of variables (for example, hundreds of protein levels), some of which are expected be useful to explain or predict a specific response variable of interest. These potential explanatory variables are most likely to contain redundant or irrelevant information, and in many cases, their quality and reliability may be suspect. We developed two penalized robust regression estimators that can be used to identify a useful subset of explanatory variables to predict the response, while protecting the resulting estimator against possible aberrant observations in the data set. Using an elastic net penalty, the proposed estimator can be used to select variables, even in cases with more variables than observations or when many of the candidate explanatory variables are correlated. In this talk, I will present the new estimator and an algorithm to compute it. I will also illustrate its performance in a simulation study and a real data set. This is joint work with Professor Matias Salibian-Barrera, my PhD student David Kepplinger, and my PDF Ezequiel Smuggler.

Read More…
- Oct 20, 2017
- post
Statistical optimization and nonasymptotic robustness

Qiang Sun · Oct 20, 2017
Date: 2017-10-20

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Statistical optimization has generated quite some interest recently. It refers to the case where hidden and local convexity can be discovered in most cases for nonconvex problems, making polynomial algorithms possible. It relies on a careful analysis of the geometry near global optima. In this talk, I will explore this issue by focusing on sparse regression problems in high dimensions. A computational framework named iterative local adaptive majorize-minimization (I-LAMM) will be proposed to simultaneously control algorithmic complexity and statistical error. I-LAMM effectively turns the nonconvex penalized regression problem into a series of convex programs by utilizing the locally strong convexity of the problem when restricting the solution set in an L_1 cone. Computationally, we establish a phase transition phenomenon: it enjoys a linear rate of convergence after a sub-linear burn-in. Statistically, it provides solutions with optimal statistical errors. Extensions to robust regression will be discussed.

Read More…
- Oct 13, 2017
- post
Quantifying spatial flood risks: A comparative study of max-stable models

Anne-Laure Fougères · Oct 13, 2017
Date: 2017-10-13

Time: 15:30-16:30

Location: BURN 1205

Abstract:

In various applications, evaluating spatial risks (such as floods, heatwaves or storms) is a key problem. The aim of this talk is to make use of extreme value theory and max-stable processes to provide quantitative answers to this issue. A review of the literature will be provided, as well as a wide comparative study based on a simulation design mimicking daily rainfall in France. This is a joint work with Cécile Mercadier (Université Claude-Bernard Lyon 1 (UCBL)) and Quentin Sebille (UCBL).

Read More…
- Sep 29, 2017
- post
McNeil: Spectral backtests of forecast distributions with application to risk management | Jasiulis-Goldyn: Asymptotic properties and renewal theory for Kendall random walks

Alex McNeil and Barbara Jasiulis-Goldyn · Sep 29, 2017
Date: 2017-09-29

Time: 14:30-16:30

Location: BURN 1205

Abstract:

McNeil: In this talk we study a class of backtests for forecast distributions in which the test statistic is a spectral transformation that weights exceedance events by a function of the modelled probability level. The choice of the kernel function makes explicit the user’s priorities for model performance. The class of spectral backtests includes tests of unconditional coverage and tests of conditional coverage. We show how the class embeds a wide variety of backtests in the existing literature, and propose novel variants as well. We assess the size and power of the backtests in realistic sample sizes, and in particular demonstrate the tradeoff between power and specificity in validating quantile forecasts.

Read More…
- Sep 22, 2017
- post
BET on independence

Kai Zhang · Sep 22, 2017
Date: 2017-09-22

Time: 14:00-15:00

Location: BRONF179

Abstract:

We study the problem of nonparametric dependence detection. Many existing methods suffer severe power loss due to non-uniform consistency, which we illustrate with a paradox. To avoid such power loss, we approach the nonparametric test of independence through the new framework of binary expansion statistics (BEStat) and binary expansion testing (BET), which examine dependence through a filtration induced by marginal binary expansions. Through a novel decomposition of the likelihood of contingency tables whose sizes are powers of 2, we show that the interactions of binary variables in the filtration are complete sufficient statistics for dependence. These interactions are also pairwise independent under the null. By utilizing these interactions, the BET avoids the problem of non-uniform consistency and improves upon a wide class of commonly used methods (a) by achieving the optimal rate in sample complexity and (b) by providing clear interpretations of global and local relationships upon rejection of independence. The binary expansion approach also connects the test statistics with the current computing system to allow efficient bitwise implementation. We illustrate the BET by a study of the distribution of stars in the night sky and by an exploratory data analysis of the TCGA breast cancer data.

Read More…
- Sep 15, 2017
- post
Our quest for robust time series forecasting at scale

Farzan Rohani · Sep 15, 2017
Date: 2017-09-15

Time: 15:30-16:30

Location: BURN 1205

Abstract:

The demand for time series forecasting at Google has grown rapidly along with the company since its founding. Initially, the various business and engineering needs led to a multitude of forecasting approaches, most reliant on direct analyst support. The volume and variety of the approaches, and in some cases their inconsistency, called out for an attempt to unify, automate, and extend forecasting methods, and to distribute the results via tools that could be deployed reliably across the company. That is, for an attempt to develop methods and tools that would facilitate accurate large-scale time series forecasting at Google. We were part of a team of data scientists in Search Infrastructure at Google that took on the task of developing robust and automatic large-scale time series forecasting for our organization. In this talk, we recount how we approached the task, describing initial stakeholder needs, the business and engineering contexts in which the challenge arose, and theoretical and pragmatic choices we made to implement our solution. We describe our general forecasting framework, offer details on various tractable subproblems into which we decomposed our overall forecasting task, and provide an example of our forecasting routine applied to publicly available Turkish Electricity data.

Read More…
- Sep 8, 2017
- post
Genomics like it's 1960: Inferring human history

Simon Gravel · Sep 8, 2017
Date: 2017-09-08

Time: 15:30-16:30

Location: BURN 1205

Abstract:

A central goal of population genetics is the inference of the biological, evolutionary and demographic forces that shaped human diversity. Large-scale sequencing experiments provide fantastic opportunities to learn about human history and biology if we can overcome computational and statistical challenges. I will discuss how simple mid-century statistical approaches, such as the jackknife and Kolmogorov equations, can be combined in unexpected ways to solve partial differential equations, optimize genomic study design, and learn about the spread of modern humans since our common African origins.

Read More…
- Apr 6, 2017
- post
Instrumental Variable Regression with Survival Outcomes

Jason Fine · Apr 6, 2017
Date: 2017-04-06

Time: 15:30-16:30

Location: Universite Laval, Pavillon Vachon, Salle 3840

Abstract:

Instrumental variable (IV) methods are popular in non-experimental studies to estimate the causal effects of medical interventions or exposures. These approaches allow for the consistent estimation of such effects even if important confounding factors are unobserved. Despite the increasing use of these methods, there have been few extensions of IV methods to censored data regression problems. We discuss challenges in applying IV structural equational modelling techniques to the proportional hazards model and suggest alternative modelling frameworks. We demonstrate the utility of the accelerated lifetime and additive hazards models for IV analyses with censored data. Assuming linear structural equation models for either the event time or the hazard function, we proposed closed-form, two-stage estimators for the causal effect in the structural models for the failure time outcomes. The asymptotic properties of the estimators are derived and the resulting inferences are shown to perform well in simulation studies and in an application to a data set on the effectiveness of a novel chemotherapeutic agent for colon cancer.

Read More…
- Mar 31, 2017
- post
Distributed kernel regression for large-scale data

Chen Xu · Mar 31, 2017
Date: 2017-03-31

Time: 15:30-16:30

Location: BURN 1205

Abstract:

In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a strategy, a full dataset is first split into several manageable segments; the final output is then aggregated from the individual outputs of the segments. Despite its popularity in practice, it remains largely unknown that whether such a distributive strategy provides valid theoretical inferences to the original data; if so, how efficient does it work? In this talk, I address these fundamental issues for the non-parametric distributed kernel regression, where accurate prediction is the main learning task. I will begin with the naive simple averaging algorithm and then talk about an improved approach via ADMM. The promising preference of these methods is supported by both simulation and real data examples.

Read More…
- Mar 24, 2017
- post
Bayesian sample size determination for clinical trials

Hamid Pezeshk · Mar 24, 2017
Date: 2017-03-24

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Sample size determination problem is an important task in the planning of clinical trials. The problem may be formulated formally in statistical terms. The most frequently used methods are based on the required size, and power of the trial for a specified treatment effect. In contrast to the Bayesian decision-theoretic approach, there is no explicit balancing of the cost of a possible increase in the size of the trial against the benefit of the more accurate information which it would give. In this talk a fully Bayesian approach to the sample size determination problem is discussed. This approach treats the problem as a decision problem and employs a utility function to find the optimal sample size of a trial. Furthermore, we assume that a regulatory authority, which is deciding on whether or not to grant a licence to a new treatment, uses a frequentist approach. The optimal sample size for the trial is then found by maximising the expected net benefit, which is the expected benefit of subsequent use of the new treatment minus the cost of the trial.

Read More…

Date: 2017-10-27

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2017-10-20

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2017-10-13

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2017-09-29

Time: 14:30-16:30

Location: BURN 1205

Abstract:

Date: 2017-09-22

Time: 14:00-15:00

Location: BRONF179

Abstract:

Date: 2017-09-15

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2017-09-08

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2017-04-06

Time: 15:30-16:30

Location: Universite Laval, Pavillon Vachon, Salle 3840

Abstract:

Date: 2017-03-31

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2017-03-24

Time: 15:30-16:30

Location: BURN 1205

Abstract: