CRM-Colloquium - McGill Statistics Seminars
  • Machine Learning for Causal Inference

    Date: 2020-09-11

    Time: 16:00-17:00

    Location: Zoom

    Meeting ID: 965 2536 7383

    Passcode: 421254

    Abstract:

    Given advances in machine learning over the past decades, it is now possible to accurately solve difficult non-parametric prediction problems in a way that is routine and reproducible. In this talk, I’ll discuss how machine learning tools can be rigorously integrated into observational study analyses, and how they interact with classical statistical ideas around randomization, semiparametric modeling, double robustness, etc. I’ll also survey some recent advances in methods for treatment heterogeneity. When deployed carefully, machine learning enables us to develop causal estimators that reflect an observational study design more closely than basic linear-regression-based methods.
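
    To make the reference to double robustness concrete (background only, not a description of the speaker's estimators): the augmented inverse-propensity-weighted (AIPW) estimator of an average treatment effect combines an outcome model and a propensity model, both of which can be fit with machine learning methods, and it remains consistent if either nuisance model is estimated well:

    \hat{\tau}_{\mathrm{AIPW}} = \frac{1}{n}\sum_{i=1}^{n}\Big[\hat{m}_1(X_i) - \hat{m}_0(X_i) + \frac{A_i\,(Y_i - \hat{m}_1(X_i))}{\hat{e}(X_i)} - \frac{(1-A_i)\,(Y_i - \hat{m}_0(X_i))}{1 - \hat{e}(X_i)}\Big],

    where A_i is the treatment indicator, \hat{m}_a(x) estimates E[Y \mid A=a, X=x], and \hat{e}(x) estimates P(A=1 \mid X=x).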

  • Neyman-Pearson classification: parametrics and sample size requirement

    Date: 2020-02-28

    Time: 15:30-16:30

    Location: BURNSIDE 1104

    Abstract:

    The Neyman-Pearson (NP) paradigm in binary classification seeks classifiers that achieve a minimal type II error while keeping the prioritized type I error below some user-specified level alpha. This paradigm arises naturally in applications such as severe disease diagnosis and spam detection, where people have clear priorities between the two error types. Recently, Tong, Feng and Li (2018) proposed a nonparametric umbrella algorithm that adapts all scoring-type classification methods (e.g., logistic regression, support vector machines, random forests) to respect the given upper bound alpha on the type I error (i.e., the conditional probability of classifying a class 0 observation as class 1 under the 0-1 coding) with high probability, without specific distributional assumptions on the features or the responses. Universal as the umbrella algorithm is, it demands an explicit minimum sample size on class 0, which is often the scarcer class, as in rare disease diagnosis applications. In this work, we adopt the parametric linear discriminant analysis (LDA) model and propose a new parametric thresholding algorithm, which does not require a minimum number of class 0 observations and is thus suitable for small-sample applications such as rare disease diagnosis. Leveraging both the existing nonparametric and the newly proposed parametric thresholding rules, we propose four LDA-based NP classifiers for both low- and high-dimensional settings. On the theoretical front, we prove NP oracle inequalities for one proposed classifier, where the rate for the excess type II error benefits from the explicit parametric model assumption. Furthermore, as NP classifiers involve a sample-splitting step on the class 0 observations, we construct a new adaptive sample-splitting scheme that can be applied universally to NP classifiers, and this adaptive strategy reduces their type II error. The proposed NP classifiers are implemented in the R package nproc.
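
    A minimal sketch of the scoring-plus-thresholding idea underlying NP classification is given below. It is illustrative only and is not the nproc implementation: the umbrella algorithm of Tong, Feng and Li chooses the threshold as an order statistic of held-out class 0 scores so that the type I error bound holds with high probability, whereas this sketch simply uses an empirical quantile.

    # Illustrative sketch of Neyman-Pearson classification by score thresholding.
    # X, y are numpy arrays; y uses the 0-1 coding with class 0 the prioritized class.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def np_lda_classifier(X, y, alpha=0.05, split=0.5, rng=None):
        """Fit LDA, then threshold its score using held-out class 0 data."""
        rng = np.random.default_rng(rng)
        idx0 = np.where(y == 0)[0]
        held = rng.choice(idx0, size=int(split * len(idx0)), replace=False)
        train = np.setdiff1d(np.arange(len(y)), held)

        lda = LinearDiscriminantAnalysis().fit(X[train], y[train])
        scores0 = lda.decision_function(X[held])       # scores of held-out class 0 points
        threshold = np.quantile(scores0, 1 - alpha)    # empirical (1 - alpha) quantile

        def predict(X_new):
            # Classify as class 1 only when the score clears the class 0 threshold.
            return (lda.decision_function(X_new) > threshold).astype(int)
        return predict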

  • Formulation and solution of stochastic inverse problems for science and engineering models

    Date: 2019-11-22

    Time: 16:00-17:00

    Location: Pavillon Kennedy, PK-5115, UQAM

    Abstract:

    The stochastic inverse problem of determining probability structures on input parameters for a physics model corresponding to a given probability structure on the output of the model forms the core of scientific inference and engineering design. We describe a formulation and solution method for stochastic inverse problems that is based on functional analysis, differential geometry, and probability/measure theory. This approach yields a computationally tractable problem while avoiding alterations of the model like regularization and ad hoc assumptions about the probability structures. We present several examples, including a high-dimensional application to determination of parameter fields in storm surge models. We also describe work aimed at defining a notion of condition for stochastic inverse problems and tackling the related problem of designing sets of optimal observable quantities.
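
    A short illustration of what pulling a probability structure on outputs back to the inputs can look like computationally is sketched below. This is a density-based reweighting scheme in the spirit of data-consistent inversion and is an illustrative assumption on my part, not the measure-theoretic formulation described in the talk; the toy quadratic model and all names in the code are made up for the example.

    # Reweight prior input samples so that their weighted pushforward through the
    # model matches a given density on the model output. Illustrative only.
    import numpy as np
    from scipy import stats

    def stochastic_inverse_weights(model, prior_samples, observed_pdf):
        """Return normalized weights on prior samples of the input parameter."""
        q = model(prior_samples)                    # push prior samples through the model
        pushforward = stats.gaussian_kde(q)         # output density under the prior
        w = observed_pdf(q) / pushforward(q)        # ratio of target density to pushforward
        return w / w.sum()

    # Toy example: quadratic map, uniform prior on the input, normal density on the output.
    rng = np.random.default_rng(0)
    lam = rng.uniform(-1.0, 1.0, size=20000)
    weights = stochastic_inverse_weights(lambda x: x**2, lam,
                                         stats.norm(loc=0.25, scale=0.05).pdf)
    # `weights` defines an updated distribution on lam whose pushforward through
    # x**2 approximates the N(0.25, 0.05^2) density placed on the output.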

  • General Bayesian Modeling

    Date: 2019-11-01

    Time: 16:00-17:00

    Location: BURN 1104

    Abstract:

    This work is motivated by an inflexibility of Bayesian modeling: only the parameters of probability models can be connected with data. The idea is to generalize this by allowing arbitrary unknowns to be connected with data via loss functions. An updating process is then detailed, which can be viewed as arising in at least a couple of ways, one of them purely axiomatically driven. The further exploration of replacing probability-model-based approaches to inference with loss functions is ongoing. Joint work with Chris Holmes, Pier Giovanni Bissiri and Simon Lyddon.
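
    As background, the loss-based update studied in this line of work (Bissiri, Holmes and Walker) replaces the log-likelihood with a general loss \ell connecting an unknown \theta to the data, together with a learning rate w > 0; standard Bayesian updating is recovered when \ell is the negative log-likelihood:

    \pi_w(\theta \mid x_{1:n}) \;\propto\; \exp\Big\{-w \sum_{i=1}^{n} \ell(\theta, x_i)\Big\}\, \pi(\theta).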

  • Network models, sampling, and symmetry properties

    Date: 2019-02-01

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    A recent body of work, by myself and many others, aims to develop a statistical theory of network data for problems where a single network is observed. Of the models studied in this area, graphon models are probably the most widely known in statistics. I will explain the relationship between three aspects of this work: (1) specific models, such as graphon models, graphex models, and edge-exchangeable graphs; (2) sampling theory for networks, specifically in what statisticians might refer to as an infinite-population limit; and (3) invariance properties, especially various forms of exchangeability. I will also present recent results that show how statistically relevant results (such as central limit theorems) can be derived from such invariance properties.
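
    For concreteness, a graphon model generates a graph by drawing independent uniform latent variables U_1, ..., U_n for the nodes and connecting nodes i and j independently with probability W(U_i, U_j), for a symmetric function W on [0,1]^2. A minimal sampler, with an arbitrary illustrative choice of W, is sketched below.

    # Sample an n-node undirected graph from a graphon W: [0,1]^2 -> [0,1].
    # The particular W passed in at the bottom is an arbitrary illustrative choice.
    import numpy as np

    def sample_graphon(W, n, rng=None):
        rng = np.random.default_rng(rng)
        u = rng.uniform(size=n)                       # latent node positions U_i
        probs = W(u[:, None], u[None, :])             # edge probabilities W(U_i, U_j)
        coins = rng.uniform(size=(n, n)) < probs      # independent edge coin flips
        adj = np.triu(coins, k=1)                     # keep i < j, no self-loops
        return (adj | adj.T).astype(int)              # symmetrize: undirected graph

    A = sample_graphon(lambda x, y: 0.5 * np.exp(-(x + y)), n=200, rng=1)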

  • The Law of Large Populations: The return of the long-ignored N and how it can affect our 2020 vision

    Date: 2018-02-16

    Time: 15:30-16:30

    Location: McGill University, OTTO MAASS 217

    Abstract:

    For over a century now, we statisticians have successfully convinced ourselves, and almost everyone else, that in statistical inference the size of the population, N, can be ignored, especially when it is large. Instead, we focused on the size of the sample, n, the key driving force for both the Law of Large Numbers and the Central Limit Theorem. We were thus taught that the statistical error (standard error) goes down with n, typically at the rate of 1/√n. However, all of this relies on the presumption that our data have perfect quality, in the sense of being equivalent to a probabilistic sample. A largely overlooked statistical identity, a potential counterpart to the Euler identity in mathematics, reveals a Law of Large Populations (LLP), a law that we should all be afraid of. That is, once we lose control over data quality, the systematic error (bias) in the usual estimators, relative to the benchmarking standard error from simple random sampling, goes up with N at the rate of √N. The coefficient in front of √N can be viewed as a data defect index: the simple Pearson correlation between the reporting/recording indicator and the value reported/recorded. Because of the multiplier √N, a seemingly tiny correlation, say 0.005, can have a detrimental effect on the quality of inference. Without an understanding of this LLP, “big data” can do more harm than good because of the drastically inflated precision assessment, and hence a gross overconfidence, setting us up to be caught by surprise when reality unfolds, as we all experienced during the 2016 US presidential election. Data from the Cooperative Congressional Election Study (CCES, conducted by Stephen Ansolabehere, Douglas Rivers and others, and analyzed by Shiro Kuriwaki) are used to estimate the data defect index for the 2016 US election, with the aim of gaining a clearer vision for the 2020 election and beyond.
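
    For reference, the identity alluded to above can be written (in the notation used in the published version of this work) as an exact decomposition of the error of the sample mean of the n recorded values relative to the population mean:

    \bar{Y}_n - \bar{Y}_N \;=\; \rho_{R,Y} \;\times\; \sqrt{\frac{N-n}{n}} \;\times\; \sigma_Y,

    where R is the recording indicator, \rho_{R,Y} is the finite-population correlation between R and Y (the data defect index above), and \sigma_Y is the population standard deviation. Dividing by the simple-random-sampling benchmark standard error, roughly \sigma_Y \sqrt{1/n - 1/N}, leaves a relative error of order \rho_{R,Y} \sqrt{N}, which is the √N growth described in the abstract.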

  • 150 years (and more) of data analysis in Canada

    Date: 2017-11-24

    Time: 15:30-16:30

    Location: LEA 232

    Abstract:

    As Canada celebrates its 150th anniversary, it may be good to reflect on the past and future of data analysis and statistics in this country. In this talk, I will review the Victorian Statistics Movement and its effect in Canada, data analysis by a Montréal physician in the 1850s, a controversy over data analysis in the 1850s and 60s centred in Montréal, John A. Macdonald’s use of statistics, the Canadian insurance industry and its use of statistics, the beginning of mathematical statistics in Canada, the Fisherian revolution, the influence of Fisher, Neyman and Pearson, the computer revolution, and the emergence of data science.

  • McNeil: Spectral backtests of forecast distributions with application to risk management | Jasiulis-Goldyn: Asymptotic properties and renewal theory for Kendall random walks

    Date: 2017-09-29

    Time: 14:30-16:30

    Location: BURN 1205

    Abstract:

    McNeil: In this talk we study a class of backtests for forecast distributions in which the test statistic is a spectral transformation that weights exceedance events by a function of the modelled probability level. The choice of the kernel function makes explicit the user’s priorities for model performance. The class of spectral backtests includes tests of unconditional coverage and tests of conditional coverage. We show how the class embeds a wide variety of backtests in the existing literature, and propose novel variants as well. We assess the size and power of the backtests in realistic sample sizes, and in particular demonstrate the tradeoff between power and specificity in validating quantile forecasts.
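
    A hedged sketch of the general construction (not the specific tests of the talk): given realized PIT values P_t = F_t(L_t) of the losses under the forecast distributions, a discrete kernel places weights on a few probability levels, exceedance indicators at those levels are combined into a spectral statistic W_t, and its sample mean is compared with the moments implied by P_t ~ Uniform(0,1) under the null. The function name and the particular kernel below are illustrative choices.

    # Sketch of a spectral backtest with a discrete kernel over probability levels.
    import numpy as np
    from scipy import stats

    def spectral_backtest(pit, levels, kernel_weights):
        """Z-test of W_t = sum_k w_k * 1{P_t >= u_k} against its i.i.d. Uniform(0,1) null."""
        pit, u, w = np.asarray(pit), np.asarray(levels), np.asarray(kernel_weights)

        W = (pit[:, None] >= u[None, :]).astype(float) @ w    # spectral statistic per period
        mean0 = w @ (1.0 - u)                                 # E[W_t] under the null
        second0 = np.sum(np.outer(w, w) * (1.0 - np.maximum.outer(u, u)))
        var0 = second0 - mean0 ** 2                           # Var[W_t] under the null

        z = np.sqrt(len(pit)) * (W.mean() - mean0) / np.sqrt(var0)
        return z, 2 * stats.norm.sf(abs(z))                   # statistic and two-sided p-value

    # Example: kernel putting equal weight on the 97.5% and 99% levels (VaR exceedances).
    rng = np.random.default_rng(0)
    z, p = spectral_backtest(rng.uniform(size=500), [0.975, 0.99], [0.5, 0.5])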

  • Instrumental Variable Regression with Survival Outcomes

    Date: 2017-04-06

    Time: 15:30-16:30

    Location: Université Laval, Pavillon Vachon, Salle 3840

    Abstract:

    Instrumental variable (IV) methods are popular in non-experimental studies for estimating the causal effects of medical interventions or exposures. These approaches allow for consistent estimation of such effects even if important confounding factors are unobserved. Despite the increasing use of these methods, there have been few extensions of IV methods to censored data regression problems. We discuss challenges in applying IV structural equation modelling techniques to the proportional hazards model and suggest alternative modelling frameworks. We demonstrate the utility of the accelerated lifetime and additive hazards models for IV analyses with censored data. Assuming linear structural equation models for either the event time or the hazard function, we propose closed-form, two-stage estimators for the causal effect in the structural models for the failure time outcomes. The asymptotic properties of the estimators are derived, and the resulting inferences are shown to perform well in simulation studies and in an application to a data set on the effectiveness of a novel chemotherapeutic agent for colon cancer.
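
    To fix ideas (a generic illustration, not the authors' estimators, which must additionally handle censoring): with exposure D, instrument Z, and event time T, a linear structural accelerated-lifetime model can be written as

    \log T = \alpha_0 + \alpha_1 D + \epsilon, \qquad D = \beta_0 + \beta_1 Z + \nu,

    and a two-stage estimator first regresses D on Z to obtain fitted values \hat{D}, then regresses \log T on \hat{D}, so that \hat{\alpha}_1 targets the causal effect even when \epsilon and \nu are correlated through unmeasured confounders. The closed-form estimators in the talk adapt this two-stage idea to censored failure-time outcomes, with an analogous construction on the hazard scale for the additive hazards model.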

  • Inference in dynamical systems

    Date: 2017-03-17

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    We consider the asymptotic consistency of maximum likelihood parameter estimation for dynamical systems observed with noise. Under suitable conditions on the dynamical systems and the observations, we show that maximum likelihood parameter estimation is consistent. Furthermore, we show how some well-studied properties of dynamical systems imply the general statistical properties related to maximum likelihood estimation. Finally, we exhibit classical families of dynamical systems for which maximum likelihood estimation is consistent. Examples include shifts of finite type with Gibbs measures and Axiom A attractors with SRB measures. We also relate Bayesian inference to the thermodynamic formalism in tracking dynamical systems.
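
    As a toy illustration of the setting (my own example, not taken from the talk): if a deterministic map x_{t+1} = T_\theta(x_t) is observed through additive Gaussian noise, y_t = x_t + \epsilon_t, the log-likelihood of \theta is \sum_t \log \varphi_\sigma(y_t - x_t(\theta)) and can be maximized directly. The sketch below uses the logistic map and, purely for simplicity, assumes the initial state and the noise scale are known.

    # Toy MLE for a noisily observed deterministic dynamical system (logistic map).
    import numpy as np

    def trajectory(theta, x0, n):
        """Iterate the logistic map x_{t+1} = theta * x_t * (1 - x_t)."""
        x = np.empty(n)
        x[0] = x0
        for t in range(n - 1):
            x[t + 1] = theta * x[t] * (1 - x[t])
        return x

    def log_likelihood(theta, y, x0, sigma):
        resid = y - trajectory(theta, x0, len(y))
        return -0.5 * np.sum(resid**2) / sigma**2     # Gaussian log-likelihood up to a constant

    rng = np.random.default_rng(0)
    true_theta, x0, sigma = 3.7, 0.2, 0.05
    y = trajectory(true_theta, x0, 400) + sigma * rng.normal(size=400)

    grid = np.linspace(3.5, 3.9, 401)                 # grid search over the parameter
    theta_hat = grid[np.argmax([log_likelihood(t, y, x0, sigma) for t in grid])]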