2020 Fall - McGill Statistics Seminars
  • Large-scale Network Inference

    Date: 2020-09-25

    Time: 14:00-15:00

    Zoom Link

    Meeting ID: 939 4707 7997

    Passcode: no password

    Abstract:

    Network data is prevalent in many contemporary big data applications, in which a common interest is to unveil important latent links between different pairs of nodes. Yet the simple, fundamental question of how to precisely quantify the statistical uncertainty associated with the identification of latent links remains largely unexplored. In this paper, we propose the method of statistical inference on membership profiles in large networks (SIMPLE) in the setting of the degree-corrected mixed membership model, where the null hypothesis assumes that the pair of nodes share the same profile of community memberships. In the simpler case of no degree heterogeneity, the model reduces to the mixed membership model, for which an alternative, more robust test is also proposed. Both tests are Hotelling-type statistics based on the rows of empirical eigenvectors or their ratios, whose asymptotic covariance matrices are very challenging to derive and estimate. Nevertheless, their analytical expressions are unveiled and the unknown covariance matrices are consistently estimated. Under some mild regularity conditions, we establish the exact limiting distributions of the two forms of SIMPLE test statistics under the null hypothesis and the contiguous alternative hypothesis. They are the chi-square distributions and the noncentral chi-square distributions, respectively, with degrees of freedom depending on whether the degrees are corrected or not. We also address the important issue of estimating the unknown number of communities and establish the asymptotic properties of the associated test statistics. The advantages and practical utility of our new procedures in terms of both size and power are demonstrated through several simulation examples and real network applications.

  • BdryGP: a boundary-integrated Gaussian process model for computer code emulation

    Date: 2020-09-18

    Time: 15:30-16:30

    Zoom Link

    Meeting ID: 924 5390 4989

    Passcode: 690084

    Abstract:

    With advances in mathematical modeling and computational methods, complex phenomena (e.g., universe formation, rocket propulsion) can now be reliably simulated via computer code. This code solves a complicated system of equations representing the underlying science of the problem. Such simulations can be very time-intensive, requiring months of computation for a single run. Gaussian processes (GPs) are widely used as predictive models for “emulating” this expensive computer code. Yet with limited training data on a high-dimensional parameter space, such models can suffer from poor predictive performance and limited physical interpretability. Fortunately, in many physical applications, there is additional boundary information on the code beforehand, either from governing physics or scientific knowledge. We propose a new BdryGP model which incorporates such boundary information for prediction. We show that BdryGP not only enjoys improved convergence rates over standard GP models which do not incorporate boundaries, but is also more resistant to the “curse of dimensionality” in nonparametric regression. We then demonstrate the improved predictive performance and posterior contraction of the BdryGP model on several test problems in the literature.

  • Machine Learning for Causal Inference

    Date: 2020-09-11

    Time: 16:00-17:00

    Zoom Link

    Meeting ID: 965 2536 7383

    Passcode: 421254

    Abstract:

    Given advances in machine learning over the past decades, it is now possible to accurately solve difficult non-parametric prediction problems in a way that is routine and reproducible. In this talk, I’ll discuss how machine learning tools can be rigorously integrated into observational study analyses, and how they interact with classical statistical ideas around randomization, semiparametric modeling, double robustness, etc. I’ll also survey some recent advances in methods for treatment heterogeneity. When deployed carefully, machine learning enables us to develop causal estimators that reflect an observational study design more closely than basic linear-regression-based methods.
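
As a rough illustration of the double robustness idea mentioned in the Machine Learning for Causal Inference abstract, the sketch below computes an augmented inverse-propensity-weighted (AIPW) estimate of an average treatment effect on simulated data. It is not taken from the talk: the function name `aipw_ate`, the simulated data-generating process, and the use of the true nuisance functions as stand-ins for fitted machine learning models are all assumptions made for illustration.

```python
import numpy as np


def aipw_ate(y, w, mu0_hat, mu1_hat, e_hat):
    """AIPW estimate of the average treatment effect and its standard error.

    y: outcomes, w: binary treatment indicator (0/1),
    mu0_hat / mu1_hat: predicted outcomes under control / treatment,
    e_hat: predicted treatment probabilities (propensity scores).
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # Outcome-model contrast plus inverse-propensity-weighted residual corrections.
    scores = (
        mu1_hat - mu0_hat
        + w * (y - mu1_hat) / e_hat
        - (1.0 - w) * (y - mu0_hat) / (1.0 - e_hat)
    )
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))


# Hypothetical simulated observational study with a true effect of 2.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-x))        # treatment is more likely when x is large
w = rng.binomial(1, e_true)
y = 2.0 * w + x + rng.normal(size=n)

# Case 1: both nuisance models correct (true functions stand in for ML fits).
est_both, se_both = aipw_ate(y, w, mu0_hat=x, mu1_hat=x + 2.0, e_hat=e_true)

# Case 2: outcome model deliberately wrong (all zeros), propensity model correct.
zeros = np.zeros(n)
est_wrong_mu, se_wrong_mu = aipw_ate(y, w, mu0_hat=zeros, mu1_hat=zeros, e_hat=e_true)

# Naive difference in means, which ignores confounding by x.
naive = y[w == 1].mean() - y[w == 0].mean()

print(f"AIPW, both models correct:  {est_both:.3f} (se {se_both:.3f})")
print(f"AIPW, wrong outcome model:  {est_wrong_mu:.3f} (se {se_wrong_mu:.3f})")
print(f"Naive difference in means:  {naive:.3f}")
```

In this simulation the AIPW estimate should stay close to the true effect even when the outcome model is misspecified, provided the propensity model is correct, while the naive comparison is biased upward by confounding. In practice the nuisance functions would be estimated, typically with cross-fitting, rather than supplied as known.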