McGill Statistics Seminar - McGill Statistics Seminars
  • Graph Representation Learning and Applications

    Date: 2019-04-26

    Time: 15:30-16:30

    Location: BURNSIDE 1205

    Abstract:

    Graphs, a general type of data structure for capturing interconnected objects, are ubiquitous in a variety of disciplines and domains, ranging from computational social science, recommender systems, medicine, and bioinformatics to chemistry. Representative examples of real-world graphs include social networks, user-item networks, protein-protein interaction networks, and molecular structures. In this talk, I will introduce our work on learning effective representations of graphs, such as learning low-dimensional node representations of large graphs (e.g., social networks, protein-protein interaction graphs, and knowledge graphs) and learning representations of entire graphs (e.g., molecular structures).

  • Estimating Time-Varying Causal Excursion Effect in Mobile Health with Binary Outcomes

    Date: 2019-04-12

    Time: 15:30-16:30

    Location: BURNSIDE 1205

    Abstract:

    Advances in wearables and digital technology now make it possible to deliver behavioral mobile health interventions to individuals in their everyday lives. The micro-randomized trial (MRT) is increasingly used to provide data to inform the construction of these interventions. This work is motivated by multiple MRTs that have been conducted or are currently in the field in which the primary outcome is a longitudinal binary outcome. The first analysis in these trials, often called the primary analysis, is a marginal analysis that seeks to answer whether the data indicate that a particular intervention component has an effect on the longitudinal binary outcome. Under rather restrictive assumptions one can, based on existing literature, derive a semi-parametric, locally efficient estimator of the causal effect. In this talk, starting from this estimator, we develop multiple estimators that can be used as the basis of a primary analysis under more plausible assumptions. Simulation studies are conducted to compare the estimators. We illustrate the developed methods using data from the MRT BariFit, whose goal is to support weight maintenance for individuals who have received bariatric surgery.

  • Bayesian Estimation of Individualized Treatment-Response Curves in Populations with Heterogeneous Treatment Effects

    Date: 2019-04-05

    Time: 15:30-16:30

    Location: BURNSIDE 1104

    Abstract:

    Estimating individual treatment effects is crucial for individualized or precision medicine. In reality, however, there is no way to obtain both the treated and untreated outcomes from the same person at the same time. An approximation can be obtained from randomized controlled trials (RCTs). However, randomization is often expensive, impractical, or unethical, and even in an RCT the pre-specified variables may not capture all the characteristics relevant to individual heterogeneity in treatment response. In this work, we use non-experimental data; we model heterogeneous treatment effects in the studied population and provide a Bayesian estimator of the individual treatment response. More specifically, we develop a novel Bayesian nonparametric (BNP) method that leverages the G-computation formula to adjust for time-varying confounding in observational data and flexibly models sequential data to provide posterior inference over the treatment response at both the group level and the individual level. On a challenging dataset containing time series from patients admitted to the intensive care unit (ICU), our approach reveals that these patients have heterogeneous responses to the treatments used in managing kidney function. We also show that, on held-out data, the resulting predicted outcome in response to treatment (or no treatment) is more accurate than that of alternative approaches.

  • Introduction to Statistical Network Analysis

    Date: 2019-03-29

    Time: 13:00-16:30

    Location: McIntyre – Room 521

    Abstract:

    Classical statistics often makes assumptions about conditional independence in order to fit models, but in the modern world connectivity is key. Nowadays we need to account for many dependencies, and sometimes the associations and dependencies themselves are the key items of interest. For example: how do we predict conflict between countries? How can we use friendships between school children to choose the best groups for study tips or help? How does the pattern of needle-sharing among partners correlate with HIV transmission, and where can interventions best be made? Basically, any type of study in which we are interested in connections or associations between pairs of actors, be they people, companies, countries or anything else, is a network analysis. The methods falling under this area are collectively known as “Statistical Network Analysis” or sometimes “Social Network Analysis” (which can be a bit misleading, as we are not only talking about Facebook and the like). This workshop will give a general introduction to networks, their visualisation, summary measures and statistical models that can be used to analyse them. The practical component will be in R, and attendees will get the most benefit if they are able to bring a laptop along to work through examples.

  • Challenges in Bayesian Computing

    Date: 2019-03-22

    Time: 15:30-16:30

    Location: BURN 1104

    Abstract:

    Computing is both the most mathematical and most applied aspect of statistics. We shall talk about various urgent computing-related topics in statistical (in particular, Bayesian) workflow, including exploratory data analysis and model checking, Hamiltonian Monte Carlo, monitoring convergence of iterative simulations, scalable computing, evaluation of approximate algorithms, predictive model evaluation, and simulation-based calibration. This work is inspired by applications including survey research, drug development, and environmental decision making.

  • Hierarchical Bayesian Modelling for Wireless Cellular Networks

    Date: 2019-03-15

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    With the recent advances in wireless technologies, base stations are becoming more sophisticated. Network operators are also able to collect more data to improve network performance and user experience. In this paper we concentrate on modeling the performance of wireless cells using a hierarchical Bayesian modeling framework. This framework provides a principled way to navigate the space between the option of creating one model to represent all cells in a network and the option of creating a separate model for each cell. The former option ignores the variations between cells (complete pooling), whereas the latter is overly noisy and ignores the common patterns across cells (no pooling). The hierarchical Bayesian model strikes a trade-off between these two extreme cases and enables us to do partial pooling of the data from all cells. This is done by estimating a parametric population distribution and assuming that each cell is a sample from this distribution. Because this model is fully Bayesian, it provides uncertainty intervals around each estimated parameter, which network operators can use when making network management decisions. We examine the performance of this method on a synthetic dataset and a real dataset collected from a cellular network.
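The contrast between complete pooling, no pooling, and partial pooling can be illustrated with a small sketch. This is not the speaker's model: it assumes a normal model with known within-cell variance `sigma2` and known between-cell variance `tau2` (both hypothetical inputs), and it uses a plug-in grand mean rather than full posterior inference. The function names are illustrative only.

```python
import statistics


def shrinkage_weight(n, sigma2, tau2):
    """Weight placed on a cell's own mean; grows toward 1 as n grows."""
    return tau2 / (tau2 + sigma2 / n)


def partial_pool(cell_samples, sigma2, tau2):
    """Return (complete-pooling, no-pooling, partial-pooling) estimates.

    cell_samples: list of lists, one list of observations per cell.
    sigma2: assumed known within-cell variance (assumption of this sketch).
    tau2: assumed known between-cell variance (assumption of this sketch).
    """
    # Complete pooling: one estimate shared by every cell.
    grand_mean = statistics.fmean(x for cell in cell_samples for x in cell)
    # No pooling: each cell is estimated from its own data only.
    cell_means = [statistics.fmean(cell) for cell in cell_samples]
    # Partial pooling: shrink each cell mean toward the grand mean,
    # more strongly for cells with fewer observations.
    pooled = []
    for cell, mean in zip(cell_samples, cell_means):
        w = shrinkage_weight(len(cell), sigma2, tau2)
        pooled.append(w * mean + (1.0 - w) * grand_mean)
    return grand_mean, cell_means, pooled


if __name__ == "__main__":
    cells = [
        [10.0, 12.0],
        [20.0, 22.0, 21.0, 19.0, 18.0, 20.0, 21.0, 19.0],
    ]
    grand, separate, pooled = partial_pool(cells, sigma2=1.0, tau2=1.0)
    print("complete pooling:", grand)
    print("no pooling:", separate)
    print("partial pooling:", pooled)
```

In this toy input the cell with only two observations is pulled further toward the shared mean than the cell with eight, which is the behaviour the abstract describes as a trade-off between the two extremes.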

  • Statistical Inference for partially observed branching processes, with application to hematopoietic lineage tracking

    Date: 2019-03-01

    Time: 15:30-16:30

    Location: BURN 1104

    Abstract:

    The likelihood function is central to many statistical procedures, but poses challenges in classical and modern data settings. Motivated by cell lineage tracking experiments to study hematopoiesis (the process of blood cell production), we present recent methodology enabling likelihood-based inference for partially observed data arising from continuous-time branching processes. These computational advances allow principled procedures such as maximum likelihood estimation, posterior inference, and expectation-maximization (EM) algorithms in previously intractable data settings. We then discuss limitations and alternatives when data are very large or generated from a hidden process, and potential ways forward using ideas from sparse optimization.

  • Uniform, nonparametric, non-asymptotic confidence sequences

    Date: 2019-02-22

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    A confidence sequence is a sequence of confidence intervals that is uniformly valid over an unbounded time horizon. In this paper, we develop non-asymptotic confidence sequences under nonparametric conditions that achieve arbitrary precision. Our technique draws a connection between the classical Cramér-Chernoff method, the law of the iterated logarithm (LIL), and the sequential probability ratio test (SPRT): our confidence sequences extend the first to produce time-uniform concentration bounds, provide tight non-asymptotic characterizations of the second, and generalize the third to nonparametric settings, including sub-Gaussian and Bernstein conditions, self-normalized processes, and matrix martingales. We strengthen and generalize existing constructions of finite-time iterated logarithm (“finite LIL”) bounds. We illustrate the generality of our proof techniques by deriving an empirical-Bernstein finite LIL bound as well as a novel upper LIL bound for the maximum eigenvalue of a sum of random matrices. Finally, we demonstrate the utility of our approach with applications to covariance matrix estimation and to estimation of the sample average treatment effect under the Neyman-Rubin potential outcomes model, for which we give a non-asymptotic, sequential estimation strategy that handles adaptive treatment mechanisms such as Efron’s biased coin design.
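To make the notion of time-uniform validity concrete, here is a minimal sketch of a much cruder confidence sequence than the ones in the abstract. It assumes observations bounded in [0, 1] and combines Hoeffding's inequality with a union bound over time, spending error alpha / (t (t + 1)) at step t so that the total error is at most alpha. Its width shrinks at a sqrt(log t / t) rate, which is looser than the iterated-logarithm rate the talk targets. The function names are hypothetical.

```python
import math


def hoeffding_radius(t, alpha):
    """Half-width at time t when spending error alpha / (t * (t + 1)).

    Solves 2 * exp(-2 * t * r**2) = alpha / (t * (t + 1)) for r.
    The per-step errors sum to alpha over t = 1, 2, ..., so all
    intervals hold simultaneously with probability at least 1 - alpha.
    """
    return math.sqrt(math.log(2.0 * t * (t + 1) / alpha) / (2.0 * t))


def confidence_sequence(observations, alpha=0.05):
    """Yield one interval per observation for the mean of [0, 1]-valued data."""
    total = 0.0
    intervals = []
    for t, x in enumerate(observations, start=1):
        total += x
        mean = total / t
        radius = hoeffding_radius(t, alpha)
        # Clip to [0, 1], the assumed range of the data.
        intervals.append((max(0.0, mean - radius), min(1.0, mean + radius)))
    return intervals


if __name__ == "__main__":
    data = [0.0, 1.0] * 500
    for t in (1, 10, 100, 1000):
        low, high = confidence_sequence(data[:t])[-1]
        print(t, round(low, 3), round(high, 3))
```

Because the guarantee holds for all times at once, an analyst can stop collecting data at any point and still report a valid interval, which a fixed-sample confidence interval does not allow.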

  • Causal Inference with Unmeasured Confounding: an Instrumental Variable Approach

    Date: 2019-02-15

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Causal inference is a challenging problem because causation cannot be established from observational data alone. Researchers typically rely on additional sources of information to infer causation from association. Such information may come from powerful designs such as randomization, or from background knowledge such as information on all confounders. However, the perfect designs or background knowledge required for establishing causality may not always be available in practice. In this talk, I use novel causal identification results to show that the instrumental variable approach can be used to combine the power of design and background knowledge to draw causal conclusions. I also introduce novel estimation tools to construct estimators that are robust and efficient and that enjoy good finite-sample properties. These methods will be discussed in the context of a randomized encouragement design for a flu vaccine.
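As a rough illustration of why an instrument helps under unmeasured confounding, the sketch below simulates a randomized encouragement design and compares a naive treated-versus-untreated contrast with the classical Wald (ratio) estimator. This is the textbook estimator, not the novel identification results or estimators from the talk. The data-generating process, the constant treatment effect, and all function names are assumptions made for the example.

```python
import random
import statistics


def simulate_encouragement(n, effect, seed=0):
    """Simulate (z, d, y) rows with an unmeasured confounder u.

    z: randomized encouragement (the instrument).
    d: treatment actually taken, influenced by z and by u.
    y: outcome, influenced by d and by u (constant effect assumed).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # unmeasured confounder
        z = 1 if rng.random() < 0.5 else 0
        d = 1 if z + u + rng.gauss(0.0, 1.0) > 0.5 else 0
        y = effect * d + u + rng.gauss(0.0, 0.5)
        rows.append((z, d, y))
    return rows


def naive_estimate(rows):
    """Difference in mean outcome between treated and untreated units."""
    treated = statistics.fmean(y for _, d, y in rows if d == 1)
    untreated = statistics.fmean(y for _, d, y in rows if d == 0)
    return treated - untreated


def wald_estimate(rows):
    """Effect of z on y divided by effect of z on d."""
    y1 = statistics.fmean(y for z, _, y in rows if z == 1)
    y0 = statistics.fmean(y for z, _, y in rows if z == 0)
    d1 = statistics.fmean(d for z, d, _ in rows if z == 1)
    d0 = statistics.fmean(d for z, d, _ in rows if z == 0)
    return (y1 - y0) / (d1 - d0)


if __name__ == "__main__":
    rows = simulate_encouragement(n=100_000, effect=2.0)
    print("naive:", naive_estimate(rows))
    print("wald :", wald_estimate(rows))
```

In this simulation the confounder raises both treatment uptake and the outcome, so the naive contrast overstates the effect, while the ratio estimator uses only the variation induced by the randomized encouragement.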

  • Patient-Specific Finite Element Analysis of Human Heart: Mathematical and Statistical Opportunities and Challenges

    Date: 2019-02-08

    Time: 15:30-16:30

    Location: BURN 1104

    Abstract:

    Cardiovascular diseases (CVD) are the leading cause of death globally and rank second in Canada, costing the Canadian economy over $20 billion every year. Despite recent progress in CVD through prevention, lifestyle changes, and the use of biomedical treatments to improve survival rates and quality of life, the integration of computer-aided engineering (CAE) in this field has been lacking. Clinically, proposing cut-off values while taking into consideration patient-specific risk is of paramount importance for increasing the rate of survival and improving quality of life. Computational modeling has proved useful in determining parameters that cannot be assessed experimentally. The latest developments in computational modeling of the human heart are presented, and the constitutive equations, the key ingredient of these in-silico models of the human heart, are discussed. Finite element analysis of cardiac diseases provides a framework for generating synthetic data to develop statistical models when collecting real data requires invasive procedures. The idea of virtual personalized cardiology will be discussed.