McGill Statistics Seminar - McGill Statistics Seminars

- Feb 5, 2021
- post
An Adaptive Algorithm to Multi-armed Bandit Problem with High-dimensional Covariates

Wei Qian · Feb 5, 2021
Date: 2021-02-05

Time: 15:30-16:30 (Montreal time)

Zoom Link

Meeting ID: 843 0865 5572

Passcode: 690084

Abstract:

This work studies an important sequential decision making problem known as the multi-armed bandit problem with covariates. Under a linear bandit framework with high-dimensional covariates, we propose a general arm allocation algorithm that integrates both arm elimination and randomized assignment strategies. By employing a class of high-dimensional regression methods for coefficient estimation, the proposed algorithm is shown to have near optimal finite-time regret performance under a new study scope that requires neither a margin condition nor a reward gap condition for competitive arms. Based on synergistically verified benefit of the margin, our algorithm exhibits an adaptive performance that automatically adapts to the margin and gap conditions, and attains the optimal regret rates under both study scopes, without or with the margin, up to a logarithmic factor. The proposed algorithm also simultaneously generates useful coefficient estimation output for competitive arms and is shown to achieve both estimation consistency and variable selection consistency. Promising empirical performance is demonstrated through two real data evaluation examples in drug dose assignment and news article recommendation.

Read More…
- Jan 15, 2021
- post
Large-scale Machine Learning Algorithms for Biomedical Data Science

Wei Zhang · Jan 15, 2021
Date: 2021-01-15

Time: 15:30-16:30 (Montreal time)

Zoom Link

Meeting ID: 843 0865 5572

Passcode: 690084

Abstract:

During the last decade, hundreds of machine learning methods have been developed for disease outcome prediction based on high-throughput genomics data. However, the quality of the input genomics features and the output clinical variables has been ignored in these algorithms. In this talk, I will introduce two studies that develop methods to learn more accurate molecular signatures and drug response values for cancer research. These studies are supported by NSF, NIH, and Moffitt Cancer Center.

Read More…
- Dec 4, 2020
- post
Quasi-random sampling for multivariate distributions via generative neural networks

Marius Hofert · Dec 4, 2020
Date: 2020-12-04

Time: 15:30-16:30

Zoom Link

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

A novel approach based on generative neural networks is introduced for constructing quasi-random number generators for multivariate models with any underlying copula in order to estimate expectations with variance reduction. So far, quasi-random number generators for multivariate distributions required a careful design, exploiting specific properties (such as conditional distributions) of the implied copula or the underlying quasi-Monte Carlo point set, and were only tractable for a small number of models. Utilizing specific generative neural networks allows one to construct quasi-random number generators for a much larger variety of multivariate distributions without such restrictions. Once trained with a pseudo-random sample, these neural networks only require a multivariate standard uniform randomized quasi-Monte Carlo point set as input and are thus fast in estimating expectations under dependence with variance reduction. Reproducible numerical examples are considered to demonstrate the approach. Emphasis is put on ideas rather than mathematical proofs.

Read More…
- Nov 27, 2020
- post
Probabilistic Approaches to Machine Learning on Tensor Data

Qing Mai · Nov 27, 2020
Date: 2020-11-27

Time: 15:30-16:30

Zoom Link

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

In contemporary scientific research, it is often of great interest to predict a categorical response based on a high-dimensional tensor (i.e. multi-dimensional array). Motivated by applications in science and engineering, we propose two probabilistic methods for machine learning on tensor data in the supervised and the unsupervised context, respectively. For supervised problems, we develop a comprehensive discriminant analysis model, called the CATCH model. The CATCH model integrates the information from the tensor and additional covariates to predict the categorical outcome with high accuracy. We further consider unsupervised problems, where no categorical response is available even on the training data. A doubly-enhanced EM (DEEM) algorithm is proposed for model-based tensor clustering, in which both the E-step and the M-step are carefully tailored for tensor data. CATCH and DEEM are developed under explicit statistical models with clear interpretations. They aggressively take advantage of the tensor structure and sparsity to tackle the new computational and statistical challenges arising from the intimidating tensor dimensions. Efficient algorithms are developed to solve the related optimization problems. Under mild conditions, CATCH and DEEM are shown to be consistent even when the dimension of each mode grows at an exponential rate of the sample size. Numerical studies also strongly support the application of CATCH and DEEM.

Read More…
- Nov 20, 2020
- post
Modeling viral rebound trajectories after analytical antiretroviral treatment interruption

Rui Wang · Nov 20, 2020
Date: 2020-11-20

Time: 15:30-16:30

Zoom Link

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Despite the success of combined antiretroviral therapy (ART) in achieving sustained control of viral replication, the concerns about side-effects, drug-drug interactions, drug resistance and cost call for a need to identify strategies for achieving HIV eradication or an ART-free remission. Following ART withdrawal, patients’ viral load levels usually increase rapidly to a peak followed by a dip, and then stabilize at a viral load set point. Characterizing features of the viral rebound trajectories (e.g., time to viral rebound and viral set points) and identifying host, virological, and immunological factors that are predictive of these features requires addressing analytical challenges such as non-linear viral rebound trajectories, coarsened data due to the assay’s limit of quantification, and intermittent measurements of viral load values. We first introduce a parametric nonlinear mixed effects (NLME) model for the viral rebound trajectory and compare its performance to a mechanistic modeling approach. We then develop a smoothed simulated pseudo maximum likelihood method for fitting NLME models that permits flexible specification of random effects distributions. Finally, we investigate the association between the time to viral suppression after ART initiation and the time to viral rebound after ART interruption through a Cox proportional hazard regression model where both the outcome and the covariate are interval-censored observations.

Read More…
- Nov 6, 2020
- post
Generalized Energy-Based Models

Arthur Gretton · Nov 6, 2020
Date: 2020-11-06

Time: 15:30-16:30

Zoom Link

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

I will introduce Generalized Energy Based Models (GEBM) for generative modelling. These models combine two trained components: a base distribution (generally an implicit model), which can learn the support of data with low intrinsic dimension in a high dimensional space; and an energy function, to refine the probability mass on the learned support. Both the energy function and base jointly constitute the final model, unlike GANs, which retain only the base distribution (the “generator”). In particular, while the energy function is analogous to the GAN critic function, it is not discarded after training. GEBMs are trained by alternating between learning the energy and the base, much like a GAN. Both training stages are well-defined: the energy is learned by maximising a generalized likelihood, and the resulting energy-based loss provides informative gradients for learning the base. Samples from the posterior on the latent space of the trained model can be obtained via MCMC, thus finding regions in this space that produce better quality samples. Empirically, the GEBM samples on image-generation tasks are of better quality than those from the learned generator alone, indicating that all else being equal, the GEBM will outperform a GAN of the same complexity. GEBMs also return state-of-the-art performance on density modelling tasks, and when using base measures with an explicit form.

Read More…
- Oct 30, 2020
- post
Test-based integrative analysis of randomized trial and real-world data for treatment heterogeneity estimation

Shu Yang · Oct 30, 2020
Date: 2020-10-30

Time: 15:30-16:30

Zoom Link

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Parallel randomized clinical trial (RCT) and real-world data (RWD) are becoming increasingly available for treatment evaluation. Given the complementary features of the RCT and RWD, we propose a test-based integrative analysis of the RCT and RWD for accurate and robust estimation of the heterogeneity of treatment effect (HTE), which lies at the heart of precision medicine. When the RWD are not subject to bias, e.g., due to unmeasured confounding, our approach combines the RCT and RWD for optimal estimation by exploiting semiparametric efficiency theory. Utilizing the design advantage of RTs, we construct a built-in test procedure to gauge the reliability of the RWD and decide whether or not to use RWD in an integrative analysis. We characterize the asymptotic distribution of the test-based integrative estimator under local alternatives, which provides a better approximation of the finite-sample behaviors of the test and estimator when the idealistic assumption required for the RWD is weakly violated. We provide a data-adaptive procedure to select the threshold of the test statistic that promises the smallest mean square error of the proposed estimator of the HTE. Lastly, we construct an adaptive confidence interval that has a good finite-sample coverage property. We apply the proposed method to characterize who can benefit from adjuvant chemotherapy in patients with stage IB non-small cell lung cancer.

Read More…
- Oct 23, 2020
- post
Linear Regression and its Inference on Noisy Network-linked Data

Tianxi Li · Oct 23, 2020
Date: 2020-10-23

Time: 15:30-16:30

Zoom Link

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Linear regression on a set of observations linked by a network has been an essential tool in modeling the relationship between response and covariates with additional network data. Despite its wide range of applications in many areas, such as social sciences and health-related research, the problem has not been well-studied in statistics so far. Previous methods either lack of inference tools or rely on restrictive assumptions on social effects, and usually treat the network structure as precisely observed, which is too good to be true in many problems. We propose a linear regression model with nonparametric social effects. Our model does not assume the relational data or network structure to be accurately observed; thus, our method can be provably robust to a certain level of perturbation of the network structure. We establish a full set of computationally efficient asymptotic inference tools under a general requirement of the perturbation and then study the robustness of our method in the specific setting when the perturbation is from random network models. We discover a phase-transition phenomenon of inference validity concerning the network density when no prior knowledge about the network model is available, while also show the significant improvement achieved by knowing the network model. A by-product of our analysis is a rate-optimal concentration bound about subspace projection that may be of independent interest. We conduct extensive simulation studies to verify our theoretical observations and demonstrate the advantage of our method compared to a few benchmarks under different data-generating models. The method is then applied to adolescent network data to study the gender and racial differences in social activities.

Read More…
- Oct 16, 2020
- post
Adaptive MCMC For Everyone

Jeffrey Rosenthal · Oct 16, 2020
Date: 2020-10-16

Time: 15:30-16:30

Zoom Link

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Markov chain Monte Carlo (MCMC) algorithms, such as the Metropolis Algorithm and the Gibbs Sampler, are an extremely useful and popular method of approximately sampling from complicated probability distributions. Adaptive MCMC attempts to automatically modify the algorithm while it runs, to improve its performance on the fly. However, such adaptation often destroys the ergodicity properties necessary for the algorithm to be valid. In this talk, we first illustrate MCMC algorithms using simple graphical examples. We then discuss adaptive MCMC, and present examples and theorems concerning its ergodicity and efficiency.

Read More…
- Oct 9, 2020
- post
Machine Learning and Neural Networks: Foundations and Some Fundamental Questions

Masoud Asgharian, Damoon Robatian and Zedian Xiao · Oct 9, 2020
Date: 2020-10-09

Time: 15:30-16:30

Zoom Link

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Statistical learning theory is by now a mature branch of data science that hosts a vast variety of practical techniques for tackling data-related problems. In this talk we present some fundamental concepts upon which statistical learning theory has been based. Different approaches to statistical inference will be discussed and the main problem of learning from Vapnik’s point of view will be explained. Further we discuss the topic of function estimation as the heart of Vapnik-Chervonenkis theory. There exist several state-of-the-art methods for estimating functional dependencies, such as maximum margin estimator and artificial neural networks. While for some of these methods, e.g., the support vector machines, there has already been developed a profound theory, others require more investigation. Accordingly, we pay a closer attention to the so-called mapping neural networks and try to shed some light on certain theoretical aspects of them. We highlight some of the fundamental challenges that have attracted the attention of researcher and they are yet to be fully resolved. One of these challenges is estimation of the intrinsic dimension of data that will be discussed in detail. Another challenge is inferring causal direction when the training data set is not representative of the target population.

Read More…

Date: 2021-02-05

Time: 15:30-16:30 (Montreal time)

Meeting ID: 843 0865 5572

Passcode: 690084

Abstract:

Date: 2021-01-15

Time: 15:30-16:30 (Montreal time)

Meeting ID: 843 0865 5572

Passcode: 690084

Abstract:

Date: 2020-12-04

Time: 15:30-16:30

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Date: 2020-11-27

Time: 15:30-16:30

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Date: 2020-11-20

Time: 15:30-16:30

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Date: 2020-11-06

Time: 15:30-16:30

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Date: 2020-10-30

Time: 15:30-16:30

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Date: 2020-10-23

Time: 15:30-16:30

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Date: 2020-10-16

Time: 15:30-16:30

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract:

Date: 2020-10-09

Time: 15:30-16:30

Meeting ID: 924 5390 4989

Passcode: 690084

Abstract: