McGill Statistics Seminar - McGill Statistics Seminars

- Mar 13, 2020
Geometry-based Data Exploration

Wolf Guy · Mar 13, 2020
Date: 2020-03-13

Time: 15:30-16:30

Location: BURNSIDE 1205

Abstract:

High-throughput data collection technologies are becoming increasingly common in many fields, especially in biomedical applications involving single cell data (e.g., scRNA-seq and CyTOF). These introduce a rising need for exploratory analysis to reveal and understand hidden structure in the collected (high-dimensional) Big Data. A crucial aspect in such analysis is the separation of intrinsic data geometry from data distribution, as (a) the latter is typically biased by collection artifacts and data availability, and (b) rare subpopulations and sparse transitions between meta-stable states are often of great interest in biomedical data analysis. In this talk, I will show several tools that leverage manifold learning, graph signal processing, and harmonic analysis for biomedical (in particular, genomic/proteomic) data exploration, with emphasis on visualization, data generation/augmentation, and nonlinear feature extraction. A common thread in the presented tools is the construction of a data-driven diffusion geometry that both captures intrinsic structure in data and provides a generalization of Fourier harmonics on it. These, in turn, are used to process data features along the data geometry for denoising and generative purposes. Finally, I will relate this approach to the recently-proposed geometric scattering transform that generalizes Mallat’s scattering to non-Euclidean domains, and provides a mathematical framework for theoretical understanding of the emerging field of geometric deep learning.
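To make the idea of a data-driven diffusion geometry concrete, here is a minimal sketch of a classical diffusion-maps construction. It is an illustration under stated assumptions, not the specific tools presented in the talk: the function name `diffusion_map`, the Gaussian kernel, the bandwidth `epsilon`, and the density normalization are all choices made here. The normalization step is one common way to reduce the influence of sampling density, in the spirit of separating data geometry from data distribution.

```python
import numpy as np

def diffusion_map(points, epsilon, n_components=2):
    """Classical diffusion-map embedding (illustrative sketch, assumed setup)."""
    x = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances and Gaussian affinities.
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    k = np.exp(-sq / epsilon)
    # Density normalization: divide out a kernel density estimate so that
    # the geometry matters more than how densely each region was sampled.
    q = k.sum(axis=1)
    k = k / np.outer(q, q)
    # Symmetric conjugate of the Markov matrix P = D^{-1} K.
    d = k.sum(axis=1)
    s = k / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(s)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P play the role of generalized Fourier harmonics.
    psi = vecs / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1).
    keep = slice(1, n_components + 1)
    return vals[keep], psi[:, keep] * vals[keep]

# Toy data: 40 points on a unit circle (hypothetical example data).
angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
eigenvalues, coords = diffusion_map(circle, epsilon=0.5)
print(coords.shape)
```

The eigenvectors returned here are the discrete analogue of Fourier modes on the data; on the circle example they behave like sines and cosines of the angle.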

- Feb 21, 2020
Non-central squared copulas: properties and applications

Bouchra Nasri · Feb 21, 2020
Date: 2020-02-21

Time: 15:30-16:30

Location: BURNSIDE 1205

Abstract:

The goal of this presentation is to introduce new families of multivariate copulas, extending the chi-square copulas, the Fisher copula, and squared copulas. The new families are constructed from existing copulas by first transforming their margins to standard Gaussian distributions, then transforming these variables into non-central chi-square variables with one degree of freedom, and finally by considering the copula associated with these new variables. It is shown that by varying the non-centrality parameters, one can model non-monotonic dependence, and that when one or more non-centrality parameters lie outside a given hyper-rectangle, the copula is almost the same as the one obtained when these parameters are infinite. For these new families, the tail behavior and the monotonicity of dependence measures such as Kendall’s tau and Spearman’s rho are investigated, and estimation is discussed. Some examples will illustrate the usefulness of these new copula families.
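The three-step construction described above can be sketched by simulation. This is a minimal illustration, assuming a bivariate Gaussian copula with correlation `rho` as the starting copula (so the margins are already standard Gaussian) and shifts `a`, so that each $(Z_j + a_j)^2$ is non-central chi-square with one degree of freedom and non-centrality $a_j^2$. The function name and parameters are hypothetical, not taken from the talk.

```python
import numpy as np

def sample_noncentral_squared_copula(n, rho, a, seed=0):
    """Draw pseudo-observations from a non-central squared copula (sketch)."""
    rng = np.random.default_rng(seed)
    # Step 1: standard Gaussian margins (assumed Gaussian starting copula).
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    # Step 2: non-central chi-square variables with one degree of freedom.
    x = (z + np.asarray(a, dtype=float)) ** 2
    # Step 3: the copula of x, approximated by rescaled column-wise ranks.
    ranks = x.argsort(axis=0).argsort(axis=0) + 1
    return ranks / (n + 1)

u = sample_noncentral_squared_copula(1000, rho=0.7, a=(0.5, 0.5))
print(u[:5])
```

With small shifts the squaring folds the Gaussian variables, which is what allows non-monotonic dependence; with very large shifts the map is monotone on almost all of the probability mass and the sample behaves like one from the starting copula.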

- Feb 14, 2020
Sharing Sustainable Mobility in Smart Cities

Wei Qi · Feb 14, 2020
Date: 2020-02-14

Time: 15:30-16:30

Location: BURNSIDE 1205

Abstract:

Many cities worldwide are embracing electric vehicle (EV) sharing as a flexible and sustainable means of urban transit. However, it remains challenging for the operators to charge the fleet due to limited or costly access to charging facilities. In this work, we focus on answering the core question: how to charge the fleet to make EV sharing viable and profitable. Our work is motivated by the recent setback that struck San Diego, California, where car2go ceased its EV sharing operations. We integrate charging infrastructure planning and vehicle repositioning operations that were often considered separately in the literature. More interestingly, our modeling emphasizes the operator-controlled charging operations and customers’ EV picking behavior, which are both central to EV sharing but were largely overlooked. Motivated by the actual data of car2go, our model explicitly characterizes how customers endogenously pick EVs based on energy levels, and how the operator dispatches EV charging under a targeted charging policy. We formulate the integrated model as a nonlinear optimization program with fractional constraints. We then develop both lower- and upper-bound formulations as mixed-integer second order cone programs, which are computationally tractable with a small optimality gap. Contrary to car2go’s practice, we find that the viability of EV sharing can be enhanced by concentrating limited charger resources at selected locations. Charging EVs in a proactive fashion (rather than car2go’s policy of charging EVs only when their energy level drops below 20%) can boost the profit by 10.7%. Given the demand profile in San Diego, the fleet size may be reduced by up to 34% without incurring significant profit loss. Moreover, sufficient charger availability is crucial when collaborating with a public charger network. Finally, increasing the charging power relieves the charger resource constraint, whereas extending per-charge range or adopting unmanned repositioning improves profitability. In summary, our work demonstrates a data-verified and high-granularity modeling approach. Both the high-level planning guidelines and operational policies can be useful for practitioners. We also highlight the value of jointly managing demand fulfilment and EV charging.

- Jan 31, 2020
Adapting black-box machine learning methods for causal inference

Victor Veitch · Jan 31, 2020
Date: 2020-01-31

Time: 15:30-16:30

Location: BURNSIDE 1104

Abstract:

I’ll discuss the use of observational data to estimate the causal effect of a treatment on an outcome. This task is complicated by the presence of “confounders” that influence both treatment and outcome, inducing observed associations that are not causal. Causal estimation is achieved by adjusting for this confounding by using observed covariate information. I’ll discuss the case where we observe covariates that carry sufficient information for the adjustment, but where explicit models relating treatment, outcome, covariates and confounding are not available.

- Jan 13, 2020
Estimation and inference for changepoint models

Sean Jewell · Jan 13, 2020
Date: 2020-01-13

Time: 15:30-16:30

Location: BURNSIDE 1205

Abstract:

This talk is motivated by statistical challenges that arise in the analysis of calcium imaging data, a new technology in neuroscience that makes it possible to record from huge numbers of neurons at single-neuron resolution. In the first part of this talk, I will consider the problem of estimating a neuron’s spike times from calcium imaging data. A simple and natural model suggests a non-convex optimization problem for this task. I will show that by recasting the non-convex problem as a changepoint detection problem, we can efficiently solve it for the global optimum using a clever dynamic programming strategy.
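To illustrate how a changepoint problem can be solved to global optimality by dynamic programming, here is a generic optimal-partitioning sketch for a piecewise-constant mean with a per-changepoint penalty. It is a simplified stand-in, not the speaker's algorithm: the calcium model in the talk is different, and the function name, the squared-error segment cost, and the `penalty` parameter are assumptions made for this example.

```python
import numpy as np

def optimal_partitioning(y, penalty):
    """Globally optimal changepoints for a piecewise-constant mean (sketch)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prefix sums give each segment cost in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def seg_cost(i, j):
        # Sum of squared deviations of y[i:j] from its own mean.
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    # best[j] is the optimal penalized cost of y[:j]; starting at -penalty
    # means that only changepoints, not the first segment, are penalized.
    best = np.full(n + 1, np.inf)
    best[0] = -penalty
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + seg_cost(i, j) + penalty
            if cost < best[j]:
                best[j] = cost
                last[j] = i
    # Backtrack through the stored segment starts.
    changepoints = []
    j = n
    while j > 0:
        i = last[j]
        if i > 0:
            changepoints.append(i)
        j = i
    return sorted(changepoints)

signal = [0.0] * 10 + [5.0] * 10
print(optimal_partitioning(signal, penalty=1.0))
```

This version costs quadratic time in the length of the series; pruning ideas from the changepoint literature can reduce that in practice.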

- Nov 29, 2019
Convergence rates for diffusions-based sampling and optimization methods

Murat A. Erdogdu · Nov 29, 2019
Date: 2019-11-29

Time: 15:30-16:30

Location: BURN 1205

Abstract:

An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems. We show that this property holds for any suitably smooth diffusion and that different diffusions are suitable for optimizing different classes of convex and non-convex functions. This allows us to design diffusions suitable for globally optimizing convex and non-convex functions not covered by the existing Langevin theory. Our non-asymptotic analysis delivers computable optimization and integration error bounds based on easily accessed properties of the objective and chosen diffusion. Central to our approach are new explicit Stein factor bounds on the solutions of Poisson equations. We complement these results with improved optimization guarantees for targets other than the standard Gibbs measure.
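For readers unfamiliar with the starting point, the following is a minimal sketch of an Euler discretization of the Langevin diffusion used as an optimizer. The update is $x_{k+1} = x_k - h\,\nabla f(x_k) + \sqrt{2h/\gamma}\,\xi_k$ with standard Gaussian $\xi_k$; the names `langevin_euler`, `step`, and `inv_temp`, and the quadratic test objective, are assumptions for illustration and do not reproduce the more general diffusions analysed in the talk.

```python
import numpy as np

def langevin_euler(grad, x0, step, inv_temp, n_steps, seed=0):
    """Euler discretization of the Langevin diffusion (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    # Noise scale for step size h and inverse temperature gamma.
    noise_scale = np.sqrt(2.0 * step / inv_temp)
    for _ in range(n_steps):
        x = x - step * grad(x) + noise_scale * rng.standard_normal(x.shape)
    return x

def grad_quadratic(x):
    # Gradient of the toy objective f(x) = (x - 2)^2 / 2, minimized at x = 2.
    return x - 2.0

x_final = langevin_euler(grad_quadratic, x0=np.array([-5.0]),
                         step=0.1, inv_temp=1e4, n_steps=500)
print(x_final)
```

A large inverse temperature concentrates the stationary distribution near the minimizer, so the final iterate lands close to it; a small one makes the same recursion behave like a sampler.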

- Nov 15, 2019
Logarithmic divergence: from finance to optimal transport and information geometry

Ting-Kam Leonard Wong · Nov 15, 2019
Date: 2019-11-15

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Divergences, such as the Kullback-Leibler divergence, are distance-like quantities which arise in many applications in probability, statistics and data science. We introduce a family of logarithmic divergences which is a non-linear extension of the celebrated Bregman divergence. It is defined for any exponentially concave function (a function whose exponential is concave). We motivate this divergence by mathematical finance and large deviations of the Dirichlet process. It also arises naturally from the solution to an optimal transport problem. The logarithmic divergence enjoys remarkable mathematical properties including a generalized Pythagorean theorem in the sense of information geometry, and induces a generalized exponential family of probability densities. In the last part of the talk we present a new differential geometric framework which connects optimal transport and information geometry. Joint work with Soumik Pal and Jiaowen Yang.
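For orientation, one common way to write the logarithmic divergence is given below. The notation ($\varphi$, $p$, $q$) is assumed here and does not appear in the abstract. For a differentiable exponentially concave function $\varphi$,

$$L_\varphi(q \mid p) = \log\big(1 + \nabla\varphi(p)\cdot(q - p)\big) - \big(\varphi(q) - \varphi(p)\big).$$

Nonnegativity follows from the concavity of $\Phi = e^{\varphi}$: the tangent-plane inequality $\Phi(q) \leq \Phi(p) + \nabla\Phi(p)\cdot(q - p)$, divided by $\Phi(p) > 0$, gives $e^{\varphi(q) - \varphi(p)} \leq 1 + \nabla\varphi(p)\cdot(q - p)$, and taking logarithms yields $L_\varphi(q \mid p) \geq 0$. Replacing $\log(1 + t)$ by its linearization $t$ gives the Bregman divergence of a concave function, $\nabla\varphi(p)\cdot(q - p) - \big(\varphi(q) - \varphi(p)\big)$, which is the sense in which the logarithmic divergence is a non-linear extension.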

- Nov 8, 2019
Joint Robust Multiple Inference on Large Scale Multivariate Regression

Wen Zhou · Nov 8, 2019
Date: 2019-11-08

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Large scale multivariate regression with many heavy-tailed responses arises in a wide range of areas, from genomics, financial asset pricing, and banking regulation to psychology and social studies. Simultaneously testing a large number of general linear hypotheses, such as multiple contrasts, based on the large scale multivariate regression reveals a variety of associations between responses and regression or experimental factors. Traditional multiple testing methods often ignore the effect of heavy-tailedness in the data and impose a joint normality assumption that is arguably stringent in applications. This results in unreliable conclusions due to the loss of control over the false discovery proportion/rate (FDP/FDR) and a severe compromise of power in practice. In this paper, we employ data-adaptive Huber regression to propose a framework of joint robust inference of the general linear hypotheses for large scale multivariate regression. Under mild conditions, we show that the proposed method produces a consistent estimate of the FDP and FDR at a prespecified level. In particular, we employ a bias-corrected robust covariance estimator and study its exponential-type deviation inequality to provide a theoretical guarantee for our proposed multiple testing framework. Extensive numerical experiments demonstrate the gain in power of the proposed method compared to OLS and other procedures.

- Oct 25, 2019
Learning Connectivity Networks from High-Dimensional Point Processes

Ali Shojaie · Oct 25, 2019
Date: 2019-10-25

Time: 15:30-16:30

Location: BURN 1205

Abstract:

High-dimensional point processes have become ubiquitous in many scientific fields. For instance, neuroscientists use calcium fluorescent imaging to monitor the firing of thousands of neurons in live animals. In this talk, I will discuss new methodological, computational and theoretical developments for learning neuronal connectivity networks from high-dimensional point processes. Time permitting, I will also discuss a new approach for handling non-stationarity in high-dimensional time series.

- Oct 18, 2019
Univariate and multivariate extremes of extendible random vectors

Klaus Herrmann · Oct 18, 2019
Date: 2019-10-18

Time: 15:30-16:30

Location: BURN 1205

Abstract:

In its most common form, extreme value theory is concerned with the limiting distribution of location-scale transformed block-maxima $M_n = \max(X_1,\dots,X_n)$ of a sequence of identically distributed random variables $(X_i)$, $i\geq 1$. In case the members of the sequence $(X_i)$ are independent, the weak limiting behaviour of $M_n$ is adequately described by the classical Fisher–Tippett–Gnedenko theorem. In this presentation we are interested in the case of dependent random variables $(X_i)$ while retaining a common marginal distribution function $F$ for all $X_i$, $i\in\mathbb{N}$. Complementary to the well-established extreme value theory in a time series setting, we consider a framework in which the dependence between (extreme) events does not decay over time. This approach is facilitated by highlighting the connection between block-maxima and copula diagonals in an asymptotic context. The main goal of this presentation is to discuss a generalization of the Fisher–Tippett–Gnedenko theorem in this setting, leading to limiting distributions that are not in the class of generalized extreme value distributions. This result is exemplified for popular dependence structures related to extreme value, Archimedean and Archimax copulas. Focusing on the class of hierarchical Archimedean copulas, the results can further be extended to the multivariate setting. Finally, we illustrate the resulting limit laws and discuss their properties.
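As a brief illustration of the connection between block-maxima and copula diagonals (a standard identity via Sklar's theorem, stated here with assumed notation rather than taken from the talk): if $C_n$ denotes a copula of $(X_1,\dots,X_n)$ and $\delta_n(u) = C_n(u,\dots,u)$ its diagonal, then

$$P(M_n \leq x) = P(X_1 \leq x, \dots, X_n \leq x) = C_n\big(F(x),\dots,F(x)\big) = \delta_n\big(F(x)\big).$$

In the independent case $\delta_n(u) = u^n$, which recovers the classical $P(M_n \leq x) = F(x)^n$; under dependence, the limiting behaviour of $M_n$ is governed by how the diagonal $\delta_n$ behaves as $n$ grows.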
