Past Seminar Series - McGill Statistics Seminars

- Oct 17, 2014
- post
Patient privacy, big data, and specimen pooling: Using an old tool for new challenges

Paramita S. Chaudhuri · Oct 17, 2014
Date: 2014-10-17

Time: 15:30-16:30

Location: BURN 1205

Abstract:

In the recent past, electronic health records and distributed data networks emerged as a viable resource for medical and scientific research. As the use of confidential patient information from such sources become more common, maintaining privacy of patients is of utmost importance. For a binary disease outcome of interest, we show that the techniques of specimen pooling could be applied for analysis of large and/or distributed data while respecting patient privacy. I will review the pooled analysis for a binary outcome and then show how it can be used for distributed data. Aggregate level data are passed from the nodes of the network to the analysis center and can be used very easily with logistic regression for estimation of disease odds ratio associated with a set of categorical or continuous covariates. Pooling approach allows for consistent estimation of the parameters of logistic regression that can include confounders. Additionally, since the individual covariate values can be accessed within a network, effect modifiers can be accommodated and consistently estimated. Since pooling effectively reduces the size of the dataset by creating pools or sets of individual, the resulting dataset can be analyzed much more quickly as compared to an original dataset that is too big as compared to computing environment.

Read More…
- Oct 10, 2014
- post
A margin-free clustering algorithm appropriate for dependent maxima in the domain of attraction of an extreme-value copula

Eric Cormier · Oct 10, 2014
Date: 2014-10-10

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Extracting relevant information in complex spatial-temporal data sets is of paramount importance in statistical climatology. This is especially true when identifying spatial dependencies between quantitative extremes like heavy rainfall. The paper of Bernard et al. (2013) develops a fast and simple clustering algorithm for finding spatial patterns appropriate for extremes. They develop their algorithm by adapting multivariate extreme-value theory to the context of spatial clustering. This is done by relating the variogram, a well-known distance used in geostatistics, to the extremal coefficient of a pair of joint maxima. This gives rise to a straightforward nonparametric estimator of this distance using the empirical distribution function. Their clustering approach is used to analyze weekly maxima of hourly precipitation recorded in France and a spatial pattern consistent with existing weather models arises. This applied talk is devoted to the validation and extension of this clustering approach. A simulation study using the multivariate logistic distribution as well as max-stable random fields shows that this approach provides accurate clustering when the maxima belong to an extreme-value distribution. Furthermore this clustering distance can be viewed as an average absolute rank difference, implying that it is appropriate for margin-free clustering of dependent variables. In particular it is appropriate for dependent maxima in the domain of attraction of an extreme-value copula.

Read More…
- Oct 3, 2014
- post
Statistical exploratory data analysis in the modern era

Susan R. Wilson · Oct 3, 2014
Date: 2014-10-03

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Major challenges arising from today’s “data deluge” include how to handle the commonly occurring situation of different types of variables (say, continuous and categorical) being simultaneously measured, as well as how to assess the accompanying flood of questions. Based on information theory, a bias-corrected mutual information (BCMI) measure of association that is valid and estimable between all basic types of variables has been proposed. It has the advantage of being able to identify non-linear as well as linear relationships. Based on the BCMI measure, a novel exploratory approach to finding associations in data sets having a large number of variables of different types has been developed. These associations can be used as a basis for downstream analyses such as finding clusters and networks. The application of this exploratory approach is very general. Comparisons also will be made with other measures. Illustrative examples include exploring relationships (i) in clinical and genomic (say, gene expression and genotypic) data, and (ii) between social, economic, health and political indicators from the World Health Organisation.

Read More…
- Sep 26, 2014
- post
Analysis of palliative care studies with joint models for quality-of-life measures and survival

Tor Tosteson · Sep 26, 2014
Date: 2014-09-26

Time: 15:30-16:30

Location: BURN 1205

Abstract:

In palliative care studies, the primary outcomes are often health related quality of life measures (HRLQ). Randomized trials and prospective cohorts typically recruit patients with advanced stage of disease and follow them until death or end of the study. An important feature of such studies is that, by design, some patients, but not all, are likely to die during the course of the study. This affects the interpretation of the conventional analysis of palliative care trials and suggests the need for specialized methods of analysis. We have developed a “terminal decline model” for palliative care trials that, by jointly modeling the time until death and the HRQL measures, leads to flexible interpretation and efficient analysis of the trial data (Li, Tosteson, Bakitas, STMED 2012).

Read More…
- Sep 19, 2014
- post
Covariates missing by design

Michael McIsaac · Sep 19, 2014
Date: 2014-09-19

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Incomplete data can arise in many different situations for many different reasons. Sometimes the data may be incomplete for reasons beyond the control of the experimenter. However, it is also possible that this missingness is part of the study design. By using a two-phase sampling approach where only a small sub-sample gives complete information, it is possible to greatly reduce the cost of a study and still obtain precise estimates. This talk will introduce the concepts of incomplete data and two-phase sampling designs and will discuss adaptive two-phase designs which exploit information from an internal pilot study to approximate the optimal sampling scheme for an analysis based on mean score estimating equations.

Read More…
- Sep 12, 2014
- post
Hydrological applications with the functional data analysis framework

Fateh Chebana · Sep 12, 2014
Date: 2014-09-12

Time: 15:30-16:30

Location: BURN 1205

Abstract:

River flows records are an essential data source for a variety of hydrological applications including the prevention of flood risks and as well as the planning and management of water resources. A hydrograph is a graphical representation of the temporal variation of flow over a period of time (continuously measured, usually over a year). A flood hydrograph is commonly characterized by a number of features, mainly its peak, volume and duration. Classical and recent multivariate approaches considered in hydrological applications treated these features jointly in order to take into account their dependence structure or their relationship. However, all these approaches are based on the analysis of a limited number of characteristics and do not make use of the full information provided by the hydrograph. Even though these approaches provided good results, they present some drawbacks and limitations. The objective of the present talk is to introduce a new framework for hydrological applications where data, such as hydrographs, are employed as continuous curves: functional data. In this context, the whole hydrograph is considered as one infinite-dimensional observation. This context contributes to addressing the problem of lack of data commonly encountered in hydrology. A number of functional data analysis tools and methods are presented and adapted.

Read More…
- Apr 11, 2014
- post
Adaptive piecewise polynomial estimation via trend filtering

Ryan Tibshirani · Apr 11, 2014
Date: 2014-04-11

Time: 15:30-16:30

Location: Salle KPMG, 1er étage HEC Montréal

Abstract:

We will discuss trend filtering, a recently proposed tool of Kim et al. (2009) for nonparametric regression. The trend filtering estimate is defined as the minimizer of a penalized least squares criterion, in which the penalty term sums the absolute kth order discrete derivatives over the input points. Perhaps not surprisingly, trend filtering estimates appear to have the structure of kth degree spline functions, with adaptively chosen knot points (we say “appear” here as trend filtering estimates are not really functions over continuous domains, and are only defined over the discrete set of inputs). This brings to mind comparisons to other nonparametric regression tools that also produce adaptive splines; in particular, we will compare trend filtering to smoothing splines, which penalize the sum of squared derivatives across input points, and to locally adaptive regression splines (Mammen & van de Geer 1997), which penalize the total variation of the kth derivative.

Read More…
- Apr 4, 2014
- post
Some aspects of data analysis under confidentiality protection

Bimal Sinha · Apr 4, 2014
Date: 2014-04-04

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Statisticians working in most federal agencies are often faced with two conflicting objectives: (1) collect and publish useful datasets for designing public policies and building scientific theories, and (2) protect confidentiality of data respondents which is essential to uphold public trust, leading to better response rates and data accuracy. In this talk I will provide a survey of two statistical methods currently used at the U.S. Census Bureau: synthetic data and noise perturbed data.

Read More…
- Mar 28, 2014
- post
How much does the dependence structure matter?

Ruodu Wang · Mar 28, 2014
Date: 2014-03-28

Time: 15:30-16:30

Location: BURN 1205

Abstract:

In this talk, we will look at some classical problems from an anti-traditional perspective. We will consider two problems regarding a sequence of random variables with a given common marginal distribution. First, we will introduce the notion of extreme negative dependence (END), a new benchmark for negative dependence, which is comparable to comonotonicity and independence. Second, we will study the compatibility of the marginal distribution and the limiting distribution when the dependence structure in the sequence is allowed to vary among all possibilities. The results are somewhat simple, yet surprising. We will provide some interpretation and applications of the theoretical results in financial risk management, with the hope to deliver the following message: with the common marginal distribution known and dependence structure unknown, we know essentially nothing about the asymptotic shape of the sum of random variables.

Read More…
- Mar 21, 2014
- post
Insurance company operations and dependence modeling

Jed Frees · Mar 21, 2014
Date: 2014-03-21

Time: 15:30-16:30

Location: BURN 107

Abstract:

Actuaries and other analysts have long had the responsibility in insurance company operations for various financial functions including (i) ratemaking, the process of setting premiums, (ii) loss reserving, the process of predicting obligations that arise from policies, and (iii) claims management, including fraud detection. With the advent of modern computing capabilities and detailed and novel data sources, new opportunities to make an impact on insurance company operations are extensive.

Read More…

Date: 2014-10-17

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2014-10-10

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2014-10-03

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2014-09-26

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2014-09-19

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2014-09-12

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2014-04-11

Time: 15:30-16:30

Location: Salle KPMG, 1er étage HEC Montréal

Abstract:

Date: 2014-04-04

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2014-03-28

Time: 15:30-16:30

Location: BURN 1205

Abstract:

Date: 2014-03-21

Time: 15:30-16:30

Location: BURN 107

Abstract: