McGill Statistics Seminar - McGill Statistics Seminars
  • The multidimensional edge: Seeking hidden risks

    Date: 2012-11-09

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Assessing tail risks with the asymptotic models of multivariate extreme value theory carries a danger: when asymptotic independence is present (as with the Gaussian copula model), the asymptotic model estimates the probabilities of joint tail regions as zero. In diverse applications such as finance, telecommunications, insurance and environmental science, it may be difficult to believe in the absence of risk contagion. This problem can be partly ameliorated by using hidden regular variation, which assumes a lower-order asymptotic behavior on a subcone of the state space. The theory can be made more flexible by extensions in the following directions: (i) dimensions higher than two; (ii) lower-order variation on a subcone that is of an extreme value type different from regular variation; and (iii) a search for lower-order behavior on the complement of the support of the limit measure of regular variation. We discuss some challenges and potential applications of this ongoing effort.

  • Multivariate extremal dependence: Estimation with bias correction

    Date: 2012-11-02

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Estimating extreme risks in a multivariate framework is closely connected with estimating the extremal dependence structure. This structure can be described via the stable tail dependence function L, for which several estimators have been introduced. Asymptotic normality is available for empirical estimates of L, with rate of convergence $k^{1/2}$, where k denotes the number of high order statistics used in the estimation. Choosing a larger k may improve the accuracy of the estimate, but may also increase the asymptotic bias. We provide a bias correction procedure for the estimation of L: estimators of L are combined in such a way that the asymptotic bias term disappears. The new estimator of L is shown to allow more flexibility in the choice of k. Its asymptotic behavior is examined, and a simulation study is provided to assess its small-sample behavior. This is joint work with Cécile Mercadier (Université Lyon 1) and Laurens de Haan (Erasmus University Rotterdam).
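
For orientation, the uncorrected empirical estimator of L in two dimensions can be sketched with ranks. This is only a minimal illustration of a standard rank-based estimator, not the bias-corrected procedure from the talk; the function names `ranks` and `empirical_stdf`, the `n + 0.5` offset, and the toy data are all choices made here.

```python
def ranks(values):
    """Return 1-based ranks of the values (ties broken by original position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def empirical_stdf(xs, ys, k, x=1.0, y=1.0):
    """Rank-based empirical estimate of the stable tail dependence function L(x, y).

    Counts the observations whose rank in either margin exceeds
    n + 0.5 - k*x (respectively n + 0.5 - k*y), then divides by k.
    """
    n = len(xs)
    if len(ys) != n or not 0 < k <= n:
        raise ValueError("xs and ys must have equal length and 0 < k <= n")
    rank_x, rank_y = ranks(xs), ranks(ys)
    hits = sum(
        1
        for rx, ry in zip(rank_x, rank_y)
        if rx > n + 0.5 - k * x or ry > n + 0.5 - k * y
    )
    return hits / k


if __name__ == "__main__":
    data = [float(i) for i in range(1000)]
    # Perfectly dependent margins: the same 50 points are extreme in both.
    print(empirical_stdf(data, data, k=50))
    # Countermonotone margins: the extreme points of each margin are disjoint.
    print(empirical_stdf(data, [-v for v in data], k=50))
```

In this toy setting the estimate of L(1, 1) is 1.0 for the perfectly dependent pair and 2.0 for the countermonotone pair, the two ends of the admissible range. With real data the estimate depends on k, which is the bias and variance trade-off the abstract refers to.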

  • Simulation model calibration and prediction using outputs from multi-fidelity simulators

    Date: 2012-10-26

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Computer simulators are widely used to describe physical processes in lieu of physical observations. In some cases, more than one computer code can be used to explore the same physical system, each with a different degree of fidelity. In this work, we combine field observations and model runs from deterministic multi-fidelity computer simulators to build a predictive model for the real process. The resulting model can be used to perform sensitivity analysis for the system and to make predictions with associated measures of uncertainty. Our approach is Bayesian and will be illustrated through a simple example, as well as a real application in predictive science at the Center for Radiative Shock Hydrodynamics at the University of Michigan.

  • Modeling operational risk using a Bayesian approach to EVT

    Date: 2012-10-12

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Extreme Value Theory has been widely used for assessing the risk of highly unusual events, through either block maxima or peaks-over-threshold (POT) methods. However, one of the main drawbacks of the POT method is the choice of a threshold, which plays an important role in the estimation since the parameter estimates depend strongly on this value. Bayesian inference offers an alternative way to handle this difficulty: the threshold can be treated as another parameter in the estimation, avoiding the classical empirical approach. In addition, it is possible to incorporate internal and external observations in combination with expert opinion, providing a natural, probabilistic framework in which to evaluate risk models. In this talk, we analyze operational risk data using a mixture model which combines a parametric form for the center and a generalized Pareto distribution (GPD) for the tail of the distribution, using all observations for inference about the unknown parameters of both distributions, the threshold included. A Bayesian analysis is performed, and inference is carried out through Markov Chain Monte Carlo (MCMC) methods in order to determine the minimum capital requirement for operational risk.
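
The threshold sensitivity mentioned above can be seen in a small sketch of the classical fixed-threshold POT approach. This is not the Bayesian mixture model from the talk: it assumes a simple method-of-moments GPD fit (valid only for shape below 1/2), the function name `gpd_moment_fit` is hypothetical, and the simulated exponential "losses" merely stand in for real operational risk data.

```python
import random
from statistics import mean, variance


def gpd_moment_fit(sample, threshold):
    """Method-of-moments fit of a GPD to the excesses over a fixed threshold.

    Returns (shape, scale, number_of_excesses). Uses mean = scale / (1 - shape)
    and variance = scale^2 / ((1 - shape)^2 * (1 - 2 * shape)), which require
    shape < 1/2.
    """
    excesses = [v - threshold for v in sample if v > threshold]
    if len(excesses) < 2:
        raise ValueError("need at least two excesses over the threshold")
    m, s2 = mean(excesses), variance(excesses)
    ratio = m * m / s2  # equals 1 - 2 * shape under the GPD
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * m * (1.0 + ratio)
    return shape, scale, len(excesses)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated stand-in for loss data: unit-rate exponential, whose excesses
    # over any threshold are GPD with shape 0 and scale 1.
    losses = [rng.expovariate(1.0) for _ in range(20000)]
    for u in (0.5, 1.0, 2.0, 3.0):
        shape, scale, n_exc = gpd_moment_fit(losses, u)
        print(f"u={u:.1f}  excesses={n_exc:5d}  shape={shape:+.3f}  scale={scale:.3f}")
```

Raising the threshold leaves fewer excesses, so the estimates become noisier even though the GPD approximation is exact for this simulated example. Treating the threshold as a parameter, as the abstract describes, is one way to avoid fixing it by hand.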

  • Markov switching regular vine copulas

    Date: 2012-10-05

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    Using only bivariate copulas as building blocks, regular vines (R-vines) constitute a flexible class of high-dimensional dependence models. In this talk we introduce a Markov switching R-vine copula model, combining the flexibility of general R-vine copulas with the possibility for dependence structures to change over time. Frequentist as well as Bayesian parameter estimation is discussed. Further, we apply the newly proposed model to examine the dependence of exchange rates as well as stock and stock index returns. We show that changes in dependence are usually closely interrelated with periods of market stress. In such times the Value at Risk of an asset portfolio is significantly underestimated when changes in the dependence structure are ignored.

  • The current state of Q-learning for personalized medicine

    Date: 2012-09-28

    Time: 14:30-15:30

    Location: BURN 1205

    Abstract:

    In this talk, I will provide an introduction to dynamic treatment regimes (DTRs) and an overview of the state of the art (and science) of Q-learning, a popular tool in reinforcement learning. The use of Q-learning and its variants in randomized and non-randomized studies will be discussed, as well as issues concerning inference, since the resulting estimators are not always regular. Current and future directions of interest will also be considered.

  • Hypothesis testing in finite mixture models: from the likelihood ratio test to EM-test

    Date: 2012-04-05

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    In the presence of heterogeneity, a mixture model is the most natural way to characterize the random behavior of samples taken from such populations. Such a strategy has been widely employed in applications ranging from genetics, information technology and marketing to finance. Studying the mixing structure behind a random sample from the population allows us to infer the degree of heterogeneity, with important implications in applications such as the presence of disease subgroups in genetics. The statistical problem is to test hypotheses on the order of the finite mixture model. There has been continued interest in the limiting behavior of likelihood ratio tests. The non-regularity of finite mixture models has provided statisticians with ample examples of unusual limiting distributions. Yet many such results are not convenient for conducting hypothesis tests. Motivated by the need to overcome these difficulties, we have developed a number of strategies to obtain tests with high efficiency yet easy-to-use limiting distributions. The latest development is a class of EM-tests, which are advantageous in many respects. Their limiting distributions are easier to derive mathematically, simple to implement in data analysis, and valid for a more general class of mixture models without restrictions on the space of the mixing distribution. Simulations indicate that the limiting distributions approximate the finite-sample distributions with good precision in the examples investigated.

  • A matching-based approach to assessing the surrogate value of a biomarker

    Date: 2012-03-30

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Statisticians have developed a number of frameworks which can be used to assess the surrogate value of a biomarker, i.e. to establish whether treatment effects on a biological quantity measured shortly after administration of treatment predict treatment effects on the clinical endpoint of interest. The most commonly applied of these frameworks is due to Prentice (1989), who proposed a set of criteria which a surrogate marker should satisfy. However, verifying these criteria using observed data can be challenging due to the presence of unmeasured simultaneous predictors (i.e. confounders) which influence both the potential surrogate and the outcome. In this work, we adapt a technique proposed by Rosenbaum (2002) for observational studies, in which observations are matched and the odds of treatment within each matched pair are bounded. This yields a straightforward and interpretable sensitivity analysis which can be performed particularly efficiently for certain types of test statistics. In this talk, I will introduce the surrogate endpoint problem, discuss the details of my proposed technique for assessing surrogate value, and illustrate it with some simulated examples inspired by the problem of identifying immune surrogates in HIV vaccine trials.

  • Model selection principles in misspecified models

    Date: 2012-03-23

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Model selection is of fundamental importance to high-dimensional modeling featured in many contemporary applications. Classical principles of model selection include the Bayesian principle and the Kullback-Leibler divergence principle, which lead to the Bayesian information criterion and Akaike information criterion, respectively, when models are correctly specified. Yet model misspecification is unavoidable in practice. We derive novel asymptotic expansions of the two well-known principles in misspecified generalized linear models, which give the generalized BIC (GBIC) and generalized AIC. A specific form of prior probabilities motivated by the Kullback-Leibler divergence principle leads to the generalized BIC with prior probability ($\mbox{GBIC}_p$), which can be naturally decomposed as the sum of the negative maximum quasi-log-likelihood, a penalty on model dimensionality, and a penalty directly on model misspecification. Numerical studies demonstrate the advantage of the new methods for model selection in both correctly specified and misspecified models.
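
For reference, the two classical criteria that the talk generalizes can be written in their usual textbook form. The notation is introduced here and is not taken from the abstract: $\hat\ell_n$ is the maximized log-likelihood, $d$ the number of free parameters, and $n$ the sample size. The generalized versions (GBIC, $\mbox{GBIC}_p$ and generalized AIC) are not reproduced, since the abstract does not state their form.

$$
\mathrm{AIC} = -2\,\hat\ell_n + 2d, \qquad \mathrm{BIC} = -2\,\hat\ell_n + d \log n .
$$

Both criteria trade goodness of fit against a penalty on model dimensionality; according to the abstract, $\mbox{GBIC}_p$ adds a further penalty for model misspecification.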

  • Variable selection in longitudinal data with a change-point

    Date: 2012-03-16

    Time: 15:30-16:30

    Location: BURN 1205

    Abstract:

    Follow-up studies are frequently carried out to investigate the evolution of measurements through time, taken on a set of subjects. These measurements (responses) are bound to be influenced by subject-specific covariates and, if a regression model is used, the data analyst is faced with the problem of selecting those covariates that “best explain” the data. For example, in a clinical trial, subjects may be monitored for a response following the administration of a treatment, with a view to selecting the covariates that are most predictive of a treatment response. This variable selection setting is standard. However, more realistically, there will often be an unknown delay between the administration of a treatment and the onset of a measurable effect. This delay will not be directly observable, since it is a property of the distribution of responses rather than of any particular trajectory of responses. Briefly, each subject will have an unobservable change-point. With a change-point component added, the variable selection problem necessitates the use of penalized likelihood methods. This is because the number of putative covariates for the responses, as well as for the change-point distribution, could be large relative to the follow-up time and/or the number of subjects; variable selection in a change-point setting does not appear to have been studied in the literature. In this talk I will briefly introduce the multi-path change-point problem. I will show how variable selection for the covariates before the change, after the change, as well as for the change-point distribution, reduces to variable selection for a finite mixture of multivariate distributions. I will discuss the performance of my model selection methods using an example on cognitive decline in subjects with Alzheimer’s disease and through simulations.