Location: LinkedIn, 2025 Stierlin Ct, Mountain View, CA 94043
Date: Monday, May 23, 2011; 6:30 – 9:00 pm (6:30 – 7:00 networking and snacks; 7:00 – 7:10 announcements; 7:10 onward presentation and Q&A)
Title: Bayesian Statistical Reasoning: an inferential, predictive and decision-making paradigm for the 21st century
Cost: Free and open to all who wish to attend; membership is only $20/year. Anyone may join our mailing list at no charge to receive announcements of upcoming events.
Speaker: Professor David Draper, PhD
Abstract:
Broadly speaking, statistics is the study of uncertainty: how to measure it well, and how to make good choices in the face of it. Statistical activities are of four main types: description of a data set, inference about the underlying process generating the data, prediction of future data, and decision-making under uncertainty. The last three of these activities are probability-based. Two main probability paradigms are in current use: the frequentist (or relative-frequency) approach, in which you restrict attention to phenomena that are inherently repeatable under “identical” conditions and define P(A) to be the limiting relative frequency with which A would occur in n hypothetical repetitions, as n goes to infinity; and the Bayesian approach, in which the arguments A and B of the probability operator P(A|B) are true-false propositions (with the truth status of A unknown to you and B assumed by you to be true), and P(A|B) represents the weight of evidence in favor of the truth of A, given the information in B.
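As a minimal illustration of the frequentist definition, the sketch below assumes a hypothetical repeatable event A whose probability is set to 0.3 purely for the simulation. It runs n simulated trials and reports the relative frequency with which A occurs; as n grows, that frequency settles near 0.3. The function name `relative_frequency` is illustrative, not something from the talk.

```python
import random

def relative_frequency(p: float, n: int, seed: int = 0) -> float:
    """Fraction of n simulated repetitions in which event A (probability p) occurs."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n

# Hypothetical event with P(A) = 0.3: the relative frequency drifts toward 0.3 as n grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(0.3, n))
```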
The Bayesian approach includes the frequentist paradigm as a special case, so you might think it would be the only version of probability used in statistical work today, but
(a) in quantifying your uncertainty about something unknown to you, the Bayesian paradigm requires you to bring all relevant information to bear on the calculation; this involves combining information both internal and external to the data you’ve gathered, and (somewhat strangely) the external-information part of this approach was controversial in the 20th century, and
(b) Bayesian calculations require approximating high-dimensional integrals (whereas the frequentist approach mainly relies on maximization rather than integration), and this was a severe limitation to the Bayesian paradigm for a long time (from the 1750s to the 1980s).
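To make point (b) concrete, here is the standard form of Bayes' theorem (general textbook notation, not taken from the talk), for a parameter vector θ with k components, prior density p(θ) and data y. The denominator, and every posterior mean or probability derived from the posterior, is a k-dimensional integral. A maximum-likelihood fit, by contrast, only needs the point at which the likelihood is largest.

```latex
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{\int_{\mathbb{R}^k} p(y \mid \theta')\, p(\theta')\, d\theta'},
\qquad
E[g(\theta) \mid y] = \int_{\mathbb{R}^k} g(\theta)\, p(\theta \mid y)\, d\theta,
\qquad
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \, p(y \mid \theta).
```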
The external-information problem has been solved by developing methods that separately handle the two main cases: (1) substantial external information, which is addressed by elicitation techniques, and (2) relatively little external information, which is covered by any of several methods for (in the jargon) specifying diffuse prior distributions. Good Bayesian work also involves sensitivity analysis: varying the manner in which you quantify the internal and external information across reasonable alternatives, and examining the stability of your conclusions.
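A small sensitivity analysis in the sense described above can be sketched as follows. It assumes hypothetical data of 7 successes in 20 binomial trials and a conjugate Beta prior. This is a standard textbook setting, not one of the talk's case studies. Two diffuse priors and one informative prior (standing in for the output of an elicitation) are each combined with the data, and the resulting posterior summaries are compared. The helper name `beta_posterior_summary` is hypothetical.

```python
import math

def beta_posterior_summary(a: float, b: float, successes: int, trials: int):
    """Beta(a, b) prior + binomial data -> Beta(a + s, b + n - s) posterior; return (mean, sd)."""
    a_post = a + successes
    b_post = b + trials - successes
    total = a_post + b_post
    mean = a_post / total
    sd = math.sqrt(a_post * b_post / (total ** 2 * (total + 1)))
    return mean, sd

# Hypothetical data for illustration only: 7 successes in 20 trials.
s, n = 7, 20
priors = {
    "diffuse, uniform Beta(1, 1)": (1.0, 1.0),
    "diffuse, Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "informative Beta(20, 20), as if elicited around 0.5": (20.0, 20.0),
}
for label, (a, b) in priors.items():
    mean, sd = beta_posterior_summary(a, b, s, n)
    print(f"{label}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

The two diffuse priors give nearly the same posterior mean (8/22 ≈ 0.364 and 7.5/21 ≈ 0.357). The informative prior pulls the mean to 27/60 = 0.45. That kind of shift is exactly what a sensitivity analysis is meant to expose.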
Around 1990 two things happened roughly simultaneously that completely changed the Bayesian computational picture:
* Bayesian statisticians belatedly discovered that applied mathematicians (led by Metropolis), working at the intersection between chemistry and physics in the 1940s, had used Markov chains to develop a clever algorithm for approximating integrals arising in thermodynamics that are similar to the kinds of integrals that come up in Bayesian statistics, and
* desktop computers finally became fast enough to run the Metropolis algorithm in a feasibly short amount of time.
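A minimal random-walk Metropolis sampler shows why the algorithm sidesteps the integration problem. The acceptance step uses only a ratio of unnormalized posterior densities, so the normalizing integral never has to be computed. This sketch assumes a hypothetical binomial example (7 successes in 20 trials, uniform prior on the success rate). The step size, chain length and burn-in are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math
import random

def log_posterior(theta: float, s: int = 7, n: int = 20) -> float:
    """Unnormalized log posterior of a binomial success rate under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return s * math.log(theta) + (n - s) * math.log(1.0 - theta)

def metropolis(log_target, start: float, n_iter: int, step: float, seed: int = 1):
    """Random-walk Metropolis: propose theta + N(0, step^2), accept with prob. min(1, ratio)."""
    rng = random.Random(seed)
    theta, lp = start, log_target(start)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_target(proposal)
        # The proposal is symmetric, so the acceptance ratio is just the ratio of target densities;
        # 1 - random() lies in (0, 1], which keeps the log finite.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        draws.append(theta)
    return draws

draws = metropolis(log_posterior, start=0.5, n_iter=50_000, step=0.1)
kept = draws[5_000:]  # discard burn-in
# For comparison, the exact posterior here is Beta(8, 14), whose mean is 8/22.
print(sum(kept) / len(kept), 8 / 22)
```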
As a result of these developments, the Bayesian computational problem has been solved in a wide range of interesting application areas with small-to-moderate amounts of data; with large data sets, variational methods are available that offer a different approach to useful approximate solutions.
The Bayesian paradigm for uncertainty quantification does appear to have one remaining weakness, which coincides with a strength of the frequentist paradigm: nothing in the Bayesian approach to inference and prediction requires you to pay attention to how often you get the right answer (this is a form of calibration of your uncertainty assessments). Yet that activity is (i) central to good science and decision-making and (ii) natural to emphasize from the frequentist point of view. However, it has recently been shown that calibration can readily be brought into the Bayesian story by means of decision theory, turning the Bayesian paradigm into an approach that is (in principle) both logically internally consistent and well-calibrated.
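What "how often you get the right answer" means can be checked by simulation. The sketch below is a frequentist-style coverage check, not the decision-theoretic approach mentioned above. It makes the following assumptions, all illustrative:

* a hypothetical true success rate of 0.3;
* many simulated data sets of 50 binomial trials each;
* for each data set, a 95% central posterior interval under a uniform Beta(1, 1) prior, approximated from Monte Carlo draws of the Beta posterior.

The sketch reports the fraction of intervals that contain the true rate. The function names and settings are hypothetical.

```python
import random

def credible_interval(s: int, n: int, rng: random.Random, draws: int = 1_000, level: float = 0.95):
    """Approximate central interval of the Beta(1 + s, 1 + n - s) posterior from Monte Carlo draws."""
    sample = sorted(rng.betavariate(1 + s, 1 + n - s) for _ in range(draws))
    lo_idx = int((1.0 - level) / 2.0 * draws)  # e.g. index 25 of 1000 for a 95% interval
    hi_idx = draws - 1 - lo_idx                # symmetric upper index
    return sample[lo_idx], sample[hi_idx]

def coverage(true_theta: float, n: int, reps: int = 500, seed: int = 2) -> float:
    """Fraction of simulated data sets whose interval contains the true success rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s = sum(1 for _ in range(n) if rng.random() < true_theta)  # simulate one data set
        lo, hi = credible_interval(s, n, rng)
        if lo <= true_theta <= hi:
            hits += 1
    return hits / reps

print(coverage(true_theta=0.3, n=50))
```

A well-calibrated 95% interval procedure should contain the true value in roughly 95% of repetitions. A rate well below that would signal over-confident uncertainty assessments.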
In this talk I’ll (a) offer some historical notes about how we have arrived at the present situation and (b) give examples of Bayesian inference, prediction and decision-making in the context of several case studies from medicine and health policy. There will be points of potential technical interest for applied mathematicians, statisticians and computer scientists.
David Draper is a Professor of Statistics in the Department of Applied Mathematics and Statistics at the University of California, Santa Cruz. He is a Fellow of the American Association for the Advancement of Science, the American Statistical Association (ASA), the Institute of Mathematical Statistics (IMS), and the Royal Statistical Society (RSS); from 2001 to 2003 he served as the President-Elect, President, and Past President of the International Society for Bayesian Analysis (ISBA). He is the author or co-author of more than 100 contributions to the methodological and applied statistical literature, including articles in the Journal of the Royal Statistical Society (Series A, B and C), the Journal of the American Statistical Association, the Annals of Applied Statistics, Bayesian Analysis, Statistical Science, the New England Journal of Medicine, and the Journal of the American Medical Association; his 1995 JRSS-B article on assessment and propagation of model uncertainty has been cited more than 850 times. His research is in the areas of Bayesian inference and prediction, model uncertainty and empirical model-building, hierarchical modeling, Markov chain Monte Carlo methods, and Bayesian nonparametric methods, with applications mainly in medicine, health policy, education, and environmental risk assessment. He has a particular interest in the exposition of complex statistical methods and ideas in the context of real-world applications.