Division of Biostatistics Seminar: Xiao-Li Meng (Harvard University)

September 7, 2018
12:30 pm - 1:30 pm
Erlanger Room, McDonnell Sciences 1st floor (Medical Campus)

“How Small are Our Big Data: Turning the 2016 Surprise into a 2020 Vision”


NOTE location

Abstract: The term “Big Data” emphasizes data quantity, not quality. However, many of the current measures of statistical uncertainty and error are adequate only when the data are of perfect quality, that is, when they can be viewed as probabilistic samples. We show that once we take data quality into account, the effective sample size of a “Big Data” set can be vanishingly small. Without understanding this phenomenon, “Big Data” can do more harm than good, because a drastically inflated precision assessment leads to gross overconfidence, setting us up to be caught by surprise when reality unfolds, as we all experienced during the 2016 US presidential election. Data from the Cooperative Congressional Election Study (CCES, conducted by Stephen Ansolabehere, Douglas Rivers and others, and analyzed by Shiro Kuriwaki) are used to assess the data quality in 2016 US election polls, with the aim of gaining a clearer vision for the 2020 election and beyond. (This talk is based on Meng (2018), “Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election,” Annals of Applied Statistics, Volume 12, Number 2, 685-726.)
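As a rough illustration of the effective-sample-size claim in the abstract, the sketch below uses the approximation n_eff ≈ f / (1 - f) × 1 / ρ², where f is the fraction of the population captured by the data and ρ is the "data defect correlation" between being recorded and the quantity measured. The formula and the illustrative inputs (f = 0.01, ρ = -0.005) are taken from our reading of the cited paper, not from this announcement; the function name is hypothetical.

```python
def effective_sample_size(f: float, rho: float) -> float:
    """Approximate size of a simple random sample with the same
    mean squared error as a non-probabilistic data set.

    f   : fraction of the population in the data set (0 < f < 1)
    rho : assumed data defect correlation (nonzero)
    """
    if not 0.0 < f < 1.0:
        raise ValueError("f must lie strictly between 0 and 1")
    if rho == 0.0:
        raise ValueError("rho must be nonzero")
    return (f / (1.0 - f)) / (rho ** 2)


# Assumed inputs: 1% of the population, data defect correlation of -0.005.
n_eff = effective_sample_size(f=0.01, rho=-0.005)
print(round(n_eff))  # about 404 under these assumed inputs
```

Under these assumptions, a data set covering 1% of a very large population carries about as much information about the population mean as a simple random sample of a few hundred respondents.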

Division of Biostatistics seminars

For inquiries contact Chengjie Xiong.