“Veracity in Big Data Analysis: A Statistical Approach to Analyzing Crowdsourced Mobile Data”
Hosted by the Division of Biostatistics
Abstract: “Big data” is ubiquitous in many areas of application. While an abundance of data
provides opportunities for better inference, it also comes with unique challenges,
including the issues of volume and veracity of the data that make the task of
information extraction very difficult. In this talk, we will consider a crowd-sourced
data set collected through an app on mobile devices that gives low quality, high
volume, unstructured, noisy measurements on several variables including the
ambient temperature. We introduce a way to associate a veracity score with each data
value, indicating how reliable and useful that value is. We then use this
measure of veracity to develop statistical methods for the low quality data and
show that it provides significant improvements over naïve applications of the
standard prediction methodology. We also explore ways of augmenting
information from available sources using the crowd-sourced data.
*Joint work with Arnab Chakraborty and Alyson Wilson.
Everyone must register for each individual seminar. Each presentation has a unique registration link found on the Biostatistics seminars webpage.
For inquiries, contact Emily Gremminger.