Dissertation Title: Data futures: Transforming digital traces into public goods in the age of commercial surveillance
Abstract:
For decades, government agencies have collected data via surveys to produce datasets and statistics that serve as public goods, enabling research and empowering the communities from whom the data are collected. These data sources are costly to collect and are in decline as survey response rates drop. In contrast, companies collect ever-increasing quantities of data from the public: data we unavoidably generate by making purchases, using the internet, or simply operating a mobile phone. This data collection can be viewed as a form of surveying the public, but one in which the resulting privatized datasets empower corporations rather than communities, and in which the potential harms cannot be empirically assessed without access to the data.
This thesis considers a future in which corporations can track populations and estimate statistics more accurately than the government agencies traditionally tasked with such efforts. It illustrates how near this future may be and explores the questions it raises through a series of case studies. Chief among them: are there more privacy-preserving, equitable, or cooperative ways to manage these data so that they benefit the public from whom they are sourced?
The first set of case studies uses location data from mobile phones. The first study develops a more privacy-preserving approach that leverages recurrent neural networks to generate realistic synthetic data; the second develops aggregated mobility metrics to improve country-level population estimates and COVID-19 epidemic models. The next set of case studies uses web browser data to expose risks of cross-site user tracking that persist despite privacy-preserving browser developments that remove tracking cookies. The first web study leverages browsing data collected by a data broker; the second uses a dataset we crowdsourced and openly published to support this and future research. For the third set of case studies, we crowdsourced and published a first-of-its-kind open dataset consisting of purchase histories from thousands of Amazon.com users, along with their sociodemographic information. We use this dataset to demonstrate how corporate data can provide insights into societal changes and to evaluate the privacy risks of inferring sensitive consumer information from purchases. We also use the data to explore cooperative risk mitigation strategies.
A common thread across the types of data at the focus of this thesis (mobile device locations, web browsing data, purchase histories) is that they are digital traces collected continuously from people during everyday activities, without explicit consent. This work points toward cooperative data sharing as a paradigm that empowers research benefiting the public while prioritizing consent. Could such a paradigm exist with public support and participation? To study this question and inform future crowdsourcing efforts, we embedded behavioral experiments and surveys into our crowdsourcing tools, shedding light on what affects users' likelihood of sharing their data, how users believe their data should be used, and how these results differ across demographic groups.
Throughout these studies, this thesis asks a broader question: can we envision, and build toward, a future with alternative data economies that shift the existing power dynamics of data collection, along with the control and benefits of these data? As a first step toward addressing this question, this thesis proposes speculative, privacy-preserving, and cooperative commerce networks. Such system changes may impose new costs on consumers. The final case study measures consumers' willingness to pay for privacy in new package delivery networks.
Committee members:
Kent Larson
MIT Professor of the Practice, Media Arts and Sciences
Massachusetts Institute of Technology
Alex (Sandy) Pentland
Toshiba Professor of Media Arts and Sciences
Massachusetts Institute of Technology
Latanya Sweeney
Daniel Paul Professor of the Practice of Government and Technology
Harvard College and Harvard Kennedy School