
Crowdsourcing purchase histories

Alex Berke

We crowdsourced and published a first-of-its-kind open dataset that contains Amazon purchase histories, spanning 2018 to 2022, from more than 5000 US consumers, along with their sociodemographics.


Our paper, "Open e-commerce 1.0, five years of crowdsourced U.S. Amazon purchase histories with user demographics", published in Nature Scientific Data, describes the dataset, how to access it, and how it might be used to supplement government census data.

Publication of this dataset was awarded an MIT Prize for Open Data honorable mention.

Crowdsourcing experiment

We also built an experiment into the data crowdsourcing process to research which factors affect people's likelihood of sharing their data for open research, and to help inform future crowdsourcing efforts. Results from the experiment are published via CSCW in our paper "Insights from an experiment crowdsourcing data from thousands of US Amazon users: The importance of transparency, money, and data use".


Crowdsourcing was possible because Amazon.com platform users had access to their own data. To help empower future crowdsourcing, in December 2022 we published a comment to the FTC regarding its upcoming commercial surveillance rulemaking.

In our comment, we "encourage the FTC to ensure that its commercial surveillance and data security rulemaking facilitates and empowers consumers to share their data, with their informed consent, with researchers and consumer advocacy organizations, and prohibits corporate practices designed to prevent this... to help surface hidden harms, furthering the FTC’s mission".


In our paper "Evaluating Amazon Effects and the Limited Impact of COVID-19 With Purchases Crowdsourced from US Consumers", we use the dataset to study how online purchasing behaviors have changed over time, how those changes vary across demographic groups, the impact of the COVID-19 pandemic, and relationships between online and offline retail. This work provides a case study in how consumer-level purchase data can reveal purchasing behaviors and trends beyond those available from aggregate metrics.

In our paper "Measuring risks inherent to our digital economies using Amazon purchase histories from US consumers", we use the dataset to show the ease with which consumers' personal attributes can be inferred from their purchases. This includes demographics as well as sensitive health information, such as diabetes status. We also measure and highlight how different product categories contribute to inference risk in order to make our findings more interpretable and actionable for future researchers and privacy advocates.
