A. Berke, D. Calacci, A. Pentland, and K. Larson. Evaluating Amazon Effects and the Limited Impact of COVID-19 With Purchases Crowdsourced from US Consumers. 2025. https://arxiv.org/abs/2501.10596
Jan. 21, 2025
We leverage a recently published dataset of Amazon purchase histories, crowdsourced from thousands of US consumers, to study how online purchasing behaviors have changed over time, how changes vary across demographic groups, the impact of the COVID-19 pandemic, and relationships between online and offline retail. This work provides a case study in how consumer-level purchases data can reveal purchasing behaviors and trends beyond those available from aggregate metrics. For example, in addition to analyzing spending behavior, we develop new metrics to quantify changes in consumers' online purchase frequency and the diversity of products purchased, to better reflect the growing ubiquity and dominance of online retail. Between 2018 and 2022 these consumer-level metrics grew on average by more than 85%, peaking in 2021. We find a steady upward trend in individuals' online purchasing prior to COVID-19, with a significant increase in the first year of COVID, but without a lasting effect. Purchasing behaviors in 2022 were no greater than the result of the pre-pandemic trend. We also find changes in purchasing significantly differ by demographics, with different responses to the pandemic. We further use the consumer-level data to show substitution effects between online and offline retail in sectors where Amazon heavily invested: books, shoes, and grocery. Prior to COVID we find year-to-year changes in the number of consumers making online purchases for books and shoes negatively correlated with changes in employment at local bookstores and shoe stores. During COVID we find online grocery purchasing negatively correlated with in-store grocery visits. This work demonstrates how crowdsourced, open purchases data can enable economic insights that may otherwise only be available to private firms.
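As a rough illustration of the consumer-level metrics named above (purchase frequency, product diversity, and expenditure), the following minimal sketch computes them per user and month. The row layout, field order, and function name `monthly_metrics` are assumptions made for illustration and are not taken from the dataset or the authors' code.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase rows (NOT real data):
# (user_id, order_date, product_id, unit_price, quantity)
purchases = [
    ("u1", date(2018, 1, 3), "A", 10.0, 1),
    ("u1", date(2018, 1, 3), "B", 5.0, 2),
    ("u1", date(2018, 1, 20), "A", 10.0, 1),
    ("u2", date(2018, 1, 7), "C", 8.0, 1),
    ("u1", date(2018, 2, 1), "D", 3.0, 4),
]


def monthly_metrics(rows):
    """Return per-(user, year, month) purchase days, distinct products, and spend."""
    days = defaultdict(set)
    products = defaultdict(set)
    spend = defaultdict(float)
    for user, day, product, price, qty in rows:
        key = (user, day.year, day.month)
        days[key].add(day)            # distinct days with at least one purchase
        products[key].add(product)    # distinct products bought that month
        spend[key] += price * qty     # total expenditure that month
    return {
        key: {
            "purchase_days": len(days[key]),
            "distinct_products": len(products[key]),
            "expenditure": spend[key],
        }
        for key in days
    }


metrics = monthly_metrics(purchases)
for key in sorted(metrics):
    print(key, metrics[key])
```

Counting distinct days and distinct products with sets means repeat purchases of the same item on the same day do not inflate the frequency or diversity metrics, while they still add to expenditure.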
Fig 1 provides a visual overview of the data and metrics we use to analyze changes in consumer purchasing behaviors. It shows the following metrics computed from our sample data: total expenditure, the number of distinct products purchased, and the number of orders made. These metrics are shown alongside Amazon net sales data (North America segment), reported for investor relations, and e-commerce retail sales data from the U.S. Census Bureau, both of which are reported on a quarterly basis. We use these quarterly sales data to validate our metrics, and then use our metrics to reveal details the quarterly sales data lack, including changes within quarters and how changes are driven by different consumer groups. Comparing our sample's quarterly expenditure to the Amazon net sales and census e-commerce sales data yields Pearson correlations of r=0.976 (p<0.001) and r=0.982 (p<0.001), respectively. Fig 1 also shows that our sample's expenditure grew less quickly than Amazon sales in later quarters, which is expected given that our sample is limited to a consistent set of users while Amazon's user population grew over time. Vertical blue lines in Fig 1 indicate months when Amazon's major sales event, Prime Day, occurred. We include these indicators throughout the results because Prime Day significantly increased the monthly metrics (see SI Table S5).
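The validation step described above, aggregating a monthly sample metric to quarters and correlating it with a quarterly reference series, can be sketched as follows. All numbers are made up for illustration and the function names are hypothetical; this is not the authors' code and does not reproduce the reported correlations.

```python
from collections import defaultdict
from math import sqrt


def quarterly_totals(monthly):
    """Sum {(year, month): value} into {(year, quarter): value}."""
    totals = defaultdict(float)
    for (year, month), value in monthly.items():
        quarter = (month - 1) // 3 + 1
        totals[(year, quarter)] += value
    return dict(totals)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy monthly sample expenditure (made-up values, NOT from the paper).
sample_monthly = {(2018, m): 100.0 + 5.0 * m for m in range(1, 13)}
# Toy quarterly reference series standing in for reported net sales (made up).
reference_quarterly = {
    (2018, 1): 30.0,
    (2018, 2): 34.0,
    (2018, 3): 39.0,
    (2018, 4): 45.0,
}

sample_quarterly = quarterly_totals(sample_monthly)
quarters = sorted(reference_quarterly)
r = pearson_r(
    [sample_quarterly[q] for q in quarters],
    [reference_quarterly[q] for q in quarters],
)
print(f"Pearson r = {r:.3f}")
```

Because Pearson correlation is invariant to scale and shift, the sample series does not need to be rescaled to the units of the reference series before comparison.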
Fig 1. Online purchasing metrics. Quarterly Amazon net sales (North America segment) reported for investor relations and census e-commerce sales data, compared to metrics computed from our sample. Vertical blue lines indicate months Amazon Prime Day occurred. The orange line indicates March 2020, when COVID-19 had a major impact on US consumption. The sample metrics are scaled and shifted for legibility and should not be interpreted numerically.
Fig 2. Distribution of monthly metrics across users for Q1 of each year. Boxplots show the medians (lines), means (triangles), and first and third quartiles; whiskers extend to 1.5x the IQR. Outliers are omitted (see SI Tables S6-S8).
Fig 3. Graphical event study estimating change in purchase frequency over time. Solid lines display coefficients with 95% CIs. The dashed line displays the trend estimated over the pre-pandemic period (2018-01 to 2020-02). The orange section indicates the first year of COVID (2020-03 to 2021-02). Vertical blue lines indicate months Amazon Prime Day occurred.
COVID-19 had a limited impact on the trajectory of online purchasing
Fig 3 presents a graphical event study, which we use to evaluate the impact of COVID on the upward trajectory of consumers' online purchase frequency. It displays the average change in purchase days per month, estimated by a regression that controls for users' sex, age, and income. The resulting coefficients are shown relative to a 2018-01 baseline. We also estimated the positive trend prior to COVID (dashed line in Fig 3) via a linear regression fit to the coefficients estimated for 2018-01 to 2020-02 (SI Table S10).
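The trend-line step can be illustrated with a minimal sketch: fit a line to the pre-pandemic monthly coefficients, extrapolate it, and compare later coefficients against it. The coefficients below are synthetic (a linear trend plus an arbitrary 0.5 bump during the first COVID year), and averaging the gaps is a simplification chosen for illustration; the paper estimates these quantities by regression (see the SI), and none of the names here come from the authors' code.

```python
def month_index(year, month, base=(2018, 1)):
    """Months elapsed since the baseline month (2018-01 maps to 0)."""
    return (year - base[0]) * 12 + (month - base[1])


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


PRE_END = month_index(2020, 2)     # last pre-pandemic month
COVID_END = month_index(2021, 2)   # last month of the first COVID year
LAST = month_index(2022, 12)       # end of the study window

# Synthetic event-study coefficients (NOT the paper's estimates):
# a steady upward trend with a temporary bump during the first COVID year.
coefs = {}
for t in range(LAST + 1):
    bump = 0.5 if PRE_END < t <= COVID_END else 0.0
    coefs[t] = 0.02 * t + bump

# Fit the trend on the pre-pandemic period only (2018-01 to 2020-02).
pre_months = [t for t in coefs if t <= PRE_END]
slope, intercept = fit_line(pre_months, [coefs[t] for t in pre_months])

# Gap between each later coefficient and the extrapolated trend line.
gaps = {t: coefs[t] - (intercept + slope * t) for t in coefs if t > PRE_END}
covid = [gaps[t] for t in gaps if t <= COVID_END]
later = [gaps[t] for t in gaps if t > COVID_END]
covid_gap = sum(covid) / len(covid)
later_gap = sum(later) / len(later)

print(f"slope per month: {slope:.4f}")
print(f"mean gap, first COVID year: {covid_gap:.4f}")
print(f"mean gap, afterwards: {later_gap:.4f}")
```

Fitting only on pre-pandemic months keeps the trend line independent of the period being evaluated, so a positive gap during the first COVID year followed by a gap near zero is what a transient shock would look like in this setup.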
We do not find that the COVID-19 pandemic had a lasting impact on the pre-pandemic trend. Instead, we find that COVID provided a transient shock, temporarily and significantly increasing online purchasing above the trend line (𝛽=0.5578; p<0.001). Purchasing then returned to a level no higher than the one the pre-pandemic trend would have reached. Numerical results are detailed in the SI (Table S11).
For robustness, we repeat this event study using the number of distinct products purchased each month as the dependent variable, instead of purchase days. The results are similar, showing a temporary boost due to COVID, after which the metric returns to the pre-pandemic trend line (see SI Fig S4).
Purchasing behaviors differ by demographic groups
Fig 4 shows results from the regressions analyzing relationships between demographics and purchasing (Eq 4), where statistically significant values (p<0.05) are outlined in black.
Fig 4. Regression results showing relationships between purchasing and demographics. Coefficients report estimated relative impact of consumer demographics on (left) purchase frequency for 2018 and 2022, and (right) change in purchase frequency from 2018 to 2022, and from one year prior to COVID to the first year of COVID. Bars indicating statistically significant values (p<0.05) are outlined in black.