Research Projects
bandicoot: A Python Toolbox for Mobile Phone Metadata
Yves-Alexandre de Montjoye, Luc Rocher, and Alex 'Sandy' Pentland
bandicoot provides a complete, easy-to-use environment for researchers using mobile phone metadata. It allows them to easily load their data, perform analysis, and export their results with a few lines of code. It computes 100+ standardized metrics in three categories: individual (number of calls, text response rate), spatial (radius of gyration, entropy of places), and social network (clustering coefficient, assortativity). The toolbox is easy to extend and contains extensive documentation with guides and examples.
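To make the spatial metrics named above concrete, here is a minimal sketch of radius of gyration and entropy of places in plain Python. It is not bandicoot's own implementation or API: the function names, the use of planar (x, y) coordinates in kilometers, and the natural-log entropy are all assumptions made for illustration.

```python
import math
from collections import Counter


def radius_of_gyration(visits):
    """Root-mean-square distance of visits from their center of mass.

    `visits` is a list of (x, y) planar coordinates, one per record
    (an assumed input format, not bandicoot's).
    """
    if not visits:
        return 0.0
    n = len(visits)
    cx = sum(x for x, _ in visits) / n
    cy = sum(y for _, y in visits) / n
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in visits)
    return math.sqrt(squared / n)


def entropy_of_places(place_ids):
    """Shannon entropy (natural log) of the distribution of visited places."""
    counts = Counter(place_ids)
    total = len(place_ids)
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

A user who only ever appears at one place has zero entropy and zero radius of gyration; both values grow as activity spreads across more, and more distant, places.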
Data-Pop Alliance
Alex 'Sandy' Pentland, Harvard Humanitarian Initiative and Overseas Development Institute
Data-Pop Alliance is a joint initiative on big data and development with a goal of helping to craft and leverage the new ecosystem of big data--new personal data, new tools, new actors--to improve decisions and empower people in a way that avoids the pitfalls of a new digital divide, de-humanization, and de-democratization. Data-Pop Alliance aims to serve as a designer, broker, and implementer of ideas and activities, bringing together institutions and individuals around common principles and objectives through collaborative research, training and capacity building, technical assistance, convening, knowledge curation, and advocacy. Our thematic areas of focus include official statistics, socio-economic and demographic methods, conflict and crime, climate change and environment, literacy, and ethics.
DeepShop: Understanding Purchase Patterns via Deep Learning
Yoshihiko Suhara, Xiaowen Dong, Alex 'Sandy' Pentland
The recent availability of quantitative behavioral data provides an opportunity to study human behavior at unprecedented scale. Using large-scale financial transaction data, we propose a novel deep learning framework for understanding human purchase patterns and testing the link between them and the existence of individual financial troubles. Our work opens new possibilities in studying human behavioral traits using state-of-the-art machine learning techniques, without the need for hand-engineered features.
Enigma
Guy Zyskind, Oz Nathan and Alex 'Sandy' Pentland
Enigma is a peer-to-peer network that enables different parties to jointly store and run computations on data while keeping the data completely private. Enigma's computational model is based on a highly optimized version of secure multi-party computation, guaranteed by a verifiable secret-sharing scheme. For storage, we use a modified distributed hash table for holding secret-shared data. An external blockchain acts as the controller of the network, manages access control and identities, and serves as a tamper-proof log of events. Security deposits and fees incentivize operation, correctness, and fairness of the system. Similar to Bitcoin, Enigma removes the need for a trusted third party, enabling autonomous control of personal data. For the first time, users are able to share their data with cryptographic guarantees regarding their privacy.
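The idea of computing on secret-shared data can be illustrated with a small sketch. The code below uses plain additive secret sharing over a prime field to let several parties compute a sum without any single party seeing an input. It is a simplification for illustration only: it is not verifiable, it is not Enigma's protocol, and the modulus, function names, and three-party setup are assumptions.

```python
import secrets

# Field modulus (an assumed choice): the Mersenne prime 2**61 - 1.
PRIME = 2**61 - 1


def share(secret, n_parties):
    """Split `secret` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME


def secure_sum(private_inputs, n_parties=3):
    """Sum private inputs while each party only ever sees random-looking shares."""
    # Every input owner splits its value and sends one share to each party.
    all_shares = [share(value, n_parties) for value in private_inputs]
    # Each party locally adds the shares it received.
    partial_sums = [
        sum(s[party] for s in all_shares) % PRIME for party in range(n_parties)
    ]
    # Only the combined partial sums reveal the result.
    return reconstruct(partial_sums)
```

Any subset of fewer than all parties holds values that are uniformly random on their own, which is what allows the computation to proceed without a trusted third party in this toy setting.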
Incentivizing Cooperation Using Social Pressure
Dhaval Adjodah, Erez Shmueli, David Shrier and Alex 'Sandy' Pentland
Cooperation in a large society of self-interested individuals is notoriously difficult to achieve when the externality of one individual's action is spread thin and wide. This leads to the "tragedy of the commons," with rational action ultimately leaving everyone worse off. Traditional policies to promote cooperation involve Pigouvian taxation or subsidies that make individuals internalize the externality they incur. We introduce a new approach to achieving global cooperation by localizing externalities to one's peers in a social network, thus leveraging the power of peer pressure to regulate behavior. The mechanism relies on a joint model of externalities and peer pressure. Surprisingly, this mechanism can require a lower budget to operate than the Pigouvian mechanism, even when accounting for the social cost of peer pressure. Even when the available budget is very low, the social mechanism achieves a greater improvement in the outcome.
Leveraging Leadership Expertise More Effectively in Organizations
Alex 'Sandy' Pentland, Dhaval Adjodah and Alejandro Noriega Campero
We believe that the narrative of listening only to experts, or of blindly trusting the wisdom of the crowd, is flawed. Instead, we have developed a system that weights experts and laypeople differently and dynamically, and we show that a good balance is required. We show that our methodology leads to a 15 percent improvement in mean performance, a 15 percent decrease in variance, and an almost 30 percent increase in Sharpe-type ratio in a real online market.
Location Recommendations Based on Large-Scale Call Detail Records
Alex 'Sandy' Pentland, Yan Leng, Jinhua Zhao and Larry Rudolph
The availability of large-scale longitudinal geolocation records offers planners and service providers an unprecedented opportunity to understand human behavior. Location recommendations based on these data sources can not only reduce information loads for travelers, but also increase revenues for service providers. Large-scale behavioral datasets give planners and authorities a comprehensive picture, transforming the way they create systematic, efficient interventions and provide customized information. In this research, we aim to make recommendations by exploiting travelers' choice flexibilities. We infer implicit location preferences based on sparse and passively collected call detail records (CDR). We then formulate an optimization model with the objective of maximizing overall satisfaction with the recommendations, subject to road capacity constraints. We are implementing the method in Andorra, a small European country that relies heavily on tourism. We demonstrate that the method can reduce the travel time caused by congestion while making satisfactory location recommendations.
Mobile Territorial Lab
Alex 'Sandy' Pentland, Bruno Lepri and David Shrier
The Mobile Territorial Lab (MTL) aims to create a “living” laboratory integrated in the real life of the Trento territory in Italy, open to many kinds of experimentation. In particular, the MTL is focused on exploiting the sensing capabilities of mobile phones to track and understand human behaviors (e.g., families' spending behaviors, lifestyles, mood, and stress patterns); on designing and testing social strategies aimed at empowering individual and collective lifestyles through attitude and behavior change; and on investigating new paradigms in personal data management and sharing. This project is a collaboration with Telecom Italia SKIL Lab, Foundation Bruno Kessler, and Telefonica I+D.
On the Reidentifiability of Credit Card Metadata
Yves-Alexandre de Montjoye, Laura Radaelli, Vivek Kumar Singh, Alex 'Sandy' Pentland
Even when real names and other personal information are stripped from metadata datasets, it is often possible to use just a few pieces of information to identify a specific person. Here, we study three months of credit card records for 1.1 million people and show that four spatiotemporal points are enough to uniquely reidentify 90 percent of individuals. We show that knowing the price of a transaction increases the risk of reidentification by 22 percent, on average. Finally, we show that even datasets that provide coarse information at any or all of the dimensions provide little anonymity, and that women are more reidentifiable than men in credit card metadata.
OPAL: Privacy-Conscientious Use of Mobile Phone Data
Nicolas de Cordes, Yves-Alexandre de Montjoye, Emmanuel Letouzé, Bill Hoffman, Alex 'Sandy' Pentland
OPAL is a project that allows private data to be used for good in privacy-conscientious ways. Collaborating companies can use OPAL's open platform and algorithms behind their own firewalls to extract key development indicators. OPAL grew out of the recognition that accessing big data sources for research and policy purposes has been a conundrum. To date, data held by private companies, such as large-scale mobile phone data, have been accessed and analyzed externally, either through data challenges or through bilateral agreements. While these types of engagements offered evidence of big data's promise and demand, these modalities limit the full realization of its potential. By "sending the code to the data" rather than the other way around, OPAL seeks to address these challenges and develop data services on the basis of greater trust between all parties involved.
Open Badges
Alex 'Sandy' Pentland, Oren Lederman and Akshay Mohan
We present Open Badges, an open-source framework and toolkit for measuring and shaping face-to-face social interactions using either custom hardware devices or smartphones, and real-time web-based visualizations. Open Badges is a modular system that allows researchers to monitor and collect interaction data from people engaged in real-life social settings.
openPDS/SafeAnswers: Protecting the Privacy of Metadata
Alex 'Sandy' Pentland, Brian Sweatt, Erez Shmueli, and Yves-Alexandre de Montjoye
In a world where sensors, data storage, and processing power are too cheap to meter, how do you ensure that users can realize the full value of their data while protecting their privacy? openPDS is a field-tested, personal metadata management framework that allows individuals to collect, store, and give fine-grained access to their metadata to third parties. SafeAnswers is a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one. It allows services to ask questions whose answers are calculated against the metadata, instead of trying to anonymize individuals' metadata. Together, openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata.
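The question-answering idea can be sketched in a few lines of Python. The example below is a toy illustration, not the openPDS or SafeAnswers API: the `PersonalDataStore` class, its `approve` and `ask` methods, the record format, and the `mostly_nocturnal` question are all hypothetical names introduced here. The point it shows is that a service receives only the answer to an approved question, never the raw records.

```python
class PersonalDataStore:
    """Toy store that keeps raw metadata private and only answers approved questions."""

    def __init__(self, records):
        self._records = list(records)  # raw metadata never leaves the store
        self._approved = {}            # question name -> function over the records

    def approve(self, name, question):
        """The data owner grants access to one specific, named question."""
        self._approved[name] = question

    def ask(self, name):
        """A service asks a question and gets back only the computed answer."""
        if name not in self._approved:
            raise PermissionError(f"question {name!r} has not been approved")
        return self._approved[name](self._records)


def mostly_nocturnal(records):
    """Hypothetical question: is more than half of the activity at night?"""
    night = sum(1 for r in records if r["hour"] >= 22 or r["hour"] < 6)
    return night > len(records) / 2
```

A service that has been granted `mostly_nocturnal` learns a single boolean, while a request for any question the owner has not approved is refused. Reasoning about which answers are safe to release then becomes a question of access control over code rather than anonymization of the data.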
Prediction Markets: Leveraging Internal Knowledge to Beat Industry Prediction Experts
Alex 'Sandy' Pentland, Dhaval Adjodah and Alejandro Noriega
Markets are notorious for bubbles and bursts. Other research has found that crowds of laypeople can replace even leading experts in predicting everything from product sales to the next big diplomatic event. In this project, we leverage both threads of research to see how prediction markets can be used to predict business and technological innovations, and use them as a model to fix financial bubbles. For example, a prediction market rolled out inside Intel was very successful, producing better predictions than the official Intel forecast 75 percent of the time. Prediction markets also led to as much as a 25 percent reduction in mean squared error over the predictions of official experts at Google, Ford, and Koch Industries.
Recurrent Neural Network in Context-Free Next-Location Prediction
Alex 'Sandy' Pentland, Yan Leng, Jinhua Zhao and Larry Rudolph
Location prediction is a critical building block in many location-based services and in transportation management. This project explores next-location prediction based on the longitudinal sequence of locations individuals have visited, as observed from call detail records (CDR). In a nutshell, we apply a recurrent neural network (RNN) to next-location prediction on CDR. An RNN can take in sequential input with no restriction on the dimensions of the input. The method can infer the hidden similarities among locations and interpret the semantic meanings of the locations. We compare the proposed method with a Markov model and a naive model, and show that the RNN achieves better accuracy in location prediction.
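For readers unfamiliar with the Markov baseline mentioned above, the following is a minimal sketch of a first-order Markov next-location predictor. It illustrates the kind of baseline an RNN would be compared against, not the project's RNN and not the authors' exact baseline. The function names and the representation of a trace as a list of location labels are assumptions.

```python
from collections import Counter, defaultdict


def fit_markov(sequences):
    """Count observed transitions between consecutive locations."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, current, fallback=None):
    """Predict the most frequent successor of `current`, or `fallback` if unseen."""
    if current not in transitions:
        return fallback
    return transitions[current].most_common(1)[0][0]


def accuracy(transitions, sequences):
    """Fraction of transitions in `sequences` that the model predicts correctly."""
    hits = total = 0
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            hits += predict_next(transitions, current) == nxt
            total += 1
    return hits / total if total else 0.0
```

Because this model conditions only on the current location, it cannot use longer visit histories or similarities between locations, which is the gap a sequence model such as an RNN is meant to close.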
Sensible Organizations
Alex 'Sandy' Pentland, Benjamin Waber and Daniel Olguin Olguin
Social Bridges in Community Purchase Behavior
Xiaowen Dong, Yoshihiko Suhara, Vivek Singh, Alex 'Sandy' Pentland
The understanding and modeling of social influence on human economic behavior in city environments can have important implications. In this project, we study human purchase behavior at a community level and argue that people who live in different communities but work at similar locations could act as “social bridges” that link their respective communities and make the communities' purchase behavior similar, possibly through social learning in face-to-face interactions.
The Privacy Bounds of Human Mobility
Cesar A. Hidalgo and Yves-Alexandre de Montjoye
We used 15 months of data from 1.5 million people to show that four points--approximate places and times--are enough to identify 95 percent of individuals in a mobility database. Our work shows that human behavior puts fundamental natural constraints on the privacy of individuals, and these constraints hold even when the resolution of the dataset is low. These results demonstrate that even coarse datasets provide little anonymity. We further developed a formula to estimate the uniqueness of human mobility traces. These findings have important implications for the design of frameworks and institutions dedicated to protecting the privacy of individuals.
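The notion of being identifiable from a few points can be made concrete with a small sketch that estimates, on a toy dataset, the fraction of users whose trace is the only one containing p known points. This is an illustrative estimator written for this listing, not the authors' code and not the formula mentioned above. The `unicity` function name, the (place, hour) point format, and the seeded random sampling are assumptions.

```python
import random


def unicity(traces, p, seed=0):
    """Fraction of users uniquely pinned down by p points drawn from their own trace.

    `traces` maps a user id to a set of (place, hour) points (assumed format).
    Users with fewer than p points are skipped.
    """
    rng = random.Random(seed)
    eligible = unique = 0
    for user, points in traces.items():
        if len(points) < p:
            continue
        eligible += 1
        # Points an attacker is assumed to know about this user.
        known = set(rng.sample(sorted(points), p))
        # Every user whose trace is consistent with those known points.
        matches = [u for u, other in traces.items() if known <= other]
        unique += matches == [user]
    return unique / eligible if eligible else 0.0
```

Coarsening the data corresponds to mapping places and hours onto larger bins before calling `unicity`, which makes it easy to explore on toy data how resolution affects the fraction of unique traces.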