One of the pending tasks of science is to strengthen public participation in the dialogue with the scientific community. One proposed solution is to improve the environment in which this communication takes place. The City Science research group at the MIT Media Lab, a department of the Massachusetts Institute of Technology (MIT), seeks to make collaborative meetings between scientists, citizens and governments more productive by rethinking the environment in which such dialogues take place. The use of immersive spaces to foster the urban design process is one of the main research strands of the City Science Lab, and ImmerScope City is its flagship project.
MetaCity is a text-to-KPI system that extracts urban indicators from the conversations that take place in these collaborative scientific communication environments, with the aim of serving as a semantic bridge between scientists and communities. The system relies on a large language model trained by OpenAI, complemented with additional information and with data labeled by experts from the City Science Lab.
The preprocessing and subsequent analysis of the labeled data revealed several aspects of online surveys as a data collection methodology. When the same experts were asked the same questions in a first and a second survey, 18.8% of the answers given in the second survey contradicted those given in the first. The proportion of contradictions varied with the indicator or description being labeled: up to one third of the answers were contradictory for one indicator, and almost one third for one of the descriptions, the latter differing significantly from the rest of the descriptions. Some experts also contradicted themselves more than others. One expert did so significantly less than the rest of the panel, at less than half the average rate of the others.
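The contradiction rate described above can be illustrated with a minimal sketch. The data layout (dictionaries keyed by expert and question), the function names and the toy answers are assumptions made for illustration; they are not the study's data or code.

```python
from collections import defaultdict


def contradiction_rate(first, second):
    """Share of repeated answers that differ between two survey rounds.

    `first` and `second` map (expert, question) -> answer.
    Only questions answered in both rounds are compared.
    """
    shared = first.keys() & second.keys()
    if not shared:
        raise ValueError("no question was answered in both rounds")
    differing = sum(1 for key in shared if first[key] != second[key])
    return differing / len(shared)


def contradiction_rate_by_expert(first, second):
    """Same measure, broken down per expert."""
    totals = defaultdict(int)
    differing = defaultdict(int)
    for key in first.keys() & second.keys():
        expert = key[0]
        totals[expert] += 1
        if first[key] != second[key]:
            differing[expert] += 1
    return {expert: differing[expert] / totals[expert] for expert in totals}


# Hypothetical toy answers: expert B changes one of two answers.
first = {("A", "q1"): True, ("A", "q2"): False,
         ("B", "q1"): True, ("B", "q2"): True}
second = {("A", "q1"): True, ("A", "q2"): False,
          ("B", "q1"): False, ("B", "q2"): True}

overall = contradiction_rate(first, second)
per_expert = contradiction_rate_by_expert(first, second)
print(overall, per_expert)
```

The same grouping could be applied per indicator or per description by using a different part of the key.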
MetaCity has been evaluated with the GPT-4o, GPT-4 and GPT-3.5-turbo models, using different prompt formats. Few-shot learning proved effective in all models, tripling sensitivity relative to zero-shot learning. The best performing of the three models was GPT-4 with few-shot learning, with a sensitivity of 71.4% and a specificity of 76.0%. Meanwhile, the GPT-4o model, its lighter and optimized version, shows close results in sensitivity (71.4%) and identical results in specificity (76.0%).
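For reference, sensitivity is the share of actual positives that are predicted positive, and specificity is the share of actual negatives that are predicted negative. The following sketch computes both from boolean labels; the function name and the toy labels are illustrative assumptions and do not reproduce the figures reported above.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for two equal-length boolean sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical toy labels: 3 actual positives, 2 actual negatives.
actual = [True, True, True, False, False]
predicted = [True, False, True, False, True]

sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```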
Additionally, poor-quality external information was observed to have an adverse effect: it reduced sensitivity in all models, although it raised specificity to global maximums close to 100%. In all models, a bias towards negative predictions was observed in the message formats with better sensitivity and worse specificity, despite 60.7% of the ground truth values being positive. A strong positive correlation between the actual and predicted proportions of positive and negative responses was observed in the GPT-4o and GPT-4 models, and the effects of data shrinkage were seen in the results of both models, at 59.8% and 46.1%, respectively.
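One simple way to quantify a bias towards negative predictions is to compare the share of positive predictions with the share of positive ground truth values. The sketch below is an assumption about how such a comparison could be made, with hypothetical names and toy labels; it is not the procedure used in the study.

```python
def positive_share(labels):
    """Fraction of True values in a non-empty boolean sequence."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for value in labels if value) / len(labels)


def prediction_bias(actual, predicted):
    """Predicted positive share minus actual positive share.

    A negative value means the model predicts positives less often
    than they occur in the ground truth.
    """
    return positive_share(predicted) - positive_share(actual)


# Hypothetical toy labels: 60% actual positives, 20% predicted positives.
actual = [True, True, True, False, False]
predicted = [True, False, False, False, False]

bias = prediction_bias(actual, predicted)
print(f"bias={bias:+.2f}")
```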
In conclusion, the findings of this study highlight the importance of developing advanced tools such as MetaCity to improve the interaction between scientists, citizens and governments in the context of urban design. The use of large language models, complemented with expert-labeled data, shows significant potential for extracting key urban performance indicators from collaborative dialogues. However, challenges have also been identified, such as inconsistencies in survey responses and the negative impact of low-quality information, which need to be addressed to improve the accuracy and effectiveness of the system. As MetaCity is refined and its models adjusted, this tool could become an essential means of bridging the gap between the scientific community and the public, encouraging more informed and productive participation in sustainable urban design.