Publication

A Safe Harbor for AI Evaluation and Red Teaming

David Plunkert courtesy of Knight First Amendment Institute.

March 7, 2024

People

Shayne Longpre

Graduate Student

Projects

Data Provenance for AI

Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey

arXiv:2403.04893 [cs.AI] (this version: arXiv:2403.04893v1), https://doi.org/10.48550/arXiv.2403.04893

Abstract

Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems. However, the terms of service and enforcement strategies that prominent AI companies use to deter model misuse also disincentivize good-faith safety evaluations. As a result, some researchers fear that conducting such research or releasing their findings will lead to account suspensions or legal reprisal. Although some companies offer researcher access programs, these are an inadequate substitute for independent research access: they have limited community representation, receive inadequate funding, and lack independence from corporate incentives. We propose that major AI developers commit to providing a legal and technical safe harbor, indemnifying public-interest safety research and protecting it from the threat of account suspensions or legal reprisal. These proposals emerged from our collective experience conducting safety, privacy, and trustworthiness research on generative AI systems, where norms and incentives could be better aligned with public interests without exacerbating model misuse. We believe these commitments are a necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.

via Federation of American Scientists

AI Training Can Undermine the Open Web. This Team Is Thinking Through Solutions

A large-scale audit of dataset licensing and attribution in AI

Data Provenance @ Mozilla Data Futures Lab

Study: Transparency is often lacking in datasets used to train large language models