November 21, 2024

OpenAI Data Deletion Complicates New York Times Copyright Lawsuit

Data Deletion at OpenAI Raises Questions in Copyright Dispute with The New York Times

There are new developments in the ongoing legal dispute between The New York Times and OpenAI and Microsoft. The New York Times claims that OpenAI accidentally deleted data that the newspaper's team had painstakingly collected over 150 hours as potential evidence in its copyright infringement lawsuit.

Background of the Legal Dispute

The New York Times filed a lawsuit against OpenAI and Microsoft last year. The accusation: The companies unlawfully used the newspaper's articles to train AI tools like ChatGPT. This case is just one of many ongoing legal disputes between AI companies and publishers. The Daily News has also filed a similar lawsuit, which is being handled by the same lawyers.

The Role of the Training Data

The case is currently in the so-called discovery phase, in which both sides must disclose requested documents and information that could serve as evidence. The court ordered OpenAI to grant The New York Times access to its training data. This is a significant step, as OpenAI has not yet publicly disclosed exactly what information was used to build its AI models. For the disclosure, OpenAI set up a so-called "sandbox" with two virtual machines that The New York Times' lawyers could search.

The Data Deletion Incident

In an affidavit, Jennifer B. Maisel, a lawyer for The New York Times, stated that OpenAI engineers had "deleted" data on one of these machines that had been organized by the newspaper's team. OpenAI acknowledged the deletion and attempted to rectify the problem. However, according to the newspaper's lawyers, the recovered data was too disorganized to be used effectively because the original file names and folder structure were missing. As a result, they say, it cannot be used to determine where copied New York Times articles might have been incorporated into OpenAI's AI models.

Controversy Surrounding the Data Recovery

The New York Times' lawyers emphasized that they have no reason to believe that the deletion was intentional. In the submitted emails, OpenAI's lawyer, Tom Gorman, referred to the data loss as a "mistake." However, the newspaper's lawyers also expressed their frustration over the incident and the additional effort it has caused. They argue that OpenAI is better equipped to search its own datasets.

Further Points of Contention in the Proceedings

The dispute over the training data is not the only point of conflict in this lawsuit. There have already been disagreements about which party should be responsible for reviewing the data. The New York Times is also demanding the release of Slack messages, text messages, and social media conversations between key OpenAI employees. Microsoft, on the other hand, has asked The New York Times to disclose documents relating to its own use of generative AI.

Outlook and Significance of the Case

While this and similar cases make their way through the courts, OpenAI is seeking licensing agreements with other publishers. There is no consensus on how these cases will ultimately be decided. However, they will set an important precedent for how the AI industry can operate in the United States. The dispute raises fundamental questions about data handling in the age of artificial intelligence and underscores the need for clear regulations to protect intellectual property.