OpenAI's Accidental Data Deletion Complicates NY Times Copyright Lawsuit

Potential evidence in the ongoing copyright lawsuit between The New York Times and OpenAI was accidentally deleted from a virtual machine used to investigate training data, according to court documents filed Wednesday.

The incident occurred on November 14, when OpenAI engineers erased programs and search result data from one of two virtual machines that had been provided to The Times and Daily News legal teams. These virtual machines were set up to allow the publishers to search for their copyrighted content within OpenAI's AI training datasets.

OpenAI recovered most of the deleted information, but the folder structure and file names were permanently lost. Without them, the recovered data cannot be used to determine how the publishers' articles may have been used in building OpenAI's models.

The publishers' legal teams, who had spent over 150 hours searching through the training data since November 1, must now restart their investigation from scratch. In their court filing, the attorneys stated they don't believe the deletion was intentional but emphasized that OpenAI would be better positioned to search its own datasets for potentially infringing content.

The lawsuit centers on allegations that OpenAI used copyrighted content from The Times and Daily News without permission to train its AI models. OpenAI maintains that training models on publicly available data falls under fair use, though the company has recently secured licensing agreements with several major publishers, including the Associated Press and Axel Springer.

An OpenAI spokesperson declined to comment on the incident. The case continues in the U.S. District Court for the Southern District of New York, where it may set important precedents for AI training data usage and copyright law.