OpenAI Accuses Chinese Rival DeepSeek of Data Theft in AI Development Battle

In a twist of irony that has the tech world buzzing, OpenAI and Microsoft are investigating whether Chinese AI startup DeepSeek improperly trained its latest language model on outputs from OpenAI's models, the same kind of data collection practice for which OpenAI itself has been criticized.

According to recent reports from Bloomberg and the Financial Times, the investigation centers on DeepSeek's R1 model, which has reportedly matched or exceeded the performance of OpenAI's models while using significantly fewer computing resources. OpenAI suspects DeepSeek may have violated its terms of service through unauthorized data collection.

David Sacks, the venture capitalist serving as the Trump administration's AI and crypto czar, claims there is "substantial evidence" that DeepSeek used an AI technique called distillation to extract knowledge from OpenAI's models. In distillation, one AI model (the "student") learns from another (the "teacher") by sending it millions of questions and training on the answers until it can mimic the teacher's reasoning.
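For readers curious what distillation looks like in practice, the idea can be sketched with a toy example. This is only an illustration under stated assumptions, not a description of what DeepSeek or OpenAI actually did: the `teacher` function below is a hypothetical stand-in for a model that can only be queried, and the `student` is a tiny two-parameter model trained on the teacher's answers. Real language-model distillation works with text prompts and responses rather than single numbers.

```python
import math
import random


def teacher(x: float) -> float:
    """Hypothetical black-box teacher: returns a probability for input x."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x - 1.0)))


def student(x: float, w: float, b: float) -> float:
    """Student model: a sigmoid with two learnable parameters."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def distill(queries, steps=2000, lr=0.5):
    """Fit the student to the teacher's answers on the given queries."""
    # Step 1: ask the teacher every question and record its answers.
    targets = [teacher(x) for x in queries]
    w, b = 0.0, 0.0
    n = len(queries)
    # Step 2: gradient descent on cross-entropy against those answers.
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, t in zip(queries, targets):
            p = student(x, w, b)
            grad_w += (p - t) * x
            grad_b += p - t
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


random.seed(0)
queries = [random.uniform(-2.0, 2.0) for _ in range(200)]
w, b = distill(queries)
print(f"student parameters: w={w:.2f}, b={b:.2f}")
```

The student never sees the teacher's internals, only its answers to the queries, which is why the technique can in principle be applied through a public API.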

The accusations come at an awkward time for OpenAI, which faces its own lawsuit from The New York Times for allegedly training on copyrighted articles without permission. OpenAI has defended such practices as "fair use," arguing that training AI models on publicly available internet materials is both legal and necessary for innovation.

DeepSeek appears to have achieved superior results through advanced reinforcement learning rather than simply accumulating massive datasets. This challenges OpenAI's previous claims that "unprecedented scale" of training data was essential for developing sophisticated AI models.

The situation highlights growing tensions in AI development between established players and newcomers, as well as questions about data rights and competitive practices. While OpenAI maintains it must "protect its IP," critics point out that the company's stance on unauthorized data usage seems to change depending on whether it is the collector or the target.

Sam Altman, OpenAI's CEO, appeared to criticize DeepSeek in December, suggesting it was "relatively easy to copy something that works" compared to pioneering new approaches. However, industry observers note that OpenAI's own work builds on research from Google and academia, as is standard scientific practice.

As this dispute unfolds, it underscores broader debates about data ownership, AI development ethics, and the increasing competition between U.S. and Chinese tech companies in the race for AI supremacy.