OpenAI has raised serious concerns about Chinese AI startup DeepSeek, which it suspects of using OpenAI’s data to train DeepSeek’s own models. DeepSeek has gained significant attention for its cost-effective AI solutions and is considered a strong competitor to OpenAI’s offerings. OpenAI and its partner Microsoft are now investigating whether DeepSeek used OpenAI’s API to integrate OpenAI’s models into its own systems.
According to sources cited by Bloomberg, Microsoft’s security researchers discovered in late 2024 that significant amounts of data were being exfiltrated through OpenAI developer accounts, which they believe are linked to DeepSeek.
OpenAI has claimed it found evidence suggesting that DeepSeek used distillation, a technique that uses the outputs of a larger model to train a smaller one. This method is efficient, but OpenAI argues that using it to create competing models violates its terms of service.
The distillation technique: A common practice or IP theft?
Distillation is a well-known technique in AI development: a smaller “student” model is trained on the outputs of a larger “teacher” model, allowing it to approach the teacher’s performance at a fraction of the cost. OpenAI’s GPT-4, which cost over $100 million to train, is an example of the kind of large, expensive system that could serve as a teacher.
However, OpenAI claims that DeepSeek trained its own system on the outputs of OpenAI’s models, which OpenAI argues violates its terms of service. The company has not disclosed the specifics of the evidence it gathered but says it is confident that DeepSeek used its data without permission.
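To make the idea of distillation concrete, here is a minimal toy sketch in Python. It is not a reconstruction of what DeepSeek or OpenAI actually did. The teacher weights, the temperature, and every function name are assumptions invented for this illustration. A fixed "teacher" produces softened probability distributions, and a smaller "student" is trained only on those outputs, never on the teacher's internals.

```python
import math
import random

TEMPERATURE = 2.0  # assumed value; softens both probability distributions

# Hypothetical teacher: 3 classes, weights for the features (x1, x2, x1*x2).
# It stands in for a large model that the student can only query.
TEACHER_WEIGHTS = [
    [2.0, -1.0, 0.5],
    [-1.5, 0.5, -0.5],
    [0.0, 1.0, 0.0],
]


def softmax(logits, temperature=TEMPERATURE):
    """Convert logits to probabilities, dividing by the temperature first."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_probs(x):
    """Soft labels from the teacher for a two-feature input x."""
    features = [x[0], x[1], x[0] * x[1]]
    logits = [sum(w * f for w, f in zip(row, features)) for row in TEACHER_WEIGHTS]
    return softmax(logits)


def student_probs(weights, x):
    """The student is smaller: it only sees (x1, x2), with no interaction term."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)


def distill(inputs, epochs=200, lr=0.2):
    """Train the student to match the teacher's outputs with plain SGD."""
    targets = [teacher_probs(x) for x in inputs]  # query the teacher once
    weights = [[0.0, 0.0] for _ in TEACHER_WEIGHTS]
    for _ in range(epochs):
        for x, target in zip(inputs, targets):
            pred = student_probs(weights, x)
            for k, row in enumerate(weights):
                # Gradient of cross-entropy with respect to logit k.
                grad_logit = (pred[k] - target[k]) / TEMPERATURE
                for j in range(len(row)):
                    row[j] -= lr * grad_logit * x[j]
    return weights


def mean_kl(weights, inputs):
    """Average KL divergence from the teacher's to the student's distribution."""
    total = 0.0
    for x in inputs:
        t = teacher_probs(x)
        s = student_probs(weights, x)
        total += sum(tk * math.log(tk / sk) for tk, sk in zip(t, s))
    return total / len(inputs)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
    untrained = [[0.0, 0.0] for _ in TEACHER_WEIGHTS]
    print("KL before distillation:", mean_kl(untrained, data))
    print("KL after distillation: ", mean_kl(distill(data), data))
```

The point of the sketch is that the student needs nothing but the teacher's answers to a set of queries, which is why access through an API can be enough to attempt distillation.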
Oh, the irony…
The situation is ironic, as OpenAI itself made substantial advancements by scraping data from the internet without explicit consent, a practice that has drawn criticism in the past. This has led some to question the ethics of how AI companies gather and use data.
Despite OpenAI’s history of data scraping, the company is now taking action to protect its intellectual property, especially as it faces competition from companies like DeepSeek that seem to be rapidly catching up with its capabilities.
Growing concerns over IP theft
The allegations against DeepSeek have sparked reactions from various figures in the tech world. David Sacks, President Trump’s AI czar, suggested that DeepSeek’s actions might constitute intellectual property theft, saying there is “substantial evidence” that DeepSeek distilled knowledge out of OpenAI’s models.
OpenAI has responded by asserting that Chinese companies, among others, frequently attempt to reverse-engineer leading US AI models. To protect its intellectual property, OpenAI stated that it is working closely with the US government and taking countermeasures to safeguard its technology from being exploited by competitors.
As investigations continue, the incident has brought the debate around intellectual property rights in AI development to the forefront, highlighting the growing tensions between global competitors and the ethical implications of using proprietary data to create rival models.