
How a Chinese start-up is changing how AI models are trained and outperforming OpenAI, Meta

Chinese start-up DeepSeek is making waves among AI developers worldwide with the release of its latest large language model (LLM), DeepSeek V3. Launched in December 2024, the model has been hailed as a game-changer for the efficiency and cost-effectiveness of its development. The Hangzhou-based company has quickly become a standout player in the global AI community, showcasing innovative strategies for overcoming resource constraints and geopolitical challenges.

With an impressive 671 billion parameters, DeepSeek's model is comparable to some of the most advanced models globally. Yet DeepSeek developed it at a fraction of the cost incurred by giants like Meta and OpenAI, reporting only US$5.58 million in training costs and 2.78 million GPU hours. These figures contrast sharply with Meta's Llama 3.1, which required 30.8 million GPU hours and more advanced hardware to train. DeepSeek's success highlights the rapid advances of Chinese AI firms, even under US semiconductor sanctions.
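The figures above imply a simple back-of-the-envelope comparison. A minimal sketch, using only the numbers reported in this article (the uniform hourly rate is an inference, not official pricing):

```python
# Back-of-the-envelope comparison using the figures reported above.
# Assumes the quoted cost maps uniformly onto the quoted GPU hours.

deepseek_cost_usd = 5.58e6    # reported DeepSeek V3 training cost
deepseek_gpu_hours = 2.78e6   # reported H800 GPU hours
llama31_gpu_hours = 30.8e6    # reported for Meta's Llama 3.1

implied_rate = deepseek_cost_usd / deepseek_gpu_hours
hour_ratio = llama31_gpu_hours / deepseek_gpu_hours

print(f"Implied rate: ${implied_rate:.2f} per GPU hour")   # roughly $2/GPU-hour
print(f"Llama 3.1 used about {hour_ratio:.1f}x the GPU hours")
```

The roughly $2-per-GPU-hour figure and the ~11x gap in GPU hours are what make the "fraction of the cost" claim concrete.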

Revolutionary approach to LLM training

DeepSeek attributes its efficiency to a novel architecture designed for cost-effective training. By leveraging NVIDIA's H800 GPUs, customized for the Chinese market, the company optimized its resources to achieve results that rival those of much larger players. This pragmatic approach underscores how resource constraints can drive innovation, as noted by industry figures such as NVIDIA's Jim Fan and former OpenAI researcher Andrej Karpathy.

Fan commended DeepSeek for demonstrating how limited resources can lead to groundbreaking achievements in AI. Similarly, Jia Yangqing, founder of Lepton AI, praised the start-up’s ability to produce world-class outcomes through intelligent research and strategic investments. DeepSeek’s early acquisition of over 10,000 GPUs before US export restrictions laid the foundation for its success.

DeepSeek and controversies

DeepSeek has embraced open-source principles, making its models accessible to the global community. Its V1 model remains the most popular on Hugging Face, a leading platform for machine learning and open-source AI tools. This openness has pressured commercial AI developers to accelerate their innovations.

However, DeepSeek V3 has faced criticism for occasional identity confusion, mistakenly identifying itself as OpenAI’s ChatGPT during specific queries. Experts attribute this issue to “GPT contamination” in training data, a common problem across many AI models. While such errors are not unique to DeepSeek, they have sparked discussions about the challenges of ensuring model accuracy and identity integrity.

A new era for AI development

DeepSeek's rise signals a shift in the AI landscape, demonstrating that innovative approaches can rival the dominance of tech giants. Despite geopolitical hurdles, the start-up's achievements underscore the potential for Chinese AI firms to lead in the global market. With strong backing from quantitative hedge fund High-Flyer Quant and a team of young, capable developers, DeepSeek is poised to continue disrupting the field.

As the AI community watches closely, DeepSeek’s journey serves as a testament to the power of ingenuity and adaptability in shaping the future of artificial intelligence.
