Indian companies and startups must realise that they, too, can build competitive AI models with limited resources and smart engineering.
DeepSeek burst onto the scene in early 2025 with a new model that sent shockwaves through Wall Street and tech giants like OpenAI and Nvidia. How could a startup from China trigger such a massive loss in US stock value? And what do these developments mean for the future of AI—especially for everyday people and countries like India?
DeepSeek, a Chinese AI startup based in Hangzhou, was founded by Liang Wenfeng, known for his work in quantitative trading. Originally a research lab under the hedge fund High-Flyer, DeepSeek focused on creating large language models (LLMs) capable of understanding text, solving maths problems, and reasoning, where the model explains how it reached a solution. When DeepSeek released its model, DeepSeek-R1, in January 2025, its chatbot app quickly became the top free app on the US Apple App Store. The startup's success unsettled investors as it built a competitive AI model for just US$5.6 million—a fraction of what US firms spent. This led to a sharp drop in tech stocks such as Nvidia's.
DeepSeek’s Journey and Innovations
Liang Wenfeng and his team had been stockpiling Nvidia GPUs since 2021, which proved crucial when the US imposed export restrictions on advanced chips such as the A100 in 2022. DeepSeek aimed to build efficient, open-source models with strong reasoning abilities. It developed several models, including DeepSeek-V2, DeepSeek-V3, and DeepSeek-R1.
DeepSeek-V3 employed a “mixture-of-experts (MoE)” approach, activating only necessary network parts for specific tasks, enhancing cost efficiency. DeepSeek-R1 added reinforcement learning, enabling chain-of-thought reasoning. While V3 provided quick answers, R1 explained its thought process, improving accuracy for complex tasks like maths problem-solving and coding.
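In rough outline, a mixture-of-experts layer routes each input through only a few "expert" sub-networks chosen by a small gating function, leaving the rest idle. The minimal Python sketch below illustrates the idea; the gate, experts, and dimensions are invented for illustration and are not DeepSeek's actual implementation.

```python
import math

def softmax(scores):
    # Standard numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    # Gate: score each expert for this input, then keep only the top_k.
    scores = [sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    # Only the chosen experts run; the unchosen ones cost nothing this step.
    return sum((probs[i] / norm) * experts[i](x) for i in chosen)

# Toy experts that record when they are actually invoked.
calls = []
def make_expert(i):
    def expert(x):
        calls.append(i)
        return (i + 1) * sum(x)
    return expert

experts = [make_expert(i) for i in range(4)]
gate_weights = [[0.1, 0.2], [0.9, 0.1], [0.3, 0.3], [0.0, 0.5]]
y = moe_forward([1.0, 2.0], experts, gate_weights, top_k=2)
```

Here only 2 of the 4 experts ever execute for this input, which is the source of the cost savings: total parameters can be large while per-token compute stays small.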
Cost-Efficient Engineering
DeepSeek turned scarcity into strength. When US export controls restricted advanced GPUs, DeepSeek adapted using MoE techniques, cutting training costs from the hundreds of millions typical of frontier models to just $5.6 million for DeepSeek-V3. According to its technical report, DeepSeek-V3 required only 2.788 million GPU hours on H800 chips, roughly a tenth of what LLaMA 3.1 405B needed.
DeepSeek-R1 incorporated reinforcement learning for better reasoning. Its training pipeline used FP8 mixed-precision arithmetic to balance efficiency and stability, and it reused components from earlier models. Multi-Token Prediction (MTP) improved speed and efficiency by predicting two tokens sequentially instead of one.
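The speed-up from multi-token prediction can be illustrated with a toy decoding loop that counts forward passes: if each pass yields two tokens instead of one, generating the same text needs half as many passes. This is a hypothetical sketch; `step_fn` and `toy_step` are invented stand-ins, not DeepSeek's MTP implementation.

```python
def decode(step_fn, prompt, n_new, tokens_per_pass=1):
    # Toy autoregressive loop that counts forward passes.
    # step_fn(sequence, k) returns the next k tokens in one pass,
    # standing in for a model with multi-token prediction heads.
    seq = list(prompt)
    passes = 0
    while len(seq) - len(prompt) < n_new:
        passes += 1
        seq.extend(step_fn(seq, tokens_per_pass))
    return seq[len(prompt):len(prompt) + n_new], passes

def toy_step(seq, k):
    # Stand-in "model": each next token is simply the previous one plus 1.
    last = seq[-1]
    return [last + j for j in range((1), k + 1)]

single, p1 = decode(toy_step, [0], 8, tokens_per_pass=1)
double, p2 = decode(toy_step, [0], 8, tokens_per_pass=2)
```

With one token per pass the loop runs 8 times; with two, only 4 times for identical output, which is why MTP improves throughput when the extra predicted tokens are accurate.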
Controversy: Did DeepSeek Use GPT’s Data?
There are claims that DeepSeek may have trained on ChatGPT-generated data rather than data it collected itself. Critics argue that querying ChatGPT and using its responses could breach OpenAI's terms of service. Some point out that DeepSeek's models sometimes identify themselves as "ChatGPT", possibly indicating training overlap. However, this could also result from ChatGPT-generated text being widely available online. There is no concrete proof, and the debate continues.
Security Concerns and Solutions
DeepSeek’s data storage in China raises concerns about potential access by Chinese authorities. Because the model is open source, one solution is to host it outside China. Indian companies with sufficient GPU resources could run the model locally, keeping data under their own control. Smaller models fine-tuned for reasoning, such as versions of Meta’s LLaMA or Microsoft’s Phi, could even run on personal computers, enhancing data privacy.
Opportunities for India
Indian companies and startups could build competitive models using limited resources and smart engineering. They could adopt DeepSeek’s architecture to create custom chatbots and AI tools, and fine-tune open-source LLMs for Indian languages. Parameter-efficient fine-tuning techniques such as LoRA could cut training costs significantly, boosting local AI development.
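The reason LoRA is so cheap is that it freezes the large weight matrix and trains only two small low-rank matrices whose product approximates the needed update. The pure-Python sketch below shows the arithmetic and the parameter savings; the matrices and sizes are invented for illustration, not taken from any real model.

```python
def matvec(v, M):
    # Multiply row vector v (length d) by matrix M (d rows x d_out columns).
    return [sum(vi * row[j] for vi, row in zip(v, M)) for j in range(len(M[0]))]

def lora_forward(x, W, A, B, alpha=1.0):
    # y = x.W + alpha * (x.A).B
    # W (d x d_out) stays frozen; only the small adapters
    # A (d x r) and B (r x d_out) would be trained.
    base = matvec(x, W)
    update = matvec(matvec(x, A), B)
    return [b + alpha * u for b, u in zip(base, update)]

def trainable_params(d, d_out, r):
    # Full fine-tuning updates d*d_out weights; LoRA only r*(d + d_out).
    return d * d_out, r * (d + d_out)

# Zero-initialised adapters leave the frozen base model's output unchanged,
# which is how LoRA training starts in practice.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 identity, for illustration
A = [[0.0], [0.0]]             # d=2, rank r=1
B = [[0.0, 0.0]]               # r=1, d_out=2
y = lora_forward([3.0, 4.0], W, A, B)
full, lora = trainable_params(4096, 4096, 8)
```

For a hypothetical 4096-by-4096 layer at rank 8, LoRA trains about 65 thousand parameters instead of nearly 17 million, which is where the cost reduction for local fine-tuning comes from.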
Lessons for Big Tech and Startups
Big tech companies could adopt open innovation to build transparent, cost-effective AI. Startups could use open-source models to develop competitive products without large investments. Governments could enhance innovation and data security by investing in public research and local AI hosting. Accessible AI would empower students, professionals, and hobbyists to innovate affordably and increase productivity.
DeepSeek’s approach demonstrates that advanced AI can be developed cost-effectively, setting new standards and influencing AI development across industries. Its open-source model promotes collaboration, allowing both large companies and smaller entities to advance AI technology and innovation.
(The author is the Co-founder and CTO of Gurgaon-based AI startup Bobble AI)