China’s DeepSeek makes a comeback with V4 model; here’s all you need to know


DeepSeek’s V4 series targets million-token contexts, advanced reasoning and agentic workflows, challenging OpenAI and Anthropic with ultra-long memory and aggressive pricing.

China's DeepSeek has unveiled a preview of its V4 series, positioning it as a step forward in long-context processing, reasoning performance, and agent-driven workflows. The release includes two Mixture-of-Experts (MoE) models, DeepSeek-V4-Pro and DeepSeek-V4-Flash, both built to handle a context length of up to 1 million tokens. It comes shortly after OpenAI launched its latest GPT 5.5 model, which that company calls its “smartest and most intuitive to use model yet”.

“Through architectural innovations, DeepSeek-V4 series achieve a dramatic leap in computational efficiency for processing ultra-long sequences. This breakthrough enables efficient support for a context length of one million tokens, ushering in a new era of million-length contexts for next-generation LLMs,” the company said.

At the core is a shift in how the model handles scale. The goal, the company added, is to “break the efficiency barrier in ultra-long contexts” and enable “deeper research into long-horizon tasks.”

A push towards million-token context

The headline capability is the 1M token context window, now standard across both models. In practical terms, this allows the model to ingest and retain far larger inputs without losing coherence. 

DeepSeek attributes this to structural changes in attention mechanisms, including token-wise compression and its DeepSeek Sparse Attention (DSA) approach. These reduce compute and memory overhead while maintaining performance over long sequences. The company claims this delivers “world-leading long context with drastically reduced compute & memory costs.”
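DeepSeek has not published the internals of DSA in this preview, but the general idea behind top-k sparse attention is straightforward: each query scores all keys, then keeps only its highest-scoring few, so the softmax and value mixing touch a fraction of the entries a dense pass would. The sketch below is a generic illustration of that idea, not DeepSeek's actual implementation; the `top_k=8` choice is arbitrary.

```python
import numpy as np

def dense_attention(q, k, v):
    # Standard attention: every query attends to every key (O(n^2) scores).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def topk_sparse_attention(q, k, v, top_k=8):
    # Each query keeps only its top_k highest-scoring keys; the rest are
    # masked to -inf, so they contribute zero weight after the softmax.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Indices of the (n - top_k) lowest scores in each row.
    drop = np.argpartition(scores, -top_k, axis=-1)[:, :-top_k]
    np.put_along_axis(scores, drop, -np.inf, axis=-1)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 64, 16
q, k, v = rng.normal(size=(3, n, d))
out = topk_sparse_attention(q, k, v, top_k=8)
print(out.shape)  # (64, 16)
```

In a real long-context model the savings come from never materialising the dropped scores at all (e.g. via compressed or block-sparse lookups), rather than computing and masking them as this toy version does.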

Model split: Pro vs Flash

The two models target different use cases. DeepSeek-V4-Pro, the flagship, has 1.6 trillion parameters (49B activated per token). According to its model card on Hugging Face, the Pro version “leads all current open models” in world knowledge, trailing only Google’s Gemini-3.1-Pro. The company also claims top-tier performance on math, STEM, and coding benchmarks, positioning the model for complex reasoning and agentic workflows.

Meanwhile, DeepSeek-V4-Flash is smaller, faster, and significantly cheaper, with 284 billion parameters (13B activated per token). Its reasoning capabilities closely approach those of V4-Pro, the company states: the smaller model performs comparably on simpler agent tasks and is optimised for latency-sensitive and cost-sensitive deployments.
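The gap between total and activated parameters is what makes MoE economics work: each token is routed through only a few experts, so per-token compute tracks the activated count, not the total. Using the figures quoted above:

```python
# MoE models route each token through a small subset of experts, so
# per-token compute scales with the *activated* parameter count.
pro_total, pro_active = 1_600e9, 49e9      # V4-Pro figures from the article
flash_total, flash_active = 284e9, 13e9    # V4-Flash figures from the article

print(f"V4-Pro activates {pro_active / pro_total:.1%} of its weights per token")
print(f"V4-Flash activates {flash_active / flash_total:.1%} of its weights per token")
```

That works out to roughly 3% of weights active per token for Pro and about 4.6% for Flash, which is how a 1.6-trillion-parameter model can be served at the prices discussed below.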

Pricing signals aggressive positioning

What stands out is DeepSeek's per-million-token API pricing. For V4-Pro, input costs $0.145 on a cache hit and $1.74 on a cache miss, while output is $3.48. For V4-Flash, the figures are $0.028 (cache hit), $0.14 (cache miss), and $0.28 (output). A cache hit covers repeated or previously processed tokens, priced much lower; a cache miss is fresh input that requires full computation. This structure incentivises applications that reuse context, common in agent loops and long sessions.
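To see how the hit/miss split plays out, here is a small cost calculator using the V4-Pro rates quoted above. The request sizes are made-up illustrative numbers, not DeepSeek figures.

```python
def request_cost_usd(prompt_tokens, cached_tokens, output_tokens,
                     hit_rate, miss_rate, out_rate):
    # Rates are quoted per million tokens; cached (cache-hit) tokens bill
    # at the low rate, fresh (cache-miss) tokens at the full rate.
    fresh = prompt_tokens - cached_tokens
    return (cached_tokens * hit_rate
            + fresh * miss_rate
            + output_tokens * out_rate) / 1_000_000

# V4-Pro rates from the article: $0.145 hit, $1.74 miss, $3.48 output.
# Hypothetical agent turn: 500k-token context, 80% of it already cached.
cost = request_cost_usd(prompt_tokens=500_000, cached_tokens=400_000,
                        output_tokens=20_000,
                        hit_rate=0.145, miss_rate=1.74, out_rate=3.48)
print(f"${cost:.4f}")  # $0.3016
```

With no caching the same request would cost about $0.94 in input alone, which is why context reuse dominates the economics of long agent sessions.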

This aggressive price compression undercuts even OpenAI’s mini models. OpenAI’s flagship models are priced at around $2.50–$5 per million input tokens and $15–$30 for output, with lower-cost variants like GPT-4.1 at roughly $2 input and $8 output per million tokens. Anthropic’s Claude family sits higher, with Claude Sonnet models at about $3 input and $15 output, and flagship Claude Opus models at $5 input and $25 output per million tokens.

Agentic capabilities move to the centre

DeepSeek is explicitly positioning V4 for agent-based systems. “DeepSeek-V4 is seamlessly integrated with leading AI agents like Claude Code, OpenClaw and OpenCode,” the company said, adding that the models are “already driving our in-house agentic coding.”