DeepSeek V4 is out: open-source pricing, 1M context, and a bet on Chinese chips

DeepSeek has released a preview of V4, which MIT Technology Review describes as its most significant release since the R1 reasoning model launched in January 2025. According to MIT Technology Review’s analysis, V4 is open source, available in two variants (V4-Pro and V4-Flash), and comes with three distinct claims to attention: pricing that undercuts closed-source alternatives, a 1-million-token context window built on a new memory-efficiency architecture, and a deliberate pivot toward Huawei’s Ascend chips.

The article is direct about expectations: “Will V4 shake the AI field the way R1 did? Almost certainly not.” R1 stunned the industry partly because it was a surprise: a little-known research team had produced a strong, efficient reasoning model on constrained compute. V4 arrives with more anticipation and more scrutiny, after months of personnel departures, delayed launches, and pressure from both the US and Chinese governments.

Pricing and performance

V4-Pro is aimed at coding and complex agent tasks; V4-Flash is optimized for speed and low cost. Both include reasoning modes that show intermediate steps. The article reports V4-Pro at $1.74 per million input tokens and $3.48 per million output tokens. V4-Flash is substantially cheaper: approximately $0.14 per million input tokens and $0.28 per million output tokens.
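To make the per-token prices concrete, here is a minimal Python sketch that estimates the cost of a single request at the rates quoted above. The function name, the dictionary layout, and the example workload (200,000 input tokens, 10,000 output tokens) are illustrative assumptions, not part of any DeepSeek API, and the V4-Flash rates are the approximate figures from the article.

```python
# USD per million tokens, as quoted in the article.
# The V4-Flash figures are approximate.
PRICES = {
    "V4-Pro": {"input": 1.74, "output": 3.48},
    "V4-Flash": {"input": 0.14, "output": 0.28},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the quoted list prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a long prompt with a short answer.
    for model in PRICES:
        cost = request_cost(model, input_tokens=200_000, output_tokens=10_000)
        print(f"{model}: ${cost:.4f}")
```

At these rates the hypothetical request costs roughly 38 cents on V4-Pro and roughly 3 cents on V4-Flash, a gap of about 12x.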

DeepSeek also shared results from an internal survey of 85 experienced developers: more than 90% included V4-Pro among their top model choices for coding tasks. The company says it has specifically optimized V4 for agent frameworks including Claude Code, OpenClaw, and CodeBuddy. These figures come from DeepSeek’s own benchmarking and internal survey; the article does not cite independent verification.

A new approach to long context

Both V4-Pro and V4-Flash support a 1-million-token context window — large enough, the article notes, to fit all three volumes of The Lord of the Rings and The Hobbit combined. The article focuses on how the window was achieved, not only its size.

The key change is in the attention mechanism. Rather than treating all prior text as equally important, V4 compresses older information and focuses on the most contextually relevant parts while keeping nearby text in full. According to DeepSeek’s numbers, V4-Pro uses only 27% of the computing power required by its predecessor V3.2 in a 1-million-token context, and cuts memory use to 10%. V4-Flash goes further: 10% of the computing power and 7% of the memory. The article positions this as the product of a sustained research effort — DeepSeek has published a series of papers on AI memory techniques over the past year and a half.
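The general idea described above (keep nearby text in full, compress older text, attend only to the most relevant compressed parts) can be illustrated with a toy Python sketch. This is not DeepSeek’s architecture: mean-pooling as the compression step, dot-product scoring as the relevance measure, and every name and parameter here (select_context, recent_window, chunk_size, top_k) are assumptions chosen only to show the shape of the technique.

```python
def mean_pool(vectors):
    """Compress a chunk of vectors into one summary vector (assumed scheme)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def dot(a, b):
    """Dot product, used here as a stand-in relevance score."""
    return sum(x * y for x, y in zip(a, b))


def select_context(keys, query, recent_window=4, chunk_size=4, top_k=2):
    """Return the entries a query would attend to in this toy scheme.

    The last `recent_window` keys are kept in full. Older keys are grouped
    into chunks, each chunk is compressed to one summary vector, and only
    the `top_k` summaries most similar to the query are kept.
    """
    split = max(len(keys) - recent_window, 0)
    older, recent = keys[:split], keys[split:]
    summaries = [
        mean_pool(older[i:i + chunk_size])
        for i in range(0, len(older), chunk_size)
    ]
    ranked = sorted(summaries, key=lambda s: dot(s, query), reverse=True)
    return ranked[:top_k] + recent


if __name__ == "__main__":
    # Twelve 2-dimensional keys: two older chunks plus four recent keys.
    keys = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4 + [[0.5, 0.5]] * 4
    attended = select_context(keys, query=[0.0, 1.0], top_k=1)
    print(len(keys), "keys reduced to", len(attended), "attended entries")
```

In this toy, the cost of attending scales with the number of retained entries rather than the full history, which is the intuition behind the reported savings; the real mechanism is described only at a high level in the article.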

The article notes the practical consequence is cheaper deployment for long-context applications — an AI coding assistant that can read an entire codebase, or a research agent that can analyze a long archive of documents.
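As a rough illustration of what those percentages imply, the short Python sketch below converts DeepSeek’s reported fractions (relative to V3.2 at a 1-million-token context) into reduction factors. The numbers are DeepSeek’s own and are not independently verified; the variable and function names are illustrative.

```python
# Fractions of V3.2's requirements at a 1-million-token context,
# as reported by DeepSeek (not independently verified).
REPORTED = {
    "V4-Pro": {"compute": 0.27, "memory": 0.10},
    "V4-Flash": {"compute": 0.10, "memory": 0.07},
}


def reduction_factor(fraction: float) -> float:
    """Convert 'uses X of the baseline' into 'an N-fold reduction'."""
    return 1.0 / fraction


if __name__ == "__main__":
    for model, usage in REPORTED.items():
        for resource, fraction in usage.items():
            factor = reduction_factor(fraction)
            print(f"{model} {resource}: about {factor:.1f}x less than V3.2")
```

By this arithmetic, V4-Pro’s reported figures correspond to roughly 3.7x less compute and 10x less memory than V3.2, and V4-Flash’s to roughly 10x less compute and 14x less memory.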

Chinese chips as the third story

V4 is, according to the article, DeepSeek’s first model optimized for domestic Chinese chips, specifically Huawei’s Ascend series. Huawei confirmed that its Ascend supernode products, based on the Ascend 950 series, will support DeepSeek V4.

The article frames this as deliberate. The Information reported earlier this month that DeepSeek did not give Nvidia or AMD prerelease access to V4, an atypical practice, and instead gave early access only to Chinese chipmakers. Reuters had previously reported that Chinese government officials recommended that DeepSeek integrate Huawei chips into its training. The context: US export controls since 2022 have progressively cut Chinese firms off from Nvidia’s most capable hardware, and Beijing has pushed data centers and public computing projects toward domestic alternatives.

The MIT Technology Review piece notes the efficiency numbers come from DeepSeek’s own benchmarking; independent verification is not cited.