DeepSeek just made its 75 percent price cut permanent. The model behind it scores alongside GPT-5.5 and Claude Opus 4.7 on agentic coding, ships its full weights under an open MIT license, and now costs a small fraction of what the Western frontier charges. The pressure this puts on every closed lab is hard to overstate.
What Actually Happened
DeepSeek confirmed on May 22, 2026 that the steep discount on DeepSeek V4 Pro, originally scheduled to expire May 31, is now the permanent list price. That sets V4 Pro at $0.435 per million input tokens and $0.87 per million output tokens, a roughly 75 percent reduction. The smaller V4 Flash lands at about $0.14 per million input tokens, competitive with the cheapest frontier-class models available anywhere. Both models shipped April 24 as a preview and are available through the DeepSeek API and as open weights under the MIT license, which permits commercial use and fine-tuning.
The architecture explains how the economics work. DeepSeek V4 Pro is a Mixture-of-Experts model with 1.6 trillion total parameters but only 49 billion activated per token, while V4 Flash carries 284 billion parameters with 13 billion activated. Both support a 1 million token context window. The headline technical trick is a hybrid attention mechanism combining Compressed Sparse Attention and Heavily Compressed Attention. At the 1M-token setting, DeepSeek says V4 Pro needs only 27 percent of the per-token inference compute and 10 percent of the KV cache memory compared with the prior V3.2 generation. Cheaper serving is not a marketing decision here, it is baked into the model design.
Why This Matters More Than People Think
For two years the industry assumed frontier capability and frontier pricing were joined at the hip. You paid premium rates because the best models were expensive to train and expensive to serve. DeepSeek V4 severs that link. A model that posts agentic coding scores in the same band as GPT-5.5 and Claude Opus 4.7, while charging under half a dollar per million input tokens, tells developers that the capability they have been renting at a markup is now available at something close to raw compute cost. When a top-tier model costs roughly one-tenth of its closest closed rival, the question stops being which model is best and becomes which model is good enough at this price.
The open weights make it structurally different from a cheap API. Because V4 ships under MIT, any company can download the model, run it on its own hardware, fine-tune it on proprietary data, and never send a token to an external server. For enterprises wrestling with data residency, latency, and vendor lock-in, that combination of frontier capability, open licensing, and near-zero marginal cost is the most disruptive thing to happen to the model market this year. The closed labs sell access. DeepSeek is giving away the asset and competing on everything else.
The Competitive Landscape
The direct comparison is brutal for the incumbents on price. GPT-5.5 lists at $5 per million input tokens and $30 per million output, while GPT-5.5 Pro runs $30 and $180. Claude Opus 4.7 sits in a similar premium tier. DeepSeek V4 Pro delivers comparable agentic coding performance at $0.435 and $0.87, which is more than a tenfold gap on input pricing. Gemini 3.1 Pro still leads on world knowledge and multimodal reasoning, and DeepSeek trails the very top on those axes, but for the specific job of writing and executing code in agent loops, the cost-performance line has moved decisively.
The more important contest is among the open-weights labs, and it is overwhelmingly Chinese. Inside a roughly two-week window, Z.ai's GLM-5.1, Moonshot's Kimi K2.6, MiniMax M2.7, and DeepSeek V4 all landed at a similar capability ceiling on agentic engineering, each at a fraction of Western inference cost. Alibaba's Qwen family rounds out the field. The pattern is unmistakable: the open-weights frontier is being set in China, and it is converging on strong coding ability at aggressive prices. Meta's Llama line, once the default open model in the West, now finds itself chasing rather than leading. For developers, the practical effect is a buffet of MIT and Apache licensed models that are nearly interchangeable on quality and competing almost entirely on cost and serving convenience.
Hidden Insight: The Margin in Inference Just Evaporated
The deeper story is what this does to the business model of selling intelligence by the token. Closed labs have funded their enormous training runs partly on healthy inference margins, the spread between what it costs to serve a query and what they charge. DeepSeek V4 attacks that spread from both ends. Its architecture cuts the cost to serve through sparse activation and compressed attention, and its open license plus aggressive pricing collapses what anyone can charge for an equivalent model. When a credible open alternative sits at one-tenth the price, every closed provider faces a choice between cutting prices toward DeepSeek and watching cost-sensitive volume migrate away.
The bear case, however, deserves equal weight. Critics argue that headline benchmark parity does not equal production parity, and that GPT-5.5 and Claude Opus 4.7 still win on the long-horizon reliability, tool-use robustness, and safety behavior that matter in real agent deployments. The risk for enterprises is operational: running a 1.6 trillion parameter MoE model in-house demands serious GPU capacity and inference expertise, so the sticker price hides a real total cost of ownership. Skeptics point out that V4 is still labeled a preview, that DeepSeek's permanent price cut may be a land-grab subsidized by motives other than profit, and that the model's Chinese origin raises data-governance and procurement concerns that will keep many Western enterprises on closed Western providers regardless of the math. A model you cannot deploy because your security team blocks it is not actually cheaper.
The structural signal points to commoditization arriving faster than the closed labs priced in. For the next 12 to 24 months, the most likely path is a bifurcated market: a thin premium tier where OpenAI, Anthropic, and Google charge for the last few points of reliability and the frontier-most capabilities, and a vast commodity tier where open-weights models from DeepSeek and its Chinese peers handle the bulk of coding, drafting, and agentic work at near-cost. The uncomfortable truth for anyone who built a business on selling generic model access is that the floor is now a free MIT-licensed model that is good enough for most tasks, and the only durable moats left are at the extremes of capability, in proprietary data, and in the application layer wrapped around the model.
There is a geopolitical layer underneath the pricing layer that is easy to miss. By open-sourcing a frontier-adjacent model under MIT and pricing the hosted version near cost, DeepSeek is not just competing commercially, it is seeding the global developer ecosystem with Chinese-built infrastructure. Every startup that fine-tunes V4, every tool that defaults to it, every benchmark that includes it deepens a dependency that has nothing to do with subscription revenue. The closed Western labs sell a product; DeepSeek is distributing a standard. That is a slower and more patient form of competition, and it is precisely the kind that incumbents focused on quarterly inference margins tend to underestimate until it is everywhere.
What to Watch Next
In the next 30 days, watch whether OpenAI, Anthropic, or Google respond with price cuts on their coding-oriented tiers. A visible reaction would confirm that DeepSeek's permanent discount is setting the market price, not just its own. Watch the coding tool ecosystem too, the Cursors, Windsurfs, and agent frameworks of the world, because the fastest signal of real adoption is which models they wire in as defaults for cost-sensitive users.
Over 90 to 180 days, track three things: whether DeepSeek lifts the preview label and ships a stable V4 with production guarantees, whether the Chinese open-weights cohort keeps converging on or surpasses Western coding benchmarks, and whether enterprise procurement policies soften or harden toward Chinese-origin open models. The decisive question is adoption depth. If V4 moves from benchmark charts into the actual stacks of Western developers at scale, the inference-margin business takes a permanent hit. If security and reliability concerns keep it confined to hobbyists and cost-desperate teams, the closed labs hold their premium for another cycle. The price has already been set. What remains uncertain is who is willing to pay it.
What This Means for Builders and Buyers
For the developer choosing a model this quarter, DeepSeek V4 reshapes the default calculus. The old reflex was to reach for a premium closed model and accept the bill as the cost of quality. The new reality is that for a large share of coding and agentic work, an open model at one-tenth the price clears the quality bar, and the savings compound brutally at scale. An agent that burns millions of tokens per task in long tool-use loops turns the per-token gap into a line item that finance notices, and that is exactly where cost-sensitive teams will migrate first.
The serving choice splits into two camps. Teams without GPU infrastructure will simply call the DeepSeek API and pocket the price difference, accepting the data-governance tradeoff of routing through a Chinese provider. Teams with strict data requirements or enough scale to justify it will download the MIT weights and self-host, converting a recurring API bill into a fixed infrastructure cost they control end to end. Both paths erode the closed labs, because either way the customer stops paying premium API rates for the bulk of their workload, reserving the expensive frontier models only for the hardest tasks where reliability genuinely matters.
The pricing also pressures the inference-provider ecosystem that has grown up around open weights. Companies like Together and Fireworks, along with the hyperscalers' own model-serving tiers, now have a frontier-grade open model they can host and undercut each other on, which compresses margins across the entire serving layer. That is good for buyers and punishing for anyone whose business assumed open models would stay a generation behind the closed frontier. The gap that justified premium pricing has narrowed to the point where it only holds at the very top of the capability curve.
For the closed labs, the strategic response is already visible in their roadmaps: move up-market into reliability, safety, and integrated agentic platforms where raw model access is only one ingredient. The defensible business is no longer the model weights, which DeepSeek just demonstrated can be given away, but the surrounding system of tools, evaluation, guarantees, and enterprise trust. The labs that internalize this fastest will reposition as platform and application companies. The ones that keep treating per-token access as the product will find DeepSeek and its Chinese peers setting their prices for them, quarter after quarter.
There is a talent and research dimension as well. By publishing both the weights and the architectural details behind Compressed Sparse Attention and the MoE design, DeepSeek hands the global research community a working blueprint for cheap long-context inference. Competitors will study it, replicate the techniques, and fold them into their own models, which accelerates the very commoditization that pressures DeepSeek itself. Open-sourcing a frontier-adjacent system is a double-edged strategy: it wins mindshare and ecosystem lock-in, but it also speeds the diffusion of the methods that made the model cheap in the first place.
The quiet winners in all of this are the application-layer companies that never wanted to be in the model business at all. A coding assistant, a customer-support agent, or a document-processing tool that treats the underlying model as a swappable commodity just watched its single largest variable cost fall by an order of magnitude. Those margins flow straight to whoever owns the customer relationship and the workflow, not to whoever trained the model. In that sense DeepSeek's price cut is less an attack on application builders than a windfall for them, and the real losers are the labs caught in the middle selling raw intelligence by the token.
When a frontier-grade coding model ships free under MIT at one-tenth the price, the question is no longer which model is best, but how anyone still charges a premium for good enough.
Key Takeaways
- DeepSeek made its 75 percent price cut permanent on May 22, setting V4 Pro at $0.435 input and $0.87 output per million tokens.
- V4 Pro is a 1.6 trillion parameter MoE with 49 billion activated; V4 Flash has 284 billion parameters and runs near $0.14 per million input tokens.
- Agentic coding scores sit alongside GPT-5.5 and Claude Opus 4.7, while GPT-5.5 lists at $5 input, more than a tenfold gap.
- Hybrid Compressed Sparse and Heavily Compressed Attention cut inference to 27 percent of compute and 10 percent of KV cache versus V3.2 at 1M context.
- Open MIT weights plus near-cost pricing pressure the inference margins that fund every closed lab.
Questions Worth Asking
- If an open MIT model matches the frontier on coding at one-tenth the price, what exactly are you still paying a premium provider for?
- Does running a 1.6 trillion parameter model in-house actually save money once GPU capacity and inference expertise are priced in?
- If the open-weights frontier is now set in China, what does that mean for which infrastructure the next generation of developers builds on by default?