Microsoft just fired the most consequential shot in its years-long partnership with OpenAI, and it did so inside the product that made that partnership famous. At Build 2026 on June 2, the company unveiled Project Polaris, an in-house coding model that will replace GPT-4 Turbo as the default engine for every GitHub Copilot subscriber starting in August. The quiet detail buried under the keynote applause is the one that should make Sam Altman uneasy: Polaris runs entirely on Microsoft's own Maia silicon, which means the most widely deployed AI developer tool on Earth will soon owe OpenAI nothing for the tokens it serves. This is not a feature release. It is a supplier divorce conducted in public, dressed up as a performance upgrade.
What Actually Happened
Project Polaris is a mixture-of-experts model with sub-networks tuned for individual programming languages and frameworks, and Microsoft says it will become the default for all Copilot subscribers in August 2026. The migration is automatic, with an optional three-month fallback window for teams that want to keep GPT-4 Turbo while they evaluate the swap. On Microsoft's own numbers, Polaris beats GPT-4 Turbo on the HumanEval and MBPP code-generation benchmarks, with the largest gains in lower-resource languages like Rust and Haskell where training data is thin and specialized routing helps most. The headline is not the benchmark delta, though. It is where the model runs.
Polaris executes on Microsoft's Maia accelerators rather than on Nvidia GPUs provisioned in Azure to serve OpenAI's models, giving Microsoft end-to-end ownership of the inference stack for its flagship developer product. Copilot is not a side experiment. GitHub reports the assistant is used across more than 20 million developer seats, and it sits at the center of Microsoft's pitch that every engineer should ship faster with an agent at their elbow. Owning the model that powers those seats changes the unit economics of the entire business, because every completion that used to carry an OpenAI cost now carries only the cost of Microsoft's own compute.
The launch did not arrive alone. Polaris shipped alongside a multi-agent build of VS Code, and it slots into a broader family Microsoft has been assembling all spring, including MAI-Thinking-1, the company's first in-house reasoning model that it pointedly says was trained without OpenAI data, and MAI-Code-1-Flash, a low-cost code model aimed at the cheap end of the market. Read together, these are not scattered experiments. They are a deliberate ladder of models, from flash-tier to frontier reasoning, designed to let Microsoft serve every Copilot and Azure customer without paying a toll to the lab it helped create.
Why This Matters More Than People Think
The obvious story is cost. Microsoft has spent years paying OpenAI to power Copilot, and Copilot in turn has been criticized inside Redmond for thin or negative margins once inference and the revenue split are counted. Moving the default to a model Microsoft owns, running on chips Microsoft designed, attacks both sides of that equation at once. Every token Polaris generates instead of GPT-4 Turbo converts an external bill into an internal cost, and internal compute on Maia is exactly the kind of expense Microsoft can drive down over time through its own roadmap. For a product at Copilot's scale, a saving of even a few cents per thousand tokens compounds into hundreds of millions of dollars a year.
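To see how a per-token saving could reach that order of magnitude, here is a back-of-the-envelope sketch. Only the 20 million seat count comes from GitHub's reported figure; the monthly token volume per seat and the saving per thousand tokens are illustrative assumptions, not disclosed numbers.

```python
def annual_savings_usd(seats: int,
                       tokens_per_seat_per_month: int,
                       saving_cents_per_1k_tokens: int) -> float:
    """Yearly saving in dollars from a per-1k-token price difference."""
    monthly_k_tokens = seats * tokens_per_seat_per_month // 1000
    monthly_cents = monthly_k_tokens * saving_cents_per_1k_tokens
    return monthly_cents * 12 / 100


SEATS = 20_000_000                   # seat count cited in the article
TOKENS_PER_SEAT_PER_MONTH = 100_000  # assumption, for illustration only
SAVING_CENTS_PER_1K = 2              # assumption: "a few cents" taken as 2

savings = annual_savings_usd(SEATS, TOKENS_PER_SEAT_PER_MONTH, SAVING_CENTS_PER_1K)
```

Under these assumed inputs the saving works out to 480 million dollars a year. Halving or doubling either assumption moves the result proportionally, which is why the claim is sensitive to how heavily each seat is actually used.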
The less obvious story is leverage. For most of the GPT era, the uncomfortable truth about Microsoft was that its most important AI product depended on a partner it did not control, governed by a contract with an AGI escape clause and a counterparty raising capital at a trillion-dollar valuation. Polaris is Microsoft's answer to that dependency. By proving it can field a competitive coding model in-house, Microsoft resets the negotiating table for every future conversation about pricing, exclusivity, and access. The option to walk is worth more than walking, and Polaris is the credible threat that gives every other clause in the relationship a new price.
There is a strategic message aimed past OpenAI, too. Microsoft is telling enterprise buyers that Copilot is no longer hostage to any single lab, which matters enormously to risk-averse CIOs who have watched model providers change pricing, deprecate endpoints, and shift terms with little notice. A vertically integrated stack, model plus chips plus cloud plus IDE, is a procurement story as much as a technology one. It says the vendor controls its own destiny, and by extension protects the customer from the volatility that has defined the foundation-model market since ChatGPT launched. For a buyer signing a multi-year agreement, that stability can outweigh a few benchmark points.
The timing also reveals how the economics of coding assistants have shifted under everyone's feet. GitHub moved Copilot to usage-based billing on June 1, one day before the Polaris reveal, because agentic sessions burn far more tokens than the simple autocomplete suggestions Copilot shipped with in 2022. When a single agent task can chew through tens of thousands of tokens chasing a fix across a repository, the cost of the underlying model stops being a rounding error and becomes the whole business model. Owning Polaris lets Microsoft absorb that token explosion on its own infrastructure rather than passing OpenAI's metered price through to every autonomous session, which is precisely the workload growing fastest and the one that would have made a pure model-rental arrangement ruinous at scale.
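A rough per-request comparison shows why agentic work changes the math. The token counts and the metered price below are hypothetical placeholders chosen only to match the "tens of thousands of tokens" scale described above; they are not published Copilot or OpenAI figures.

```python
def session_cost_usd(tokens: int, price_usd_per_1k_tokens: float) -> float:
    """Cost of one request at a flat metered price per thousand tokens."""
    return tokens / 1000 * price_usd_per_1k_tokens


AUTOCOMPLETE_TOKENS = 200    # assumption: one short inline suggestion
AGENT_TASK_TOKENS = 50_000   # assumption: one multi-file agent task
PRICE_USD_PER_1K = 0.01      # assumption: illustrative metered price

autocomplete_cost = session_cost_usd(AUTOCOMPLETE_TOKENS, PRICE_USD_PER_1K)
agent_cost = session_cost_usd(AGENT_TASK_TOKENS, PRICE_USD_PER_1K)

# The ratio depends only on token counts, not on the price chosen.
cost_ratio = AGENT_TASK_TOKENS / AUTOCOMPLETE_TOKENS
```

With these placeholder values, one agent task costs as much as 250 autocomplete suggestions, whatever the per-token price turns out to be.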
The Competitive Landscape
The immediate loser is OpenAI, which sees its models pushed out of the default slot in the single largest distribution channel for AI-assisted coding. GPT-4 Turbo, and by extension the GPT-5.5 and Codex line that OpenAI now sells directly and through AWS Bedrock, loses the captive audience of tens of millions of Copilot users who never chose a model and simply accepted whatever Microsoft set. Anthropic is circling the same prize from the other direction, with Claude Code and Claude Opus 4.8 winning developer mindshare on raw quality, while Google pushes Gemini into Android Studio and its own tooling and xAI ships Grok-based coding agents. The coding-assistant market has become the most contested surface in all of AI.
The cleanest historical parallel is Apple's decade-long march away from its own suppliers. Apple dropped Google Maps for an in-house product in 2012, a launch so rough it triggered a public apology, then spent years quietly turning it into a strength. More tellingly, Apple replaced Intel with its own M-series silicon and discovered that owning the chip let it own the roadmap, the margins, and the differentiation. Microsoft is running the same playbook in software: tolerate a dependency until you can build the replacement, then move the default and let the ecosystem follow. The lesson from Apple is that the transition can be ugly at first and decisive in the end.
What makes this round different from past platform shifts is that the supplier is also a partner Microsoft funded, and the two firms still share infrastructure, customers, and a board-level relationship worth tens of billions of dollars. Microsoft is not severing the OpenAI tie so much as proving it can survive without it, a posture that lets it keep the upside of the relationship while removing the downside of the dependency. Competitors cannot easily copy this, because almost none of them own a frontier lab stake, a custom AI chip, a hyperscale cloud, and the dominant developer tool all at once. That stack is the moat.
Strip away the benchmark slides and the actual product Microsoft launched is optionality. The benchmark gains on HumanEval and MBPP are useful marketing, but those tests are saturated, and a few points of pass rate will not change how a senior engineer feels about their autocomplete. What changes the calculus for Microsoft is that it now has a model it can tune, price, and deploy on its own schedule, free of a partner's roadmap and a partner's economics. The headline number that matters is not a benchmark score. It is the percentage of Copilot inference Microsoft can now serve without writing a check to OpenAI, and that number is heading toward one hundred.
This reframes how to read every in-house model announcement from a hyperscaler. When Amazon builds Trainium, when Google builds TPUs and Gemini, when Microsoft builds Maia and Polaris, the press treats each as a bid to beat Nvidia or out-benchmark a rival lab. The deeper motive is almost always supply-chain control. These companies learned during the GPU shortage of 2023 and 2024 that depending on a single chip vendor or a single model provider is an existential risk when demand outstrips supply. Vertical integration is not about winning a benchmark. It is about never being told no by a supplier again.
The bear case, however, is straightforward and serious: a model you own is only valuable if it is good enough that users do not flee. Skeptics point out that GitHub already lets users pick Claude and Gemini inside Copilot, and the most demanding developers, the ones who set team standards, have largely chosen Anthropic's models for hard reasoning tasks. If Polaris is merely competitive rather than clearly better, Microsoft saves money on the silent majority who never switch models while quietly ceding the influential power users to rivals. The risk is a Copilot that is cheaper to run and less loved, which is exactly the trap Apple Maps fell into before it recovered.
There is a second, quieter risk: the divorce is incomplete by design. Microsoft still holds a large stake in OpenAI, still resells OpenAI models through Azure, and still benefits when OpenAI succeeds, which means Polaris is less a clean break than a hedge. That hedge is rational, but it also caps how aggressively Microsoft can market Polaris as superior, because trashing GPT would damage an asset Microsoft owns. The most likely outcome is not a dramatic split but a slow rebalancing, where Microsoft routes more volume to its own models each quarter while keeping OpenAI as a premium option, harvesting the margin without ever declaring the war it is clearly fighting.
The mixture-of-experts design hides a third insight most coverage will miss. By building specialized sub-networks per language and framework rather than one monolithic model, Microsoft is optimizing for exactly the workload it can see better than any rival: the actual code its tens of millions of Copilot users write every day. Microsoft owns the telemetry from the world's most-used code host and the world's most-used editor, a training-data advantage no pure lab can replicate. A generalist model from OpenAI or Anthropic must be good at everything, while Polaris only has to be excellent at the code GitHub users actually ship. That narrower target is how a smaller in-house model can credibly match a larger frontier one on the tasks that matter, and it is the structural reason this strategy can work even if Polaris never wins a single general benchmark.
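For readers unfamiliar with how a mixture-of-experts model chooses among specialized sub-networks, the sketch below shows generic top-k gating. Microsoft has not published Polaris's routing scheme, so this is a textbook illustration rather than its actual design; the expert names and gate scores are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Convert raw gate scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits: dict[str, float], k: int = 2) -> dict[str, float]:
    """Keep the k highest-scoring experts and renormalize their weights."""
    names = list(gate_logits)
    probs = softmax([gate_logits[name] for name in names])
    ranked = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)[:k]
    kept_total = sum(p for _, p in ranked)
    return {name: p / kept_total for name, p in ranked}


# Hypothetical gate scores for a Rust snippet; expert names are illustrative.
logits = {"rust": 3.0, "cpp": 1.5, "python": 0.2, "haskell": -1.0}
weights = route_top_k(logits, k=2)
```

Here the input is handled only by the "rust" and "cpp" experts, with most of the weight on "rust", while the other experts stay idle. That sparsity is the general reason a mixture-of-experts model can carry many specialists without paying for all of them on every request.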
What to Watch Next
In the next 30 days, watch for independent benchmarks. Microsoft's own HumanEval and MBPP numbers are a starting gun, not a verdict, and the real test will come when third parties run Polaris against Claude Opus 4.8, GPT-5.5, and Gemini on harder, less saturated suites like SWE-bench Verified and live repository tasks. Watch developer forums and the GitHub community discussions for the qualitative signal that benchmarks miss: do working engineers feel Polaris is as good, or do they immediately switch the default back to Claude or GPT the moment Microsoft flips the toggle in August?
Over the next 90 days, the number to track is migration friction. Microsoft promised automatic migration with a three-month GPT-4 Turbo fallback, so the window that opens in August and closes in November will reveal how many teams opt out and stay on OpenAI models. A low opt-out rate validates the strategy; a high one tells you Polaris is not ready and Microsoft moved too soon. Watch enterprise contracts too, because the largest Copilot customers negotiate model terms directly, and any public defection or endorsement from a marquee account will move the narrative faster than any benchmark.
On the 180-day horizon, the question is whether Polaris becomes the template for the rest of Microsoft's AI surface. If the company moves the default model in Word, Excel, and Teams Copilot to in-house MAI models the way it just did in GitHub Copilot, the OpenAI relationship shrinks from foundation to fallback across Microsoft's entire product line. Watch the MAI model family for a frontier-tier release that targets reasoning rather than code, because that is the announcement that would tell you Microsoft believes it can replace OpenAI not just at the cheap end, but at the top.
Microsoft did not build a better coding model. It built the ability to never need OpenAI's again.
Key Takeaways
- August 2026 default switch means every GitHub Copilot subscriber moves from GPT-4 Turbo to Microsoft's Polaris automatically, with a three-month fallback.
- Runs on Maia silicon, giving Microsoft end-to-end ownership of the inference stack and converting an OpenAI bill into internal compute cost.
- 20 million-plus Copilot seats make this the largest single redirection of AI coding volume away from OpenAI to date.
- Beats GPT-4 Turbo on HumanEval and MBPP per Microsoft, with the biggest gains in low-resource languages like Rust and Haskell.
- Part of the MAI family alongside MAI-Thinking-1 and MAI-Code-1-Flash, a full model ladder built to reduce OpenAI dependence across Microsoft.
Questions Worth Asking
- If Microsoft can replace OpenAI inside its most important developer product, what stops it from doing the same across Office, leaving OpenAI as a fallback rather than a foundation?
- When a model is cheaper to run but only equal in quality, does the silent majority of users who never switch defaults subsidize a slow erosion of the product's appeal to power users?
- If your business depends on a single AI provider today, what is your version of Polaris, and how long would it take you to build the credible option to walk away?