The Protocol OpenAI Built With Its Fiercest Rivals Is Quietly Rewriting the Rules of AI Infrastructure

OpenAI's MRC protocol lets AI supercomputers scale to 131,000 GPUs on just two network tiers — and every major chip company signed on, including Nvidia.

TFF Editorial
May 11, 2026
11 min read

Key Takeaways

  • 131,000 GPUs on 2 Ethernet switch tiers — MRC cuts switching infrastructure for frontier AI training clusters by 33–50%, saving hundreds of millions per facility
  • AMD, Broadcom, Intel, Microsoft, and Nvidia all co-developed MRC — unprecedented alignment on a single open networking standard released through the Open Compute Project
  • Already running GPT-5.5 training workloads — MRC is deployed in production at Oracle Cloud's Abilene, TX facility and Microsoft's Fairwater supercomputers
  • InfiniBand generates ~$4–5B/year for Nvidia — MRC's Ethernet-native architecture opens that revenue stream to Cisco, Arista, Broadcom, and Marvell for the first time
  • Arista stock jumped 7% on MRC announcement — market immediately priced in competitive disruption to Nvidia's networking moat from the shift to Ethernet-based AI infrastructure

Nvidia built a multibillion-dollar networking monopoly inside AI infrastructure largely because InfiniBand was the only technology that could reliably connect 50,000 or more GPUs at scale. Then OpenAI quietly built something with AMD, Broadcom, Intel, Microsoft, and Nvidia itself, and released it for free. The networking monopoly may have just developed a crack.

What Actually Happened

OpenAI has published MRC (Multipath Reliable Connection), a new open networking protocol for AI supercomputers, released through the Open Compute Project (OCP). The protocol was co-developed with AMD, Broadcom, Intel, Microsoft, and Nvidia, and is already deployed across OpenAI's largest production GPU clusters, including its Oracle Cloud Infrastructure facility in Abilene, Texas, and Microsoft's Fairwater supercomputing site. These are the same clusters used to train GPT-5.5, OpenAI's current default model serving 300 million users.

The technical achievement is significant. With MRC, a fully interconnected network supporting roughly 131,000 GPUs can be built from only two Ethernet switch tiers; traditional 800 Gb/s InfiniBand-based architectures typically require three or four tiers to reach comparable scale. The protocol achieves this through two innovations working in tandem: adaptive packet spraying, which load-balances traffic across all available network paths and keeps the network core essentially congestion-free during synchronous training, and SRv6 source routing, which steers traffic around network failures instantly without relying on dynamic routing protocols. Because MRC was released through OCP, the entire industry now has access to the specification.
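
The two-tier number is easy to sanity-check with standard Clos arithmetic: in a non-blocking leaf-spine fabric built from switches with R ports each, a two-tier design tops out at R²/2 endpoints, so roughly 131,000 GPUs implies a radix near 512. Here is a minimal sketch of that arithmetic; the port-speed splits below are illustrative assumptions about commodity 51.2 Tb/s switch silicon, not figures from the MRC specification.

```python
# Sanity check of the tier counts using non-blocking folded-Clos arithmetic.
# Assumes uniform switches with `radix` ports each -- an illustration,
# not a topology taken from the MRC specification.

def clos_capacity(radix: int, tiers: int) -> int:
    """Maximum endpoints of a non-blocking folded Clos with `tiers` switch tiers."""
    # Every switch below the top tier splits its ports half down (toward GPUs)
    # and half up (toward the next tier), so capacity is radix * (radix/2)^(tiers-1).
    return radix * (radix // 2) ** (tiers - 1)

# The same 51.2 Tb/s switch ASIC can be carved into different port speeds;
# fatter ports mean a smaller radix, which forces extra tiers.
print(clos_capacity(radix=512, tiers=2))  # 512 x 100G ports -> 131,072 endpoints
print(clos_capacity(radix=64, tiers=2))   # 64 x 800G ports  ->   2,048 endpoints
print(clos_capacity(radix=64, tiers=3))   # 64 x 800G ports  ->  65,536 endpoints
```

On these assumptions, a radix-512 Ethernet fabric reaches 131,072 endpoints in two tiers, while a 64-port 800G fabric cut from the same silicon still falls short of that scale with three, consistent with the three-to-four-tier InfiniBand baseline cited above.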

Why This Matters More Than People Think

InfiniBand has been Nvidia's largely uncontested moat inside the AI infrastructure stack. While Nvidia's GPU business dominates headlines, the networking division built around NVIDIA Quantum InfiniBand has generated an estimated $4–5 billion annually in high-margin revenue from hyperscalers and frontier labs that had no practical alternative for interconnecting large training clusters. InfiniBand's lock-in was not just technical. It was ecosystemic: proprietary cables, proprietary switches, proprietary drivers, and two decades of accumulated operational knowledge inside the teams maintaining the world's largest training clusters.

MRC runs on standard Ethernet. Every major cloud provider already runs Ethernet at scale. Every enterprise IT team knows how to operate it. The switching silicon comes from Broadcom, Marvell, and Cisco: competitive suppliers who price aggressively against one another. If MRC achieves the performance parity with InfiniBand that OpenAI's production deployment data suggests, the total cost of interconnecting a 131,000-GPU training cluster could fall by 33 to 50 percent, representing hundreds of millions of dollars in savings per facility at scale. That money currently flows to Nvidia's networking division. The question is where it flows once it doesn't have to.
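
The 33-to-50-percent range maps directly onto the tier reduction itself. If each switching tier is treated as a rough proxy for a proportional share of switches, optics, and cabling (an assumption for illustration, not OpenAI's own accounting), collapsing three or four tiers into two removes a third to a half of the switching layer:

```python
# Where the 33-50% range plausibly comes from: eliminating whole switch tiers.
# Treating tier count as a proxy for switching cost (switches, optics, cabling)
# is an assumption for illustration, not a figure from OpenAI's announcement.
for old_tiers in (3, 4):
    saved = (old_tiers - 2) / old_tiers
    print(f"{old_tiers} tiers -> 2 tiers: {saved:.0%} of switching tiers removed")
```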

The Competitive Landscape

The MRC launch represents something strategically unusual: Nvidia is one of MRC's co-developers and public supporters. This seems paradoxical: why would Nvidia help develop a protocol that threatens its InfiniBand revenue? The answer is that Nvidia read the writing on the wall earlier than most observers. The hyperscalers have been building Ethernet-based alternatives to InfiniBand for several years: Google with its Jupiter network fabric, Meta with its RoCE-based clusters, and Microsoft with Fairwater. Spectrum-X, announced in 2023, was already Nvidia's hedge: an Ethernet-native AI networking product positioned for exactly the world MRC is now accelerating into existence.

By co-developing MRC, Nvidia keeps its networking division relevant in the Ethernet era rather than being displaced entirely. But the strategic win is asymmetric. Cisco, Arista, and Broadcom, companies largely locked out of the AI networking market by InfiniBand's dominance, now have a credible, production-validated protocol they can build against. Arista Networks stock jumped 7 percent in the days following the MRC announcement as analysts recalculated the size of the Ethernet AI networking opportunity. The OCP release means MRC belongs to the industry, not to OpenAI, accelerating the timeline for competitive Ethernet solutions that Nvidia cannot control through proprietary licensing.

Hidden Insight: Why OpenAI Gave It Away

The decision to release MRC through the Open Compute Project rather than keep it proprietary deserves scrutiny, because it is not obviously in OpenAI's short-term competitive interest. Open-sourcing MRC means every competitor (Google, Anthropic, xAI, Meta) gets access to the same networking efficiency gains OpenAI's own clusters are using. So why give it away?

OpenAI's competitive advantage is not its networking architecture. It is model quality, safety processes, developer ecosystem, and the scale of capital it can deploy. What OpenAI cannot afford is a future where its infrastructure costs are structurally higher than competitors' because it is building on proprietary InfiniBand while others find cheaper alternatives. By leading the industry toward an open Ethernet standard, one OpenAI itself designed and validated in production, OpenAI ensures that compute cost parity becomes the industry baseline rather than a competitive liability for whoever moves away from InfiniBand first. OpenAI benefits from a world where infrastructure is cheap and commoditized, because its differentiation lives above the infrastructure layer.

There is a geopolitical dimension that has received almost no coverage. InfiniBand is a technology where U.S. export controls have been relatively effective, because the specialized silicon and switches are manufactured in a small number of jurisdictions under U.S. influence. Ethernet infrastructure is far more globally distributed. An industry shift from InfiniBand to Ethernet-based AI networking makes it harder for export controls to constrain the ability of countries outside U.S. alignment to build large training clusters. Whether OpenAI considered this second-order consequence before releasing MRC through OCP is an open question. The effect exists regardless of intent, and policymakers in Washington who are simultaneously engaged in AI talks with Beijing will eventually have to reckon with it.

What to Watch Next

The clearest short-term indicator will be hyperscaler networking procurement announcements over the next 60 to 90 days. If Amazon Web Services, Google Cloud, or Azure announces Ethernet-based AI networking expansions that specifically cite MRC compatibility, the InfiniBand disruption thesis is confirmed at commercial scale. Watch AMD's data center networking revenue specifically: AMD co-developed MRC and would be a primary beneficiary if Ethernet-based AI clusters displace InfiniBand in the buildout cycle for the next generation of training facilities. AMD's data center segment has been growing at over 50 percent year-over-year, and a networking tailwind from MRC adoption could accelerate that further.

Watch also for Nvidia's response. If Nvidia accelerates Spectrum-X pricing cuts or announces deeper software integration with MRC-compatible switches, it signals the company views the Ethernet transition as inevitable and is fighting to retain margin within the new paradigm rather than defending InfiniBand at all costs. The 180-day window after this announcement is the critical observation period: by then, the major hyperscalers will have made their next-generation cluster architecture decisions, and those choices will define the networking stack for the $660 billion AI infrastructure build-out cycle currently underway. Infrastructure bets are long: the decisions being made right now will compound for five to seven years.

OpenAI just gave the whole industry a cheaper way to build supercomputers, and the company that profits most from the old way co-signed the paper.


Questions Worth Asking

  1. If Ethernet-based MRC replaces InfiniBand for AI clusters, does that make it easier for nations under U.S. export controls to build frontier training infrastructure, and did OpenAI weigh that before releasing through OCP?
  2. OpenAI designed the networking architecture that trains its frontier models and then gave it away for free. What does that decision reveal about where the company believes its actual competitive moat lives?
  3. If the infrastructure cost of training a frontier model drops 40% because of open protocols like MRC, does that intensify or defuse the AI arms race, and which outcome do you want to be true?