10,000,000. That is the number of tokens in the context window Meta built into Llama 4 Scout, roughly 78 times GPT-4o's 128,000-token context. Meta released the model as free open weights on April 5, 2025. The AI industry is only slowly realizing that this number is about more than a spec race.
What Happened: Llama 4's Two Protagonists
Meta released two Llama 4 models at launch. Llama 4 Scout, with 109 billion total parameters (16 experts, 17 billion active), has a 10-million-token context, an industry record at release. Llama 4 Maverick, with 400 billion total parameters (128 experts, 17 billion active) and a 1-million-token context, is reported to have scored 91.8% on MMLU, 91.5% on HumanEval, and 74.2% on SWE-bench, outperforming both GPT-4o and Gemini 2.0 Flash. Meta puts Maverick's blended API price at $0.19 to $0.49 per million tokens. A third model, Behemoth (288 billion active parameters), was still in training and had no released weights; Meta says it surpasses GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks.
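The parameter counts above determine how much of each model does work on any single token, and a "blended" price is a weighted average of input and output prices. This sketch computes both. The 3:1 input-to-output weighting is a common convention assumed here, and the prices passed to the hypothetical `blended_price` helper are placeholders, not any provider's actual rates.

```python
# Back-of-the-envelope arithmetic for the released Llama 4 models.
# Parameter counts come from the article; the 3:1 input:output blend
# and the example prices are illustrative assumptions.

MODELS = {
    "Scout": {"total_b": 109, "active_b": 17, "experts": 16},
    "Maverick": {"total_b": 400, "active_b": 17, "experts": 128},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used to process a single token."""
    return active_b / total_b


def blended_price(input_per_m: float, output_per_m: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average USD price per million tokens for an input:output mix."""
    parts = input_parts + output_parts
    return (input_per_m * input_parts + output_per_m * output_parts) / parts


for name, spec in MODELS.items():
    frac = active_fraction(spec["total_b"], spec["active_b"])
    print(f"{name}: {spec['active_b']}B of {spec['total_b']}B parameters "
          f"active per token ({frac:.1%}), {spec['experts']} experts")

# Hypothetical per-million-token prices, not taken from any real price list.
print(f"Blended price: ${blended_price(0.15, 0.60):.4f} per 1M tokens")
```

Because both models activate the same 17 billion parameters per token, Maverick's extra experts add capacity without a proportional increase in per-token compute.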
Why This Matters More Than People Think
Maverick beating GPT-4o matters, but it is not the main story. The real issue is that Scout's 10-million-token context challenges a foundation of enterprise AI architecture. One of the most celebrated techniques in the AI startup ecosystem over the past two years has been RAG (Retrieval-Augmented Generation): splitting corporate documents, databases, and codebases into chunks, indexing them, and retrieving only the relevant chunks at query time, because the full material would not fit in the model's context window. With a 10-million-token window, you can instead put an entire codebase (on the order of a million lines, assuming roughly ten tokens per line) or years of customer conversation data directly into the prompt. And because the weights are open, enterprises subject to strict data-sovereignty rules, such as those in India, can run the model on their own infrastructure. The EU is a notable exception: Meta's Llama 4 license does not grant usage rights for its multimodal models to companies whose principal place of business is in the EU.
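To make the contrast concrete, here is a deliberately minimal RAG sketch. It splits documents into fixed-size word chunks, scores each chunk by word overlap with the query, and keeps the top-k chunks for the prompt; the long-context alternative simply sends every document. The helper names are hypothetical, and the overlap scoring is a stand-in for the embedding models and vector indexes that production RAG systems typically use.

```python
import re

# A deliberately tiny RAG pipeline (hypothetical helper names).
# Word-overlap scoring stands in for embeddings plus a vector index.


def tokens(text: str) -> set[str]:
    """Lowercased word tokens with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def chunk(text: str, size: int = 50) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(query: str, docs: list[str], k: int = 3, size: int = 50) -> list[str]:
    """Return the k chunks sharing the most words with the query."""
    q = tokens(query)
    chunks = [c for doc in docs for c in chunk(doc, size)]
    return sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Concatenate context passages and the question into one prompt."""
    return "\n\n".join(context) + f"\n\nQuestion: {query}"


docs = [
    "The refund policy allows returns within 30 days of purchase.",
    "Shipping to most countries takes five to seven business days.",
    "Invoices are issued at the end of each billing month.",
]
question = "What is the refund policy?"

# RAG: only the best-matching chunk reaches the model.
rag_prompt = build_prompt(question, retrieve(question, docs, k=1))
# Long context: every document goes into the prompt, no retrieval step.
full_prompt = build_prompt(question, docs)
print(rag_prompt)
```

With a large enough context window, the `retrieve` step, along with the chunking and indexing infrastructure behind it, can in principle be skipped entirely, which is exactly the layer many RAG products sell.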
Hidden Insight: The Categories 10 Million Tokens Kills
Historically, context windows grew gradually: 4K, then 8K, 32K, and 128K. A jump to 10M is not an incremental expansion but a potential paradigm shift. Countless startups are currently building "enterprise document search AI," "codebase understanding AI," and "meeting-notes analysis AI" on top of RAG, and Scout's 10 million tokens could commoditize that entire category. Because Scout's weights are open, an enterprise that pays for proprietary models from Google or OpenAI can also deploy Scout on its own infrastructure and potentially cut its cloud API costs sharply, trading per-token fees for its own hardware costs. Meta's use of open source as a strategic weapon resembles Facebook's decision to open-source React in 2013. React became a de facto standard for frontend development. If Llama becomes the React of AI infrastructure, whoever controls that ecosystem wins the next round.
The bear case, however, is real. Critics argue that a 10-million-token context is expensive and slow to fill in practice, that retrieval still beats brute-force context on cost and latency for most workloads, and that Meta's headline benchmark numbers, some drawn from an experimental chat variant, may not survive independent enterprise testing.
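The cost side of the bear case is simple arithmetic. This sketch compares per-query input cost for filling a full 10-million-token context versus sending a few thousand retrieved tokens. The $0.19-per-million price (the low end of Meta's quoted blended range, treated here as if it were an input-only price), the 8,000-token retrieval size, and the daily query volume are all assumptions for illustration.

```python
# Per-query input cost: full-context prompting vs. retrieval.
# All figures are illustrative assumptions, not measured prices.

PRICE_PER_M_TOKENS = 0.19  # assumed USD per million input tokens


def input_cost(num_tokens: int, price_per_m: float = PRICE_PER_M_TOKENS) -> float:
    """USD cost of sending num_tokens of input at a per-million-token price."""
    return num_tokens / 1_000_000 * price_per_m


FULL_CONTEXT_TOKENS = 10_000_000  # fill Scout's whole window on every query
RETRIEVED_TOKENS = 8_000          # assumed size of a few retrieved chunks
QUERIES_PER_DAY = 10_000          # assumed workload

full = input_cost(FULL_CONTEXT_TOKENS)
rag = input_cost(RETRIEVED_TOKENS)
print(f"Full context: ${full:.2f} per query, ${full * QUERIES_PER_DAY:,.0f} per day")
print(f"Retrieval:    ${rag:.5f} per query, ${rag * QUERIES_PER_DAY:,.2f} per day")
print(f"Ratio: {full / rag:,.0f}x")
```

At these assumed numbers, a full-context query costs 1,250 times as much in input tokens as a retrieval query. Prompt caching of a repeated prefix can narrow that gap, and the sketch ignores latency, which also grows with prompt length.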
