

Anthropic Fable 5 Beats All AI Rivals on Code 2026
Anthropic's Fable 5 tops SWE-Bench Pro at 80.3%, beating GPT-5.5 by 22 points, opening the Mythos-class tier at $10 per million tokens.
New foundation models, benchmarks, and capability breakthroughs.


Anthropic's Fable 5 tops SWE-Bench Pro at 80.3%, beating GPT-5.5 by 22 points, opening the Mythos-class tier at $10 per million tokens.


Apple unveiled 5 AFM foundation models at WWDC 2026, distilling Google Gemini and running the cloud tier on Nvidia GPUs in Google Cloud infrastructure.


OpenAI's GPT-5.6 appears in Codex routing logs under codename iris-alpha, with 1.5M tokens of context and 80-89% Polymarket odds of a June launch.


OpenAI will retire GPT-4.5 from ChatGPT on June 27 and o3 on August 26, completing the first full clearout of the pre-GPT-5 model generation.


Alibaba's Qwen 3.7 Plus adds vision and video to its Qwen backbone at $0.40 per million tokens, 6x cheaper than Qwen 3.7 Max, shipping June 1.
Microsoft's MAI-Thinking-1, its first in-house reasoning model trained without OpenAI data, was preferred over Claude Sonnet 4.6 in blind human tests.
Anthropic's leaked 512,000-line Claude Code source map exposes Sonnet 4.8 and a skipped 4.7, plus hidden API tiers and routing thresholds rivals can study.
Gemini 3.5 Pro ships a 2 million token context window, double Flash, as Google launches its cheaper model first to pressure rivals on price.
Microsoft MAI-Code-1-Flash, a 5B in-house coding model, beats Claude Haiku 4.5 on SWE-Bench Pro and uses up to 60% fewer tokens in GitHub Copilot.
Google Gemma 4 12B is an encoder-free multimodal model that runs text, image, audio, and video on a 16GB laptop under Apache 2.0 with a 256K context.
Microsoft launched 7 MAI models spanning code, reasoning, image, voice, and transcription at Build 2026, a full-stack bid to cut its OpenAI reliance.
Claude Opus 4.8 scores 61.4 on the Artificial Analysis Intelligence Index and 69.2% on SWE-Bench Pro, beating GPT-5.5 just 41 days after Opus 4.7.
OpenAI's GPT-Realtime-2 cuts voice agent latency below one second with GPT-5-class reasoning and a 128K context, resetting voice AI economics.
DeepSeek V4-Pro made its 75% price cut permanent at $0.87 per million tokens, a frontier-grade coding model that undercuts US labs with open weights.
MiniMax M3, an open-weight model from Shanghai, beats GPT-5.5 on SWE-Bench Pro at 1/12 the cost, putting the AI coding frontier up for free download.
Ideogram 4 is a 9.3B open-weight text-to-image model with native 2K resolution that tops every open rival on text rendering and design quality.
xAI Composer 2.5 launched June 1, 2026, scoring 69.3 on agentic coding versus Grok 4.20 at 47.1, built on Moonshot Kimi K2.5 open weights.
Qwen3.7 Max scored 56.6 on the Artificial Analysis Index on May 20, 2026, the top Chinese model, beating DeepSeek V4 Pro on agentic coding.
OpenAI GPT-Rosalind beats GPT-5.5 across drug discovery and genomics while using 31% fewer tokens, and opens its science model preview worldwide.
Nvidia Cosmos 3 launches as the first fully open physical AI omnimodel, generating video and robot actions to cut robotics training from months to days.
Google Gemma 4 ships under Apache 2.0 and its 31B model beats 400B open rivals on agentic and coding tests while running on a single GPU.
Nvidia Nemotron 3 Ultra is a 550B open model with 55B active params running agents 5x faster at 30% lower cost, the fastest US open model in 2026.


Microsoft Polaris replaces GPT-4 Turbo as the default GitHub Copilot model in August 2026, running on in-house Maia chips and cutting OpenAI reliance.


Google's Gemini 3.5 Flash scores 76.2% on Terminal-Bench 2.1, the first Flash-tier model to beat its own Pro tier on coding and agentic tasks.


Meta has repeatedly delayed its Muse Spark API, the developer gateway to its next AI model, as rivals OpenAI and Anthropic ship live and pull ahead.


Nvidia GR00T N2 tops MolmoSpaces and RoboArena with 2x the rival success rate, a world action model for humanoid robots shipping in late 2026.


OpenAI retires GPT-4.5 from ChatGPT on June 27, routing users to GPT-5.5 and ending the era of user-selectable model choice in its consumer app.


Microsoft MAI-Thinking-1 is its first reasoning model trained with zero OpenAI data, scoring 97% on AIME 2025 and matching Claude Opus 4.6 on SWE-Bench.


Microsoft MAI-Code-1-Flash beats Claude Haiku 4.5 and cuts token use 60 percent as GitHub Copilot moves to metered credit billing this month.


xAI's Grok 4.3 scores ELO 1500 on GDPval-AA and ranks first on CaseLaw v2, while cutting benchmark costs 20 percent below Grok 4.20.


Microsoft's MAI-Thinking-1 matches Claude Sonnet 4.6 in blind tests at 10x lower cost, signaling a new phase of AI model competition.


Claude Opus 4.8 scores 69.2% on SWE-bench Pro, beating GPT-5.5 at 58.6%, while its 3x fast mode makes it the cheapest frontier coding model available.


OpenAI replaces ChatGPT default with GPT-5.5 Instant, improving accuracy, STEM, and image understanding for all users. GPT-5.6 arrives June 2026.


Google Gemini Omni Flash generates video from any input in a unified reasoning pass, the first native any-to-any multimodal AI at I/O 2026.


OpenAI GPT-5.4 scores 75.0% on OSWorld-Verified, beating the 72.4% human baseline, with a 1M token context window for long autonomous agent work.


Nvidia's Alpamayo 2 Super is a 32B open reasoning model for level 4 robotaxis, free on Hugging Face this summer, turning autonomy into a commodity.


MiniMax M3 scores 59% on SWE-Bench Pro, beating GPT-5.5 and Gemini 3.1 Pro at roughly 10% of the cost, with open weights due within ten days.


Cohere Command A+ ships 218B parameters under Apache 2.0 and runs on two H100 GPUs, the first 200B-plus open model with native citation grounding built in.


Qwen 3.7 Max scores 92.4 on GPQA Diamond with a 1M-token window and $2.50 input pricing, undercutting GPT-5.5 on cost while topping Claude on reasoning.


Nvidia's Cosmos 3 is the first open Physical AI omnimodel, ranking number one across seven robotics benchmarks in 8B Nano and 32B Super sizes.


Nvidia's Nemotron 3 Ultra is a 550B open model that tops US open-weights rankings at 300 tokens per second, with weights and training recipes released.


DeepSeek V4 makes its 75 percent price cut permanent at 0.435 per million input tokens, matching GPT-5.5 on agentic coding under an open MIT license.


Google Gemma 4 31B scores 89.2% on AIME 2026 and beats Llama 4 across math, code, and agentic tasks, all under a free Apache 2.0 open license.


Claude Opus 4.8 hits 88.6% on SWE-bench Verified and adds Dynamic Workflows to run hundreds of subagents at once, all at the prior Opus price.


Gemini 3.5 Flash is now generally available at $1.50 per million input tokens, beating Gemini 3.1 Pro on coding while running 4x faster.


Google cut its AI Ultra plan 60% from $250 to $100 a month and launched Gemini 3.5 Flash, undercutting OpenAI and Anthropic on frontier AI pricing.


Meta's first model under Alexandr Wang hits the top 5 on benchmarks but fails its safety review, marking the end of Meta's open-source AI strategy.


NVIDIA's Nemotron 3 Super scores 85.6% on PinchBench with 5x throughput over its predecessor, the strongest open agentic model for enterprise deployments.


Subquadratic's SubQ uses sparse attention to process 12 million tokens at 1/1,000th the compute of frontier models, raising $29M at a $500M valuation.


Thinking Machines Lab's TML-Interaction-Small scores 77.8 on FD-bench versus 54.3 for Gemini, built by Mira Murati to replace turn-based voice AI.