Anthropic Cut Long-Context Pricing in Half. The Same Week Musk Admitted xAI 'Was Not Built Right.'
Two things happened in AI this week that, together, tell you everything about where the competitive landscape is heading.
On March 13, Anthropic quietly removed the most expensive pricing quirk in frontier AI: the long-context surcharge. Claude Opus 4.6 and Sonnet 4.6 now support a full 1 million token context window at standard pricing — no premium, no special headers, no waiting list. A 900K-token request now costs the same per token as a 9K one.
The same week, Elon Musk posted on X that xAI "was not built right the first time around, so is being rebuilt from the foundations up." His admission: xAI had failed to compete with Claude Code and OpenAI's Codex. His solution: poach two senior engineers from Cursor, the AI coding tool that just hit $2 billion in annualized revenue.
One company quietly made its best product dramatically more accessible. Another admitted it hadn't built the right thing at all. The contrast couldn't be sharper.
What Changed: The End of the Long-Context Tax
For the past year, building with long-context Claude came with an unspoken tax. Requests exceeding 200K tokens triggered a surcharge of up to 100%, as much as doubling the effective per-token cost. The message was implicit but clear: long context is a premium capability, and you'll pay for it accordingly.
That surcharge is gone. As of March 13, 2026, there is no additional charge for long-context requests. The pricing table is blunt:

| Model | Input (per M tokens) | Output (per M tokens) |
|---|---|---|
| Opus 4.6 | $10 → $5 | $37.50 → $25 |
| Sonnet 4.6 | $6 → $3 | $22.50 → $15 |

For any developer who was regularly running requests above 200K tokens, that is an overnight cut in effective costs of up to 50%.
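For a sense of scale, here's a back-of-envelope sketch; the 900K-input / 20K-output token split is an arbitrary illustration, and the rates are the published per-million figures:

```python
# Cost of one 900K-token-input, 20K-token-output Opus 4.6 request,
# before (long-context rate) and after (flat rate) the March 13 change.
# Token counts are illustrative; rates are the published $/M figures.

OLD_INPUT, OLD_OUTPUT = 10.00, 37.50  # pre-change long-context rates
NEW_INPUT, NEW_OUTPUT = 5.00, 25.00   # post-change flat rates

INPUT_TOKENS, OUTPUT_TOKENS = 900_000, 20_000

def request_cost(input_rate: float, output_rate: float) -> float:
    """Dollar cost of one request at the given per-million-token rates."""
    return (INPUT_TOKENS / 1e6) * input_rate + (OUTPUT_TOKENS / 1e6) * output_rate

print(f"before: ${request_cost(OLD_INPUT, OLD_OUTPUT):.2f}")  # before: $9.75
print(f"after:  ${request_cost(NEW_INPUT, NEW_OUTPUT):.2f}")  # after:  $5.00
```

For a pipeline issuing hundreds of such requests a day, the difference compounds quickly.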
The technical changes are minimal. The `anthropic-beta: long-context-2025-01-01` header that developers previously had to include in every long-context request is no longer required; it still works, but you can drop it. Full rate limits now apply at every context length. There is no throttling for long-context requests, no access program, and no waiting list.
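In practice, a long-context call now looks like any other Messages API call. Here's a minimal sketch with the Python SDK; the model ID and file path are assumptions for illustration:

```python
# Minimal sketch of a long-context request with the Anthropic Python SDK.
# The previously required anthropic-beta: long-context-2025-01-01 header
# is simply omitted. Model ID and file path are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("full_codebase_dump.txt") as f:  # hypothetical ~900K-token file
    corpus = f.read()

response = client.messages.create(
    model="claude-opus-4-6",  # assumed model ID for Opus 4.6
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"{corpus}\n\nList every place this codebase reads environment variables.",
    }],
)
print(response.content[0].text)
```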
The availability is broad: Claude Platform natively, Microsoft Azure Foundry, Google Cloud Vertex AI, and Claude Code on Max, Team, and Enterprise tiers.
The Performance Question
A large context window is only valuable if the model can actually use it. This is where Anthropic's announcement spends the most time, and where the numbers get interesting.
Opus 4.6 scores 78.3% on MRCR v2 — the Multi-Round Coreference Resolution benchmark — at the full 1 million token scale. Anthropic claims this is the highest among frontier models. Sonnet 4.6 scores 68.4% on GraphWalks BFS at 1M tokens.
These are recall-focused benchmarks, and recall is the most important capability for long-context use cases in the real world: legal review, codebase analysis, contract processing, agent trace reconstruction. The question isn't whether the model can generate plausible text at 1M tokens — every frontier model can. The question is whether it can find the one clause buried on page 847 of a contract that changes everything. Anthropic is claiming Claude can.
The caveat Anthropic includes in the announcement is honest and worth quoting: "the broader problem of declining precision as context windows fill up is still far from solved." This is not a model that achieves perfect recall at all context lengths. But 78.3% at 1 million tokens, with flat pricing, is a materially different value proposition than what existed two weeks ago.
Six Times More Media. Zero Extra Cost.
Buried in the announcement: the media capacity per request increased from 100 images or PDF pages to 600. A sixfold increase.
For document-heavy workflows, this is the change that matters most. A legal due diligence package might span hundreds of PDFs. A codebase with extensive inline documentation might include dozens of generated diagrams. A financial model might reference hundreds of supporting exhibits. Previously, these workflows required chunking, summarization, and stitching — each step a potential source of information loss.
Now a legal associate can drop 600 pages of merger documents into a single Claude request and ask for the clauses that triggered Material Adverse Change provisions. A developer can include the full visual design spec alongside their codebase. An analyst can process an entire earnings season in a single context.
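A hedged sketch of what that merger-document workflow might look like using the Messages API's document content blocks; the directory, file names, and prompt are invented, and the model ID is an assumption:

```python
# Sketch: batching many PDFs into one Messages API request as base64
# document blocks. Paths, prompt, and model ID are illustrative.
import base64
from pathlib import Path
import anthropic

client = anthropic.Anthropic()

def pdf_block(path: Path) -> dict:
    """Wrap one PDF file as a document content block."""
    data = base64.standard_b64encode(path.read_bytes()).decode("utf-8")
    return {
        "type": "document",
        "source": {"type": "base64", "media_type": "application/pdf", "data": data},
    }

docs = [pdf_block(p) for p in sorted(Path("merger_docs/").glob("*.pdf"))]

response = client.messages.create(
    model="claude-opus-4-6",  # assumed model ID
    max_tokens=8192,
    messages=[{
        "role": "user",
        "content": docs + [{
            "type": "text",
            "text": "Identify every clause that could trigger a Material Adverse Change provision.",
        }],
    }],
)
print(response.content[0].text)
```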
The media limit expansion is available on Claude Platform, Claude Code, Google Cloud Vertex AI, and Microsoft Foundry. Amazon Bedrock gets the context window increase but not the media expansion — a distinction worth noting for teams running infrastructure on AWS.
What This Unlocks for Developers
Anton Biryukov, a software engineer who was part of Anthropic's early access program, described the workflow shift: "With 1M context, I search, re-search, aggregate edge cases, and propose fixes — all in one window."
That's the fundamental change. The multi-step workflow that emerged as a workaround for context limitations — search, summarize, synthesize, repeat — becomes unnecessary for many use cases. The entire codebase fits. The entire document corpus fits. The entire agent trace fits.
For Claude Code users specifically, the practical implication is fewer "context compaction" events: the forced summarizations that occur when conversation history approaches the context limit. Jon Bell, a chief product officer who reported a 15% decrease in compaction events after the change, captured why this matters: compaction isn't just inconvenient, it's lossy. When an agent summarizes what happened earlier in a session, it inevitably drops information that might have been relevant later. Eliminating that step preserves fidelity.
The Cursor developer community reacted immediately. Multiple Cursor users posted on the forum within hours, noting that Cursor's previous 2x multiplier on tokens above 200K would also need to drop to reflect the new reality. Cursor confirmed the change. The full 1M window is now available through Max mode at standard pricing.
The Other Story: xAI Admits It Built the Wrong Thing
In any other week, the 1M context announcement would dominate the AI news cycle. But March 13 also produced one of the more unusual public admissions in recent tech history.
Elon Musk posted to X: "xAI was not built right the first time around, so is being rebuilt from the foundations up." He didn't elaborate on what "not right" meant, but the context around the statement filled in the gaps quickly.
Within hours of the post, TechCrunch and other outlets confirmed that xAI had hired Andrew Milich and Jason Ginsberg, both senior engineering leaders from Cursor, to rebuild xAI's AI coding tool from scratch. Musk explicitly cited Claude Code (Anthropic) and Codex (OpenAI) as the benchmarks xAI failed to meet. The new initiative has a name: Macrohard, a joint project with Tesla whose scope remains unclear but which signals the restructuring extends beyond the coding assistant.
The co-founder picture is even more striking. Of the original 11 xAI co-founders, 9 have now departed. Only Manuel Kroiss and Ross Nordeen remain. The list of those who've left includes some of the most credentialed researchers in the field: Jimmy Ba, Greg Yang, Tony Wu, Zihang Dai. This is not normal attrition. This is an organization that has undergone fundamental changes in the people who built it.
Musk also made an unusual public appeal: "Many talented people over the past few years were declined an offer or even an interview @xAI. My apologies." He and engineering chief Baris Akis are reportedly reviewing the full backlog of rejected candidates to identify who should be contacted. A company asking its founders to manually review every rejected candidate is a company that has acknowledged it needs to rebuild its talent base from the ground up.
What Cursor Losing Two Senior Leaders Means
The Cursor angle of this story deserves more attention than it's received. Cursor just hit $2 billion in annualized recurring revenue — one of the fastest ARR ramps in enterprise software history. It is, by any measure, one of the hottest companies in the AI coding tools space.
And Musk was able to pull two senior engineering leaders from it.
This matters for reasons beyond xAI. It signals that Musk's ability to attract talent from successful AI companies remains intact despite the co-founder exodus, the public criticism, and the acknowledged product failures. Whether those engineers can rebuild something competitive with Claude Code and Codex in a compressed timeframe is a separate question. But the fact that xAI is a destination even for people leaving a $2B ARR success story says something about the continuing power of Musk's halo in certain parts of the talent market.
The Competitive Map: Who Has What Now
With these two announcements, the long-context AI landscape looks like this:
Anthropic has 1M context at flat pricing with the highest published recall benchmarks. Google's Gemini 2.5 Pro matches the 1M window but charges a premium above 200K tokens — a meaningful cost difference for heavy users. OpenAI's GPT-4.1 offers 1M at flat pricing but hasn't published equivalent recall benchmarks. GPT-5.4 caps at 256K tokens, a surprising limitation for a flagship model. xAI's Grok models sit at 128K tokens maximum, a gap that the Cursor hires are presumably being brought in to close.
The competitive question isn't just window size. It's cost-adjusted performance at realistic context lengths. A model that is technically capable of 1M tokens but degrades meaningfully above 300K is less useful than one that maintains recall across the full window, even if the nominal specifications look identical. Anthropic's MRCR benchmark at 1M tokens is specifically designed to test this: whether the model can retrieve and reason about information that appeared much earlier in a conversation.
The 78.3% score at 1M tokens is a claim that will be verified or contested by independent researchers in the coming weeks. But it gives developers a starting point that didn't exist before March 13.
The Practical Decision for Teams Right Now
For developers who have been building workarounds for context limitations — chunk-and-summarize pipelines, vector search retrieval, periodic context resets — the question is whether the flat pricing model changes the build-versus-workaround calculus.
The honest answer is: for some workflows, yes. Loading an entire codebase once, at $5 per million tokens for Opus 4.6 input, is now cost-comparable to many retrieval-augmented generation approaches that require embedding pipelines, vector databases, chunking logic, and the latency of multiple search-retrieve-generate cycles. For codebases under a few hundred thousand tokens, a single long-context call may now be simpler and cheaper than the retrieval alternative.
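As a toy comparison, here's the per-query arithmetic under loudly labeled assumptions; every retrieval-side figure is invented, and this ignores the embedding pipeline, vector database hosting, and engineering time that RAG systems also carry:

```python
# Rough decision helper: per-query cost of loading a full corpus vs. a
# retrieval pipeline. All RAG-side numbers are assumptions for illustration.

OPUS_INPUT_PER_M = 5.00  # published flat input rate, $ per million tokens

def full_context_cost(corpus_tokens: int, query_tokens: int = 2_000) -> float:
    """One call: the whole corpus plus the question in a single window."""
    return (corpus_tokens + query_tokens) / 1e6 * OPUS_INPUT_PER_M

def rag_cost(chunks_retrieved: int = 20, chunk_tokens: int = 1_000,
             calls: int = 3, overhead_per_query: float = 0.002) -> float:
    """Multi-step retrieve-and-generate loop; all figures are invented."""
    prompt_tokens = chunks_retrieved * chunk_tokens
    return calls * prompt_tokens / 1e6 * OPUS_INPUT_PER_M + overhead_per_query

print(f"full 300K-token corpus: ${full_context_cost(300_000):.3f}")  # ~$1.51
print(f"assumed RAG pipeline:   ${rag_cost():.3f}")                  # ~$0.30
```

On these numbers retrieval still wins per query, which is exactly why the hedging matters: the calculus shifts not because one long-context call is always cheaper, but because it is now cheap enough that the simpler architecture can win once the infrastructure and maintenance costs of a retrieval stack are counted.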
For document-intensive industries (legal review, financial analysis, pharmaceutical research, compliance), the 600-page-per-request media limit combined with flat pricing is potentially transformative. These workflows weren't just expensive before; they required architectural complexity to work around context limits. That complexity is now avoidable for the subset of tasks where a single model call can process the full document set.
The larger structural shift is that context limitations stop being a primary architectural constraint for AI applications. Developers can now design around what they want to do rather than around what the context limit allows them to do. That change in the design space is probably worth more, in the long run, than any specific cost reduction.
Key Takeaways
- 1M token context is now generally available for Claude Opus 4.6 and Sonnet 4.6 at flat standard pricing, with no more 100% long-context surcharge
- Opus 4.6 long-context pricing dropped from $10/$37.50 to $5/$25 per million input/output tokens
- Media capacity increased 6x: a single request can now include 600 images or PDF pages, up from 100
- Opus 4.6 scores 78.3% on the MRCR v2 retrieval benchmark at the full 1M context, which Anthropic claims is the highest among frontier models
- Elon Musk publicly admitted xAI "was not built right the first time" and is rebuilding it from the foundations after failing to compete with Claude Code
- xAI poached two senior Cursor engineering leaders as 9 of the original 11 co-founders have departed
Skila AI Editorial Team
The Skila AI editorial team researches and writes original content covering AI tools, model releases, open-source developments, and industry analysis. Our goal is to cut through the noise and give developers, product teams, and AI enthusiasts accurate, timely, and actionable information about the fast-moving AI ecosystem.