
Claude Opus 4.7 Just Shipped. Devs Are Handing Off the Work They Couldn't Trust AI With Before.

April 20, 2026
8 min read
Claude Opus 4.7 shipped April 16, 2026, with a 41.6% SWE-Bench Verified score, 70% on CursorBench (up from 58%), triple the image input resolution, and the same $5/$25 token pricing as 4.6. The pitch: hand off the hardest coding work without babysitting it.

Anthropic released Claude Opus 4.7 on April 16, 2026. The pitch is three words long: hand it off.

Hand off the refactor you've been dodging. Hand off the migration everyone punted on. Hand off the bug that took two senior engineers a full day last quarter. That is the framing Anthropic is using, and the benchmark numbers suggest it is not marketing fluff.

SWE-Bench Verified: 41.6%. CursorBench: 70%, up from 58% on Opus 4.6. Rakuten's internal SWE-Bench variant says Opus 4.7 resolves three times more production tasks than its predecessor. Box deployed it internally and measured a 56% drop in model calls and a 24% response speedup.

Same price as 4.6. $5 per million input tokens. $25 per million output tokens. No premium for the new capabilities.

What actually changed between 4.6 and 4.7

Five months is a short gap for a flagship model. Anthropic shipped Opus 4.6 in November 2025, and 4.7 landed on April 16, 2026. The release cadence suggests Anthropic is treating Opus as a living coding product, not a yearly event.

Three capability shifts stand out:

1. Coding benchmarks jumped across the board. SWE-Bench Verified is the industry's closest proxy for real software engineering work — it runs against actual GitHub issues from 12 popular Python repos. Opus 4.7 hits 41.6%. That is a meaningful step up from 4.6 and puts it ahead of GPT-5.4 and Gemini 3.1 Pro on the same benchmark.

2. Vision got a 3x resolution upgrade. The model now accepts images up to 2,576 pixels on the long edge. Roughly 3.75 megapixels. You can feed it a full-resolution Figma export, a 4K screenshot of a dashboard, or a dense architecture diagram without downsampling. Prior Claude models capped out around 1.25 megapixels.

3. File-system memory for long sessions. Opus 4.7 has improved multi-session memory tied to files. For devs running agent loops that span hours or days, the model now holds context better across sessions without ballooning the prompt.

The benchmark numbers in context

Numbers without context are just trivia. Here is how the Opus 4.7 benchmarks compare to what teams were running last year.

  • SWE-Bench Verified: 41.6% — up from roughly 31% on Opus 4.6, and ahead of GPT-5.4 and Gemini 3.1 Pro on the same test
  • CursorBench: 70% — Cursor's internal benchmark measures autonomous agent completion on Cursor-like tasks. 4.6 scored 58%
  • Rakuten-SWE-Bench: 3x more resolved tasks than 4.6 on Rakuten's in-house production task suite

Box ran its own evaluation after integrating Opus 4.7 into internal agent workflows. The numbers Box published:

  • 56% reduction in total model calls per task
  • 50% fewer tool calls per task
  • 24% faster end-to-end response time
  • 30% fewer AI Units consumed per completed task

Read that again. Fewer calls. Fewer tools invoked. Faster. Cheaper per finished task. Those numbers matter more than any synthetic benchmark because they track how the model behaves when a real company tries to replace real engineering hours.

The catch: tokenizer changes

Opus 4.7 ships with an updated tokenizer. Anthropic says input token counts for the same prompt run 1.0 to 1.35 times those of 4.6. At higher effort levels, output token counts climb as well.

What does that mean in practice? If you were spending $800 a month on Opus 4.6, your worst case on 4.7 is roughly $1,080 for an identical workload — before accounting for the 30% fewer AI Units that Box measured on finished tasks. Net-net, teams running agent loops should see a cost drop. Teams running single-shot chat queries might see a small bump.
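The arithmetic is easy to sanity-check. A minimal sketch using the $800/month baseline above; the 1.35x multiplier is Anthropic's published worst case, and the 30% AI-Unit reduction is Box's measurement, not a guarantee for your workload:

```python
# Back-of-envelope check on the tokenizer trade-off (figures from the article).
baseline_monthly = 800.00   # assumed Opus 4.6 spend, $/month
inflation_worst = 1.35      # worst-case input-token multiplier on 4.7
ai_unit_savings = 0.30      # Box's measured drop in AI Units per finished task

worst_case = baseline_monthly * inflation_worst   # identical workload, no savings
agent_case = worst_case * (1 - ai_unit_savings)   # agent loops, Box-style savings

print(f"worst case: ${worst_case:,.0f}/mo")   # worst case: $1,080/mo
print(f"agent case: ${agent_case:,.0f}/mo")   # agent case: $756/mo
```

Single-shot chat lands near the worst case; agent loops that finish tasks in fewer calls come out ahead of the 4.6 baseline.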

Anthropic is being unusually transparent about the trade-off. That is a good sign.

The Anthropic Labs angle

Opus 4.7 is not shipping alone. Anthropic simultaneously unveiled Claude Design through a new Anthropic Labs umbrella — a prompt-to-prototype product positioned against Figma and Canva. Claude Design runs on Opus 4.7 under the hood.

That matters because it signals Anthropic's strategy: the same model that handles your refactor is the one generating your marketing deck. One reasoning engine, multiple surfaces. It is the opposite of OpenAI's approach of fragmenting products across model families.

Where Opus 4.7 is available

Day-one availability on five platforms:

  • claude.ai and Claude Code — default model for Pro, Max, Team, and Enterprise
  • Anthropic API — model ID claude-opus-4-7
  • AWS Bedrock — available in us-east-1 and us-west-2 regions at launch
  • Microsoft Foundry — global availability through Anthropic's Foundry partnership
  • Google Vertex AI — publisher model, available on launch day

The multi-cloud launch is notable because Anthropic has historically staggered platform rollouts. For Opus 4.7, the company closed that gap. Your enterprise sourcing team can pick their preferred cloud without waiting.

What to stop doing

If you were still running Opus 4.6 as your default in production agents, migrate. The cost-per-finished-task is lower on 4.7 when you count the drop in tool calls. The only reason to stay on 4.6 is a hard dependency on the old tokenizer's output format, which almost nobody has.

If you were using Claude Sonnet for coding to save money, run the math again. The gap between Sonnet and Opus 4.7 on SWE-Bench is wide enough that the finished-work cost may now favor Opus. Box's numbers are the clearest public signal of this.

If you were waiting for "one more model generation" to build an agent loop around Anthropic, the wait is over. This is the generation.

What Opus 4.7 does not solve

Credibility requires honesty. Opus 4.7 is not a universal upgrade.

The model still struggles with tasks that require persistent state across organizations — multi-tenant agent workflows where data must be isolated but knowledge must transfer. It is better than 4.6, but not solved.

It also trails Claude Mythos Preview on pure reasoning benchmarks. Mythos is Anthropic's internal research model — not generally available. If you need frontier reasoning above Opus-tier, your options are limited.

And it still does not match specialized models for narrow domains. A finance-tuned model will beat Opus 4.7 at finance-specific tasks. A radiology model will beat it at radiology. Opus 4.7 is a generalist with software engineering specialization, not a domain expert.

The bigger picture

Anthropic just made Opus 4.7 the default model inside Claude Code. That is a quiet but important move. Claude Code is the surface where Anthropic's enterprise customers spend the most tokens, and making Opus 4.7 the default means the company is betting its flagship coding product on this model's reliability.

Combine that with the Claude Design launch, day-one multi-cloud availability, and the unchanged pricing, and you get a clear message: Anthropic is not chasing GPT-5.4 on benchmarks. It is building a coding engine that companies can actually deploy at scale without the finance team revolting.

Check out the related block/goose open-source AI agent for a glimpse of how the broader ecosystem is using Claude Opus models in production. For teams exploring Anthropic's tooling, the jamesrochabrun/skills Claude Code Toolkit is worth a look — 24 battle-tested skills built on the Claude Code plugin system.

The real cost math for a 20-engineer team

Benchmark percentages are abstract. Dollar savings are not. Run the numbers for a team that adopts Opus 4.7 across a typical sprint cycle.

Assume 20 senior engineers at a fully-loaded cost of $75 per hour. If Opus 4.7 saves each engineer 20 hours per week on the coding work they used to babysit — which is what Box's deployment data suggests once you back out the fewer model calls and faster responses — that is $1,500 per engineer per week. Across 20 engineers that is $30,000 per week, or $1.56 million per year, in reclaimed engineering time.

Against that, the API bill. A 20-person team running Opus 4.7 in agent mode for 40 hours a week burns through roughly 400 million tokens per month at heavy usage. At $5 per million input and $25 per million output — call it a 50/50 split for agent workflows — monthly cost lands near $6,000. Even doubling that for tokenizer inflation puts you at $144,000 per year.

Net: $1.56M saved minus $144K spent equals $1.42M per year per 20-engineer team. Those are rough numbers, but the ratio — roughly 10-to-1 savings — matches what Anthropic customers reported in the Opus 4.7 press materials.
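Those figures fall out of straightforward arithmetic. A quick sketch reproducing them; every input here is the article's assumption (headcount, rates, hours saved, token volume), not measured data:

```python
# Reproduce the 20-engineer ROI estimate from the figures above.
engineers   = 20
hourly_cost = 75    # fully loaded, $/hour
hours_saved = 20    # per engineer, per week

annual_savings = engineers * hourly_cost * hours_saved * 52   # $1,560,000/yr

tokens_per_month = 400_000_000                                # heavy agent usage
input_rate, output_rate = 5 / 1_000_000, 25 / 1_000_000       # $/token
monthly_api = tokens_per_month / 2 * (input_rate + output_rate)  # 50/50 split: $6,000
annual_api  = monthly_api * 12 * 2       # doubled for tokenizer inflation: $144,000

net   = annual_savings - annual_api      # $1,416,000
ratio = annual_savings / annual_api      # ~10.8
print(f"net ${net:,.0f}/yr, ratio {ratio:.1f}:1")
```

Change any one assumption (say, 5 hours saved instead of 20) and the ratio shrinks proportionally, so treat the 10-to-1 figure as an upper bound for teams with heavy agent usage.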

How Opus 4.7 changes the agent architecture decision

Before Opus 4.7, most teams built agent loops with a cheaper reasoning model plus a more expensive model for hard steps. The router pattern. That architecture exists because running a top-tier model on every turn was too expensive to sustain.

Opus 4.7 breaks that calculus. With Box's reported 56% drop in total model calls, the finished cost of running Opus 4.7 on every turn is often lower than the router setup because you stop paying for the reasoning-model calls that never produced useful output. The fewer-calls, higher-quality path wins.

That is bad news for companies selling router orchestration layers. Tools like LangGraph, CrewAI, and AutoGen were partly solving for the cheap-model-plus-expensive-model routing problem. If that problem starts to disappear, so does a chunk of their reason for being.

That is not to say orchestration frameworks are dead. They still handle retries, state persistence, tool registration, and human-in-the-loop handoffs. But the routing logic that justified a lot of their complexity is now a less compelling default.

Frequently Asked Questions

What is Claude Opus 4.7?

Claude Opus 4.7 is Anthropic's flagship large language model, released on April 16, 2026. It targets software engineering work, scoring 41.6% on SWE-Bench Verified and 70% on CursorBench. The model is available on claude.ai, the Anthropic API, AWS Bedrock, Google Vertex AI, and Microsoft Foundry.

How does Claude Opus 4.7 compare to Opus 4.6?

Opus 4.7 lifts SWE-Bench Verified from roughly 31% to 41.6%, raises CursorBench from 58% to 70%, triples image input resolution to 3.75 megapixels, and resolves three times more production tasks on Rakuten's internal SWE-Bench. Pricing stays identical at $5 per million input tokens and $25 per million output tokens, though a new tokenizer pushes input token counts to 1.0 to 1.35 times their 4.6 levels.

How much does Claude Opus 4.7 cost?

Claude Opus 4.7 costs $5 per million input tokens and $25 per million output tokens via the Anthropic API — unchanged from Opus 4.6. It is included at no extra cost in Claude Pro ($20/month), Claude Max ($100 or $200/month), Claude Team ($25 per seat), and Claude Enterprise plans.

Is Claude Opus 4.7 better than GPT-5.4 for coding?

On SWE-Bench Verified and Anthropic's own reported benchmarks, yes. Opus 4.7 scores 41.6% on SWE-Bench Verified, ahead of both GPT-5.4 and Gemini 3.1 Pro. Real-world deployments back this up: Box reported 56% fewer model calls and 24% faster responses after switching to Opus 4.7.

Where can I use Claude Opus 4.7?

Claude Opus 4.7 is available on claude.ai and Claude Code, the Anthropic API (model ID claude-opus-4-7), AWS Bedrock in us-east-1 and us-west-2, Google Vertex AI, and Microsoft Foundry. Anthropic shipped multi-cloud availability on launch day rather than staggering regional rollouts.

Key Takeaways

  • Claude Opus 4.7 launched April 16, 2026, five months after Opus 4.6
  • SWE-Bench Verified climbed to 41.6%, CursorBench jumped from 58% to 70%
  • Image input now handles 2,576 pixels on the long edge — roughly 3.75 megapixels, triple the prior cap
  • Pricing stays at $5 per million input tokens and $25 per million output tokens
  • Box reported 56% fewer model calls, 50% fewer tool calls, and 24% faster responses after switching
  • Trade-off: the new tokenizer consumes 1.0 to 1.35 times as many input tokens per request
  • Also live on Microsoft Foundry, AWS Bedrock, and Google Vertex AI from day one

Skila AI Editorial Team

The Skila AI editorial team researches and writes original content covering AI tools, model releases, open-source developments, and industry analysis. Our goal is to cut through the noise and give developers, product teams, and AI enthusiasts accurate, timely, and actionable information about the fast-moving AI ecosystem.

Claude Opus 4.7
Anthropic
AI Coding
Software Engineering AI
Claude Benchmarks
SWE-Bench
CursorBench
