MiniMax M2.5 Matches Claude at 5% the Cost. The AI Pricing War Just Escalated.
A Shanghai startup just posted an 80.2% score on SWE-Bench Verified — within 0.6 points of Claude Opus 4.6 — while charging $0.30 per million input tokens. That's roughly one-twentieth the price of Anthropic's flagship model. MiniMax, the company behind M2.5, isn't just competing on benchmarks anymore. It's dismantling the assumption that frontier AI performance requires frontier pricing.
The math is stark: a typical SWE-Bench coding task costs about $0.15 with M2.5. The same task on Claude Opus 4.6 runs about $3.00. For a team running 100 such tasks a day, that's the difference between a $450/month bill and a $9,000 one.
MiniMax M2.5: What the Benchmarks Actually Show
M2.5 is a Mixture-of-Experts model with 230 billion total parameters, but only 10 billion active during inference — a design that keeps costs low while maintaining strong performance. Released on February 12, 2026, it was trained with CISPO, a reinforcement learning algorithm, across hundreds of thousands of real-world environments on Forge, MiniMax's in-house agent-native RL framework, which achieves roughly a 40x training speedup.
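The economics of that design are easy to sketch. As a rough rule of thumb (an assumption, not MiniMax's published figure), a decoder's per-token inference cost scales with its *active* parameters at roughly 2 FLOPs per parameter per token — so a 230B-total / 10B-active MoE pays inference costs closer to a 10B dense model:

```python
def flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter per token."""
    return 2 * active_params

moe_cost = flops_per_token(10e9)      # M2.5: 10B active parameters
dense_cost = flops_per_token(230e9)   # hypothetical dense 230B model

print(f"MoE inference cost vs dense 230B: {moe_cost / dense_cost:.1%}")  # → 4.3%
```

That ~4% compute ratio is what makes the aggressive per-token pricing structurally possible rather than a loss-leader.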
Here's where it gets interesting. On SWE-Bench Verified, M2.5 scores 80.2%, trailing Claude Opus 4.6 (80.8%) and Opus 4.5 (80.9%) by a hair. But on Multi-SWE-Bench — which tests complex multi-file engineering tasks — M2.5 actually overtakes Opus 4.6 with 51.3% versus 50.3%. When code changes span multiple files and require coordinated reasoning, MiniMax's model holds up better.
The gap widens dramatically on tool calling. In the BFCL Multi-Turn benchmark, M2.5 scored 76.8% while Claude Opus 4.6 managed just 63.3% — a 13.5-point lead for the Chinese model. For developers building agentic workflows where the model needs to call APIs, execute functions, and chain tools together, that difference is enormous.
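To make the BFCL-style workload concrete, here is a minimal tool-dispatch sketch in the OpenAI-style function-calling format such benchmarks exercise. The `get_weather` tool, its arguments, and the hard-coded tool call are invented for illustration — in a real agent, the tool calls come back from the model API and each result is fed back as a `tool` message so the model can chain further calls:

```python
import json

# Tool schema the model is shown (OpenAI-style function-calling format).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"22C and clear in {city}"  # stubbed tool result

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching Python function."""
    fn = REGISTRY[tool_call["name"]]
    return fn(**json.loads(tool_call["arguments"]))

# Simulating one model-emitted call; a multi-turn loop repeats this step.
result = dispatch({"name": "get_weather", "arguments": '{"city": "Shanghai"}'})
print(result)  # → 22C and clear in Shanghai
```

Multi-turn benchmarks like BFCL measure how reliably a model keeps this loop going — choosing the right tool, producing valid JSON arguments, and chaining results — which is where M2.5's 13.5-point lead matters in practice.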
On BrowseComp (web browsing tasks with context management), M2.5 hit 76.3%. And it completes SWE-Bench tasks 37% faster than its predecessor M2.1, matching Claude Opus 4.6's 22.9-minute average runtime.
MiniMax M2.5 Pricing Breaks the Cost Curve
Two versions are available through the API:
M2.5 (50 tokens/sec): $0.30 per million input tokens, $1.20 per million output tokens. At its 50 tokens/second output rate, an hour of continuous generation (180,000 tokens) costs about $0.22 in output tokens.
M2.5-Lightning (100 tokens/sec): Double the speed at $0.30 input / $2.40 output per million tokens — an hour of continuous generation at full speed still runs under $1. Even the premium tier costs a fraction of what Western competitors charge.
For context, Claude Opus 4.6 charges $15 per million input tokens and $75 per million output. GPT-5 and Gemini 3 Pro sit in similar ranges. MiniMax is delivering 95%+ of the coding performance at 5% of the price. The 205K token context window means you can feed it large codebases without worrying about truncation.
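Per-task cost depends entirely on the token mix, so the figures below are an illustration using the per-million-token prices quoted above and an assumed coding task of 150K input tokens and 10K output tokens (a mix chosen to roughly reproduce the ~$3.00 Claude figure cited earlier):

```python
PRICES = {  # USD per million tokens: (input, output), as listed above
    "minimax-m2.5": (0.30, 1.20),
    "claude-opus-4.6": (15.00, 75.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the published per-million-token rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Assumed coding task: 150K tokens of context in, 10K tokens out.
for model in PRICES:
    print(f"{model}: ${call_cost(model, 150_000, 10_000):.2f}")
# → minimax-m2.5: $0.06
# → claude-opus-4.6: $3.00
```

Shift the mix toward longer outputs and the ratio changes somewhat (output tokens carry the bigger price gap on a percentage basis), but the order-of-magnitude difference holds across realistic workloads.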
From Startup to $30 Billion: MiniMax's Rapid Rise
Founded in 2021 by Yan Junjie, Yang Bin, and Zhou Yucong, MiniMax has a trajectory that reads like a compressed version of OpenAI's story. Hillhouse Capital was the first investor, reportedly offering a term sheet with a blank valuation field after a three-hour pitch. The founders wrote down $200 million pre-money for a $30 million raise.
By early 2023, they'd closed a $260 million round at a $1.15 billion valuation, bringing in Tencent, Xiaomi, and Xiaohongshu. In March 2024, Alibaba led a $600 million round at $2.5 billion. Then came the Hong Kong IPO in January 2026, raising HK$4.8 billion ($614 million). Since listing, MiniMax shares have more than quadrupled, pushing the company's market cap to roughly $30 billion — approaching JD.com and Kuaishou territory.
Revenue growth tells the same story: a reported 159% increase following the IPO, driven by developer adoption of the M-series API.
What M2.5 Can Actually Do (Beyond Coding)
While the coding benchmarks grab headlines, M2.5's capabilities extend well beyond software engineering. The model handles office productivity tasks — generating and manipulating Word documents, Excel spreadsheets, and PowerPoint presentations. It switches between software environments, works across agent-human teams, and performs financial modeling.
Search grounding is built in, with M2.5 requiring 20% fewer search rounds than M2.1 to reach equivalent results. The model supports multilingual coding across Python, TypeScript, Go, C++, Rust, and Java, making it practical for polyglot codebases.
The 205K context window supports complex reasoning chains. M2.5 is a reasoning model that uses extended thinking (chain-of-thought) to work through problems before answering — similar to how Claude's extended thinking works, but at a fraction of the token cost.
What This Means for the AI Market
MiniMax M2.5 is the latest signal that the cost of frontier AI is collapsing faster than anyone predicted. When a Chinese startup can match the performance of the frontier models behind tools like Cursor at 5% of the cost, every AI-powered product's margin structure comes into question.
For developers using tools like Aider or building custom coding agents with frameworks like CrewAI, the economics of switching to MiniMax's API are compelling. A team running 50,000 API calls per day would save roughly $8,500/month compared to Claude Opus pricing.
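The savings figure above is sensitive to per-call token counts, which the article does not specify. The sketch below uses assumed averages of ~300 input and ~20 output tokens per call — the kind of short agentic calls that make the rough $8,500/month estimate work out at 50,000 calls/day:

```python
CALLS_PER_DAY = 50_000
IN_TOKENS, OUT_TOKENS = 300, 20   # assumed per-call averages
DAYS = 30

def monthly_cost(price_in: float, price_out: float) -> float:
    """USD per month given per-million-token input/output prices."""
    tokens_in = CALLS_PER_DAY * IN_TOKENS * DAYS
    tokens_out = CALLS_PER_DAY * OUT_TOKENS * DAYS
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

m25 = monthly_cost(0.30, 1.20)     # MiniMax M2.5
opus = monthly_cost(15.00, 75.00)  # Claude Opus 4.6
print(f"M2.5: ${m25:,.0f}/mo  Opus: ${opus:,.0f}/mo  savings: ${opus - m25:,.0f}/mo")
# → M2.5: $171/mo  Opus: $9,000/mo  savings: $8,829/mo
```

Longer calls push the absolute savings up fast — the per-token price gap is roughly 50x on input and 60x on output, so the monthly delta scales linearly with traffic.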
The counterargument is straightforward: benchmarks aren't everything. Claude's strength lies in instruction following, nuanced reasoning, and safety guardrails that Chinese models haven't been independently audited for. Enterprise customers with compliance requirements won't switch based on price alone. And Anthropic's ecosystem — from MCP servers to the Claude Code CLI — creates switching costs that raw API pricing doesn't capture.
But for cost-sensitive applications like batch processing, code review automation, and internal tooling, M2.5 just became the obvious choice. The question isn't whether MiniMax will take market share — it's how much, and how fast Anthropic, OpenAI, and Google will respond on pricing.
Key Takeaways
- ✓ MiniMax M2.5 scores 80.2% on SWE-Bench Verified, within 0.6 points of Claude Opus 4.6, at roughly 1/20th the price
- ✓ On Multi-SWE-Bench (complex multi-file tasks), M2.5 actually outperforms Opus 4.6: 51.3% vs 50.3%
- ✓ Tool calling performance shows a 13.5-point lead over Claude (76.8% vs 63.3% on BFCL Multi-Turn)
- ✓ Pricing starts at $0.30/1M input tokens — a typical coding task costs $0.15 vs $3.00 on Claude
- ✓ MiniMax IPO'd in Hong Kong in January 2026 and is now valued at approximately $30 billion
- ✓ The 230B parameter MoE architecture uses only 10B active parameters during inference, keeping costs structurally low
- ✓ Enterprise adoption will hinge on safety auditing, compliance certification, and ecosystem maturity — not just benchmarks
Skila AI Editorial Team
The Skila AI editorial team researches and writes original content covering AI tools, model releases, open-source developments, and industry analysis. Our goal is to cut through the noise and give developers, product teams, and AI enthusiasts accurate, timely, and actionable information about the fast-moving AI ecosystem.