Microsoft Just Built Its Own Coding AI to Stop Paying OpenAI
On June 2 at Build 2026, Microsoft did something it has avoided for years. It stopped renting its coding intelligence and started building its own.
The company shipped two models trained end-to-end in-house: MAI-Code-1-Flash, a small coding model, and MAI-Thinking-1, its first flagship reasoning model. Both came out of Mustafa Suleyman's Microsoft AI group. Both were trained on Microsoft's own data, not OpenAI's weights.
And MAI-Code-1-Flash is already inside the editor you probably opened this morning.
The benchmark that should worry Anthropic
Microsoft pointed MAI-Code-1-Flash directly at Claude Haiku 4.5, the cheap, fast model a lot of teams default to for autocomplete and quick edits.
The headline number: 51.2% on SWE-Bench Pro versus 35.2% for Claude Haiku 4.5. That is a 16-point lead on real-world, multi-file engineering tasks, not toy puzzles.
Microsoft says MAI-Code-1-Flash beat Haiku 4.5 on all four core coding benchmarks it tested, with a higher pass rate on every single one. The model is roughly 5 billion parameters. Haiku is not a pushover. Getting beaten on every eval by something that small is the kind of result that gets screenshotted in engineering Slacks.
Then there is the part that actually hits your invoice. MAI-Code-1-Flash solves harder problems with up to 60% fewer tokens. Microsoft built in "adaptive solution length control" so the model stops padding answers. Fewer tokens means lower cost per task. For a tool billed by usage, that compounds fast across a team.
MAI-Thinking-1 goes after the big models
MAI-Code-1-Flash is the everyday workhorse. MAI-Thinking-1 is the flex.
It is a mid-sized reasoning model: 35 billion active parameters, a 128K context window, and — notably — trained from scratch with no distillation from another model. Microsoft used clean, commercially licensed, enterprise-grade data. That last detail matters for enterprise legal teams who have spent two years nervous about where training data came from.
On performance, Microsoft says independent raters preferred MAI-Thinking-1 over Anthropic's Claude Sonnet 4.6 in blind testing. On coding specifically, it matches Claude Opus 4.6 on SWE-Bench Pro. Opus is Anthropic's top-tier model. Matching it with a 35B reasoning model is a serious claim.
MAI-Thinking-1 is in private preview through Microsoft Foundry. MAI-Code-1-Flash is the one already shipping to real users.
Why this is really a story about OpenAI
Microsoft has poured tens of billions into OpenAI. For most of that time, the AI inside Copilot, Office, and Windows leaned on OpenAI's models. That was fine when nobody else could compete. It is expensive and strategically fragile now that everybody can.
Owning the model changes the math three ways.
First, cost. Microsoft pays itself instead of paying a partner per token. At Copilot's scale, that is a budget line worth billions.
Second, control. Microsoft can tune a model specifically for the GitHub Copilot harnesses it runs in production. MAI-Code-1-Flash was built against those exact harnesses. A general-purpose model from a vendor never gets that tight.
Third, leverage. Every model Microsoft ships in-house weakens its dependence on any single supplier. CNBC framed the June 2 launch exactly this way: Microsoft cutting its reliance on OpenAI and lowering costs for developers. The models are the proof.
What it means for the tool you actually code with
Here is where it gets personal. MAI-Code-1-Flash is rolling out free to GitHub Copilot individual users in VS Code — both in the model picker and under the default auto picker. You may end up using it without choosing it.
If you pay for an AI coding tool, this is the squeeze. The coding-assistant market splits roughly into two camps: the agentic, terminal-first tools and the in-editor assistants.
On the agentic side, you have Claude Code and OpenAI's Codex. On the in-editor side, Cursor, Windsurf, and Copilot itself. Microsoft just made the in-editor option cheaper to run and, on its own benchmarks, more accurate than a leading budget model. When the default tier gets that good, the premium tiers have to justify their price.
That does not mean Cursor or Claude Code are in trouble. They win on agentic workflows, large-context refactors, and raw frontier reasoning — areas a 5B model is not built for. But the floor just moved up. "Good enough" autocomplete is now genuinely good and effectively free.
If you want to see how the agentic tools stack up against this new baseline, our roundup of the best AI coding tools breaks down where each one still earns its subscription.
The price war nobody can stop now
Microsoft is not alone. Google has been pushing its own coding models, and the whole sector is racing to drive per-task cost toward zero. The losers in a price war are the vendors with no moat beyond "our model is slightly better." The winners are developers.
Cheaper, faster, more accurate coding AI baked into the default editor is good news if you write code. It is uncomfortable news if you sell a coding tool that competes on the things a free, fast model now does well.
The interesting question for the next six months is not whether MAI-Code-1-Flash is good. Microsoft's numbers say it is. The question is how much further the floor drops once Microsoft owns the whole stack — model, editor, cloud, and distribution.
Want to go deeper on the protocol layer that lets all these models plug into your tools? Start with our explainer on what MCP is and why it matters.
Small models are the real shift
For two years the narrative was "bigger is better." More parameters, more data, more frontier capability. MAI-Code-1-Flash quietly argues the opposite for one specific job: everyday coding.
A 5B-parameter model beating a leading budget model on every coding benchmark Microsoft tested is a statement about specialization. You do not need a 400B general-purpose brain to fix a type error or write a unit test. You need a small model trained hard on exactly that, running in exactly the harness where the work happens.
That is the design choice Microsoft leaned into. MAI-Code-1-Flash was built directly against the GitHub Copilot harnesses used in production, not as a research artifact someone later wired into an editor. The "adaptive solution length control" — the feature behind the up-to-60%-fewer-tokens claim — exists because Microsoft knows precisely how Copilot calls the model and where the waste was.
Smaller also means faster and cheaper to serve. A model Microsoft can run at low cost is a model Microsoft can give away in the free Copilot tier without bleeding money. That is the strategy: make the good-enough tier so cheap to operate that price stops being a reason to choose a competitor.
The numbers, in one place
If you skipped to here, the verified figures from Microsoft's June 2 announcement:
- MAI-Code-1-Flash: ~5B parameters. 51.2% on SWE-Bench Pro vs 35.2% for Claude Haiku 4.5. Higher pass rate on all four core coding benchmarks tested. Up to 60% fewer tokens on hard problems. Rolling out free to GitHub Copilot individual users in VS Code.
- MAI-Thinking-1: 35B active parameters, 128K context window. Trained from scratch, no distillation, on commercially licensed enterprise data. Preferred over Claude Sonnet 4.6 in blind testing. Matches Claude Opus 4.6 on SWE-Bench Pro. Private preview via Microsoft Foundry.
Two caveats worth keeping in mind. These are Microsoft's own benchmarks, run by Microsoft. Independent SWE-Bench Pro results from third parties have not landed yet, and vendor benchmarks always flatter the vendor. Treat the 16-point lead as a strong signal, not gospel, until someone outside Redmond reproduces it.
What you should actually do this week
If you use GitHub Copilot in VS Code, you do not have to do anything — MAI-Code-1-Flash will show up in your model picker and may run under the default auto picker. Try it on a real task and compare it to whatever you were using. The token efficiency alone may change which model you pin.
If you pay for a premium coding tool, do not cancel anything yet. Run a side-by-side on the work you actually do. If your day is mostly autocomplete and small edits, a free model that scores this well might genuinely cover it. If you lean on agentic refactors, large-context reasoning, or multi-file planning, the premium tools still earn their keep — for now.
If you build developer tools for a living, this is the wake-up call. "Slightly better autocomplete" is no longer a business. The defensible ground is agentic workflows, deep context, integrations, and the parts of the job a 5B model cannot touch. Microsoft just commoditized the easy 80%.
Frequently Asked Questions
What is MAI-Code-1-Flash?
MAI-Code-1-Flash is a roughly 5-billion-parameter coding model Microsoft launched at Build 2026 on June 2. It is built end-to-end by Microsoft for fast, efficient code assistance and is rolling out free to GitHub Copilot individual users inside Visual Studio Code.
How does MAI-Code-1-Flash compare to Claude Haiku 4.5?
On Microsoft's benchmarks, MAI-Code-1-Flash scores 51.2% on SWE-Bench Pro versus 35.2% for Claude Haiku 4.5 — a 16-point lead — and outperforms Haiku on all four core coding benchmarks tested. It also solves harder problems with up to 60% fewer tokens, which lowers cost per task.
What is MAI-Thinking-1?
MAI-Thinking-1 is Microsoft's first flagship reasoning model: 35 billion active parameters, a 128K context window, and trained from scratch with no distillation. Microsoft says it matches Claude Opus 4.6 on SWE-Bench Pro coding tasks and was preferred over Claude Sonnet 4.6 in blind testing. It is in private preview through Microsoft Foundry.
Why is Microsoft building its own coding models?
Microsoft wants to cut its dependence on OpenAI, lower costs for developers, and tune models specifically for the GitHub Copilot systems it runs in production. Owning the model means Microsoft pays itself instead of a partner and controls the full coding-tool stack.
Does MAI-Code-1-Flash replace Claude Code or Cursor?
Not directly. MAI-Code-1-Flash targets fast in-editor assistance, while agentic tools like Claude Code and Cursor still lead on large-context refactors and frontier reasoning. The launch raises the free baseline, which pressures premium tools to justify their pricing rather than replacing them outright.
Key Takeaways
- ✓Microsoft launched MAI-Code-1-Flash and MAI-Thinking-1 at Build 2026 on June 2, its first in-house coding-focused models.
- ✓MAI-Code-1-Flash, a 5B-parameter model, posts 51.2% on SWE-Bench Pro versus 35.2% for Claude Haiku 4.5 — a 16-point lead — while using up to 60% fewer tokens.
- ✓MAI-Thinking-1 has 35B active parameters and a 128K context window, was trained from scratch with no distillation, and matches Claude Opus 4.6 on SWE-Bench Pro coding tasks.
- ✓MAI-Code-1-Flash is rolling out free to GitHub Copilot individual users inside VS Code, in the model picker and the default auto picker.
- ✓The move tightens Microsoft's grip on the coding-tool stack and reduces its dependence on OpenAI's models for everyday developer workflows.
Skila AI Editorial Team
The Skila AI editorial team researches and writes original content covering AI tools, model releases, open-source developments, and industry analysis. Our goal is to cut through the noise and give developers, product teams, and AI enthusiasts accurate, timely, and actionable information about the fast-moving AI ecosystem.
About Skila AI →Related Resources
Weekly AI Digest
Get the top AI news, tool reviews, and developer insights delivered every week. No spam, unsubscribe anytime.
Join 1,000+ AI enthusiasts. Free forever.