MiniMax M2.5: When Frontier Intelligence Gets Cheap Enough to Leave Running All Night
A new model dropped earlier this week. I almost missed it — it was 3 AM when my scouting cron caught it trending on Hugging Face.
MiniMax M2.5. 229B total parameters, 10B active per token (MoE). SOTA on SWE-Bench. Cheaper per hour than a cup of coffee.
This is the one worth noting.
The Numbers First
MiniMax released M2.5 this week with a simple claim: frontier intelligence, priced so you don’t have to think about cost.
Let’s verify that claim with the benchmarks they published:
- SWE-Bench Verified: 80.2% — this is the gold standard for real-world coding ability. Tasks come from actual GitHub issues, not synthetic problems. 80.2% is legitimate SOTA at time of writing.
- Multi-SWE-Bench: 51.3% — multi-repo coding tasks, significantly harder
- BrowseComp: 76.3% — web research and information synthesis with context management
On the coding agent harnesses they tested head-to-head against Claude Opus 4.6:
| Harness | MiniMax M2.5 | Claude Opus 4.6 |
|---|---|---|
| Droid | 79.7% | 78.9% |
| OpenCode | 76.1% | 75.9% |
Narrow margins, but M2.5 edges ahead on both. And it does it at one-tenth to one-twentieth the cost.
The Economics
This is where it gets genuinely interesting.
M2.5-Lightning (the fast variant):
- Input: $0.30/million tokens
- Output: $2.40/million tokens
- Speed: 100 tokens/second
- Cost to run for one hour: $1.00
M2.5 (standard):
- Half the speed (50 TPS), half the per-token price
- $0.30/hour continuous
For comparison: Claude Opus 4.6 costs roughly 10-20x more per token for similar output quality.
To put it concretely: you could run four M2.5-Lightning instances continuously for the price of one Opus 4.6 instance. That's not a marginal improvement. That's a different category of economic decision.
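Those hourly figures follow directly from throughput and per-token price. A quick sanity check (my simplifying assumption: output tokens dominate the bill, so input cost is ignored):

```python
def hourly_cost(tps: float, price_per_m_output: float) -> float:
    """Approximate cost of one hour of continuous generation.

    Assumes output tokens dominate; input-token cost is ignored.
    """
    tokens_per_hour = tps * 3600
    return tokens_per_hour * price_per_m_output / 1_000_000

# M2.5-Lightning: 100 TPS at $2.40/M output tokens
lightning = hourly_cost(100, 2.40)  # ~$0.86/hour, close to the quoted $1.00

# M2.5 standard: half the speed at half the per-token price
standard = hourly_cost(50, 1.20)    # ~$0.22/hour
```

The standard variant's hourly cost lands at roughly a quarter of Lightning's, since both the speed and the per-token price are halved, which is consistent with the quoted $0.30/hour once input tokens are added back.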
What “Too Cheap to Meter” Actually Means
MiniMax’s marketing phrase is “intelligence too cheap to meter.” It’s a deliberate echo of Lewis Strauss’s 1954 promise that nuclear power would make electricity “too cheap to meter,” updated for 2026.
But what does it mean in practice?
When API costs are the binding constraint on agent design, you architect conservatively. You minimize tool calls. You batch operations. You accept lower-quality outputs to reduce token usage. You build agents that ask before acting.
When cost approaches zero, those constraints evaporate. You can run speculative agents. You can spin up five parallel instances to explore different approaches and pick the best. You can afford agents that are verbose in their reasoning, thorough in their search, exhaustive in their verification.
That’s the actual unlock here — not the absolute price, but crossing the threshold where cost stops being a design constraint.
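Here’s the shape of that in code: a best-of-n speculative runner, sketched with a hypothetical `call_model` stub and a scoring function you’d supply yourself. Nothing below is a real MiniMax SDK call.

```python
from concurrent.futures import ThreadPoolExecutor


def call_model(prompt: str, seed: int) -> str:
    # Hypothetical API client stub; swap in your provider's SDK.
    # Varying the seed (or temperature) yields diverse candidates.
    return f"candidate solution for seed {seed}"


def score(candidate: str) -> float:
    # Your verifier: run the test suite, lint, or judge with
    # another (cheap) model call. Placeholder metric here.
    return float(len(candidate))


def best_of_n(prompt: str, n: int = 5) -> str:
    """Run n speculative attempts in parallel, keep the best one.

    Only economical when per-call cost is close to zero --
    you're deliberately throwing away n-1 completed runs.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: call_model(prompt, s), range(n)))
    return max(candidates, key=score)


winner = best_of_n("fix the failing test in auth.py", n=5)
```

The design choice worth noticing: the verifier is the whole game. Parallel sampling only pays off if `score` can reliably distinguish a good candidate from a plausible-looking bad one.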
The Speed Story
M2.5 was benchmarked completing SWE-Bench Verified tasks in an average of 22.8 minutes per task.
Claude Opus 4.6: 22.9 minutes per task.
Effectively identical speed, one-tenth the cost. That’s the headline.
How? Three factors:
- Task decomposition — the model learned to break problems into efficient parallel subtasks rather than sequential chains
- Round efficiency — M2.5 needs ~20% fewer agent rounds than M2.1 to achieve better results
- Inference speed — 100 TPS for M2.5-Lightning is roughly double typical frontier model throughput
Interestingly, MiniMax notes that M2.5 developed what they call a “spec-writing tendency” during training — before writing any code, it decomposes and plans like an architect. This emerged from RL training on real environments, not from explicit instruction. That’s the kind of emergent behavior that actually moves benchmarks.
What It Can’t Do (From My Setup)
I run a local-first AI stack. Oscar, my Ryzen AI Max+ 395 server, handles everything from embeddings to video generation to local coding agents with 128GB unified memory.
M2.5 at 229B total parameters is not coming to Oscar. Even at aggressive quantization, you’re looking at 115GB+ just to load the weights. It’s a MoE architecture with ~10B active parameters per token (the same concept as models I do run locally), but unlike Qwen3.5-35B-A3B, which ships openly available GGUF weights, M2.5 has no local-inference weights as of writing.
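That 115GB+ figure is just parameter count times bits per weight. Back-of-envelope, weights only (no KV cache, activations, or runtime overhead):

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Memory needed to hold model weights alone, in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and framework overhead, so real
    requirements are meaningfully higher.
    """
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9


q4 = weight_memory_gb(229, 4)  # 4-bit quant: ~114.5 GB, weights alone
q8 = weight_memory_gb(229, 8)  # 8-bit quant: ~229 GB
```

At 4-bit you’re already at ~114.5 GB before any KV cache, which is why a 128GB unified-memory box is a non-starter for this model even in the best case.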
This is a cloud-only model, at least for now. Which means it’s not in my regular rotation, but it is interesting for specific use cases where I’d otherwise reach for the Anthropic API.
What I Actually Care About
The benchmark that matters most to me isn’t SWE-Bench — it’s tool-calling reliability. I wrote about which local LLMs can actually use tools a few days ago. The pattern holds at the frontier level too: models that score well on standard benchmarks often fall apart when given real tool schemas and real API responses.
MiniMax claims strong performance on BrowseComp and their RISE (Realistic Interactive Search Evaluation) benchmark, which tests search + tool use on professional-level tasks. Their numbers show M2.5 achieving better results with 20% fewer search rounds than M2.1.
I can’t verify that claim with my own evaluation harness since the model is cloud-only. But the methodology they describe — testing on out-of-distribution harnesses, measuring round efficiency not just task success — is the right way to evaluate agentic capability.
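Schema fidelity is the part of tool-calling reliability you can check mechanically. A minimal sketch of that kind of check, validating a model’s emitted tool call against a simple schema (the `web_search` schema here is made up for illustration, not MiniMax’s actual tool format):

```python
import json


def validate_tool_call(raw: str, schema: dict) -> bool:
    """Check a model-emitted tool call against a simple schema:
    valid JSON, correct tool name, all required args present,
    no unknown args. A minimal stand-in for a full JSON Schema
    validator, not a replacement for one.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if call.get("name") != schema["name"]:
        return False
    args = call.get("arguments", {})
    required = set(schema.get("required", []))
    allowed = set(schema.get("parameters", []))
    return required <= set(args) and set(args) <= allowed


schema = {
    "name": "web_search",
    "parameters": ["query", "max_results"],
    "required": ["query"],
}

good = validate_tool_call(
    '{"name": "web_search", "arguments": {"query": "minimax m2.5"}}', schema
)
bad = validate_tool_call(
    '{"name": "web_search", "arguments": {"q": "misnamed arg"}}', schema
)
```

Run this over a few hundred real transcripts and you get a pass rate that says far more about agentic fitness than a leaderboard score does, which is exactly the gap between benchmark performance and real tool schemas described above.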
If those numbers hold up under independent evaluation, M2.5 is genuinely interesting for production agent pipelines. Not because it’s the best, but because the price/performance ratio changes what you can afford to build.
The Bigger Picture
Three months ago, “frontier model” meant GPT-5, Gemini 3 Pro, or Claude Opus. It meant expensive. It meant carefully rationed API calls. It meant agents designed around cost constraints.
Today, MiniMax ships a 229B model that ties or beats Opus 4.6 on coding benchmarks for $0.30/hour. Qwen3.5 already proved that open-weights MoE can compete at 90%+ of frontier quality at a fraction of the cost. The economics are moving fast.
The interesting question isn’t which model is “best” anymore. It’s: what do you build when intelligence is cheap enough that you can run it speculatively, redundantly, and continuously?
I don’t have a clean answer yet. But I’m thinking about it.
MiniMax M2.5 is available via the MiniMax API. No local weights available as of March 7, 2026.
All benchmark figures from MiniMax’s official release announcement.