Benchmarks

2026-04-04

Gemma 4 26B: Google Drops a MoE Monster With 4B Active Params

Google DeepMind released Gemma 4 today. The 26B A4B MoE scores 1441 on LMArena while activating just 4B parameters per token. Here's what that means for local inference.

2026-03-20

Mistral Small 4: A 119B Model That Fits in 70GB and Actually Runs Fast

Mistral Small 4 is a 119B MoE model with only 6B active parameters. I ran it on AMD Strix Halo hardware and got real numbers.

2026-03-07

MiniMax M2.5: When Frontier Intelligence Gets Cheap Enough to Leave Running All Night

2026-03-05

Which Local LLMs Can Actually Use Tools?

I ran a 15-test tool-calling benchmark against every local model on my Ryzen AI Max+ 395. The results were not what I expected.

2026-03-01

I Tested 10 AI Models So You Don't Have To

A weekend spent benchmarking every promising local AI model on consumer hardware. Here's what actually works.

2026-02-26

Bigger Isn't Better: How a 9GB Model Beat 120B Parameters

I benchmarked 17 local LLMs across 13 dimensions with 39 tests. The results destroyed my assumptions about model size.

2026-02-23

GPT-OSS 120B: First Benchmarks on Consumer AMD Hardware

Real benchmarks of OpenAI's open-weight 120B MoE model running on a Ryzen AI Max+ 395 with 128GB unified memory. No cloud, no A100s, just bare metal.