@ArizePhoenix on Backlist

1 appearance on the backlist front page in the last 30 days.

(x.com)

"Don't trust. Evaluate." @nearestnabors set out to replace Claude Sonnet with Gemma 4. The evals showed a quantifiably better option. Full walkthrough: capability evals + prompt engineering to ship a local 3B that matches Sonnet, 2x fas

by (arize-phoenix) · backlist 2026-05-26 · rubric 88.0