@pupposandro on Backlist

43.

Excited to launch Luce Spark: now a 35B MoE runs on a 16GB GPU, with no offload tax.

An A3B model fires ~8 of its 256 experts per token, but to keep it resident you pay VRAM for all 256. Spark pins the experts your traffic actually hits, o
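The excerpt is cut off before it says how Spark decides which experts to pin, so the sketch below is not Spark's method. It illustrates the general idea under a loudly stated assumption: a simple frequency-based policy that counts how often the router selects each expert over recent traffic, then keeps the most-hit experts resident until a VRAM budget is spent. The function names, the fixed per-expert byte size, the single-layer view of expert ids, and the greedy ranking are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def count_expert_hits(routed: Iterable[Sequence[int]]) -> Counter:
    """Tally how often each expert id was selected across tokens.

    Hypothetical input: for each token, the ids of the experts the router
    picked in one MoE layer (e.g. ~8 ids out of 256 for an A3B-style model).
    """
    hits: Counter = Counter()
    for token_experts in routed:
        hits.update(token_experts)
    return hits


def choose_pinned_experts(hits: Counter, bytes_per_expert: int, budget_bytes: int) -> List[int]:
    """Greedily pin the most-hit experts that fit in the VRAM budget.

    Assumed policy (not Spark's): rank experts by hit count, break ties by
    lower id, and keep as many as fit in `budget_bytes`.
    """
    if bytes_per_expert <= 0:
        raise ValueError("bytes_per_expert must be positive")
    capacity = budget_bytes // bytes_per_expert
    ranked = sorted(hits, key=lambda e: (-hits[e], e))
    return ranked[:capacity]


def hit_rate(routed: Iterable[Sequence[int]], pinned: Sequence[int]) -> float:
    """Fraction of expert activations that land on a pinned (resident) expert."""
    pinned_set = set(pinned)
    total = served = 0
    for token_experts in routed:
        for e in token_experts:
            total += 1
            if e in pinned_set:
                served += 1
    return served / total if total else 0.0


if __name__ == "__main__":
    # Toy trace: 4 tokens, 3 experts per token (real A3B routing picks ~8 of 256).
    routed = [[0, 1, 2], [0, 1, 3], [0, 4, 5], [1, 0, 6]]
    hits = count_expert_hits(routed)
    pinned = choose_pinned_experts(hits, bytes_per_expert=100, budget_bytes=250)
    print("pinned:", pinned)
    print("hit rate:", hit_rate(routed, pinned))
```

In this toy trace experts 0 and 1 account for 7 of the 12 activations, so pinning just those two serves about 58% of expert calls from resident memory. What happens on a miss (fetching the expert, or something else) is not covered by the excerpt and is left out of the sketch.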

by @pupposandro (Sandro) · backlist 2026-06-08 · rubric 81.0