Excited to launch Luce Spark: now a 35B MoE runs on a 16GB GPU, with no offload tax.
An A3B model fires ~8 of its 256 experts per token, but to keep it resident you pay VRAM for all 256. Spark pins the experts your traffic actually hits, o