61.
The bottleneck in LLM inference isn't compute. It's how fast you can move the weights. (x.com)
Our CTO Mathias Lechner, @mlech26l, joins Piotr Mazurek, @tugot17, from our inference team, to discuss what actually limits token throughput and how
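
The claim that weight movement, not compute, limits inference can be checked with back-of-the-envelope arithmetic. The sketch below is only an illustration under stated assumptions: single-stream decoding (batch size 1), every weight read from memory once per generated token, and KV-cache and activation traffic ignored. The function names and the example numbers (7B parameters, 2 bytes per parameter, 1 TB/s memory bandwidth) are hypothetical and do not come from the post.

```python
def decode_tokens_per_second_upper_bound(param_count, bytes_per_param, mem_bandwidth_bytes_per_s):
    """Rough ceiling on decode speed if each token requires reading all weights once.

    Assumption: batch size 1, no KV-cache or activation traffic counted.
    """
    weight_bytes = param_count * bytes_per_param
    return mem_bandwidth_bytes_per_s / weight_bytes


def decode_arithmetic_intensity(batch_size, bytes_per_param):
    """Approximate FLOPs performed per byte of weights read during one decode step.

    Assumption: about 2 FLOPs (one multiply, one add) per parameter per token,
    and the weights are read once per step and shared across the batch.
    """
    return 2 * batch_size / bytes_per_param


if __name__ == "__main__":
    # Hypothetical numbers chosen for illustration only.
    params = 7e9          # 7B-parameter model
    bytes_per_param = 2   # 16-bit weights
    bandwidth = 1e12      # 1 TB/s memory bandwidth

    ceiling = decode_tokens_per_second_upper_bound(params, bytes_per_param, bandwidth)
    print(round(ceiling, 1))  # 71.4 tokens/s at most, regardless of available FLOPs

    # At batch size 1 this is about 1 FLOP per byte moved.
    print(decode_arithmetic_intensity(1, bytes_per_param))  # 1.0
```

Under these assumptions the ceiling depends only on bandwidth and model size in bytes, which is why shrinking the weights (for example through lower-precision formats) or sharing one weight read across a larger batch raises throughput even when compute is unchanged.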