2.
A Transformer with KV cache implemented in RTL on a Virtex-5 FPGA (x.com)
A full microGPT-style Transformer ran at 56k+ tokens/sec on FPGA fabric with no CPU or GPU
2 appearances on the backlist front page in the last 30 days.
No GPU, no CPU - a full Transformer with KV cache as RTL on a Virtex-5 FPGA. microGPT at ~56k tokens/s, fully open-source. Thought you'd appreciate this, @reach_vb