@mohitwt_ on Backlist

Building a GPT-2 inference engine from scratch in CUDA

Implementing transformer inference directly in CUDA exposes the real performance work behind reductions, memory behavior, numerical stability, and kernel design

by @mohitwt_ (mohit) · backlist 2026-05-10