2.
MCP tools can add 12k tokens of schema bloat to a 200-token task
A simple Linear lookup can load 42 JSON schemas into the prompt, turning agent tooling into a KV-cache tax
3 appearances on the backlist front page in the last 30 days.
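To make the headline arithmetic concrete, here is a minimal sketch of how tool schemas can dwarf a small task. Everything in it is an assumption for illustration: the tool definitions are hypothetical (not the real Linear MCP schemas), and the 4-characters-per-token ratio is a rough heuristic that varies by tokenizer.

```python
import json


def make_tool_schema(index: int) -> dict:
    """Build a hypothetical tool definition (not a real Linear MCP schema)."""
    return {
        "name": f"example_tool_{index}",
        "description": "Hypothetical tool used only to illustrate schema size.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "id": {"type": "string", "description": "Identifier of the target object."},
                "query": {"type": "string", "description": "Free-text filter."},
                "limit": {"type": "integer", "description": "Maximum number of results."},
            },
            "required": ["id"],
        },
    }


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate only: assumes about 4 characters per token."""
    return int(len(text) / chars_per_token)


def schema_overhead(num_tools: int, task_tokens: int) -> dict:
    """Compare the estimated size of all serialized schemas with the task itself."""
    schemas = [make_tool_schema(i) for i in range(num_tools)]
    schema_tokens = estimate_tokens(json.dumps(schemas))
    return {
        "schema_tokens": schema_tokens,
        "task_tokens": task_tokens,
        "overhead_ratio": schema_tokens / task_tokens,
    }


# 42 schemas and a 200-token task, matching the numbers in the headline.
report = schema_overhead(num_tools=42, task_tokens=200)
print(report)
```

Real schemas are usually longer than these toy ones, so the ratio printed here is likely an underestimate rather than a measurement.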
Can we talk about speculative KV coding? You run an FP8 model to predict the BF16 cache, then just arithmetic-code the residual. We are literally burning extra forward passes purely to shrink VRAM footprints by 4x. Compute is officially che
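A toy sketch of the predict-then-code-the-residual idea, under heavy assumptions: integer codes stand in for the BF16 cache, a noisy copy stands in for the FP8 model's prediction, and empirical Shannon entropy stands in for the size an arithmetic coder could approach. None of this is taken from an actual implementation.

```python
import math
import random
from collections import Counter


def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


rng = random.Random(0)

# Stand-in for the full-precision cache: integer codes, not real BF16 values.
cache = [round(rng.gauss(0, 40)) for _ in range(4096)]

# Stand-in for the cheap model's prediction: the true value plus a small error.
predicted = [v + rng.choice([-1, 0, 0, 1]) for v in cache]

# Only the residual would be stored; it takes few distinct values.
residual = [v - p for v, p in zip(cache, predicted)]

# The decoder reruns the predictor and adds the stored residual back.
reconstructed = [p + r for p, r in zip(predicted, residual)]

raw_bits = entropy_bits(cache)
residual_bits = entropy_bits(residual)
print(f"raw: {raw_bits:.2f} bits/symbol, residual: {residual_bits:.2f} bits/symbol")
```

The trade is visible even in the toy: the residual is cheap to store only because the predictor has to be run again at decode time.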
My favorite detail in the CODA paper is delaying the RMSNorm scale to the next GEMM just to dodge a VRAM roundtrip. We waste so much bandwidth writing to memory just to run activations. Hide your math in the epilogue while the tile is still
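One way to read the deferred-scale trick is as a plain algebraic identity: the per-row factor 1/rms(x) commutes with the matrix multiply, so it can be applied to the GEMM output instead of being written back to memory first. The sketch below checks that identity in pure Python. Folding the gain vector into the weights is an assumption made here for illustration, and the function names are hypothetical, not taken from the paper.

```python
import math


def rms(row, eps=1e-6):
    """Root mean square of one row, with the usual small epsilon."""
    return math.sqrt(sum(v * v for v in row) / len(row) + eps)


def matmul(a, b):
    """Naive dense matrix multiply on nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def norm_then_gemm(x, gain, w):
    """Baseline: materialize the normalized activations, then multiply."""
    normed = [[v * g / rms(row) for v, g in zip(row, gain)] for row in x]
    return matmul(normed, w)


def gemm_then_scale(x, gain, w):
    """Deferred: multiply raw activations, apply 1/rms per row afterwards."""
    # Assumption: the gain is folded into the weights ahead of time.
    w_folded = [[gain[k] * w[k][j] for j in range(len(w[0]))] for k in range(len(w))]
    acc = matmul(x, w_folded)
    # This per-row scale is the step that would sit in the GEMM epilogue.
    return [[v / rms(row) for v in out_row] for row, out_row in zip(x, acc)]


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
gain = [0.5, 1.0, 2.0]
w = [[1.0, -1.0], [0.5, 2.0], [-2.0, 0.25]]

baseline = norm_then_gemm(x, gain, w)
deferred = gemm_then_scale(x, gain, w)
```

The two results agree up to floating-point rounding; the deferred version never stores the normalized activations as a separate tensor.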