@che_shr_cat on Backlist

26.

Topology-aware operators for physical ML

Encoding geometry into the operator itself can make physical ML both faster and more accurate

by @che_shr_cat (Grigory Sapunov) · backlist 2026-06-20 · rubric 100.0

8.

Training nonlinear RNNs with flat, parallelized O(1) gradients

The proposed method attacks BPTT’s sequential, unstable O(T) gradient path and reframes how expressive RNNs can be trained

by @che_shr_cat (Grigory Sapunov) · backlist 2026-06-16 · rubric 82.0

1.

Programming neural network weights with natural language

Benign training text can now steer a model’s internal weights to carry a functional hidden artifact, blurring data curation and model supply-chain security

by @che_shr_cat (Grigory Sapunov) · backlist 2026-06-14 · rubric 78.0

11.

Do transformers need separate Q, K and V projections?

Merging Q, K, and V projections challenges a core transformer assumption and could reduce memory pressure in long-context models

by @che_shr_cat (Grigory Sapunov) · backlist 2026-06-10 · rubric 82.0

15.

1/ We have spent years optimizing KV cache via head-sharing (GQA/MQA), but we ignored a fundamental assumption: why do Transformers need three separate Q, K, and V projections in the first place? Turns out, they don't. Merging them unlocks

by @che_shr_cat (Grigory Sapunov) · backlist 2026-06-09 · rubric 82.0