34.
standard softmax attention takes a convex combination of values in context, but parallax lets you extrapolate beyond them. e.g. if you have key-value pairs (1, 1), (2, 2), (3, 3) and a query q=4, softmax attention will output something li
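
a quick numeric check of the convex-combination half of that claim, using the same three key-value pairs and q=4. this is a minimal sketch that assumes scalar keys and plain dot-product scores (q * k, no scaling); `softmax_attention` is a hypothetical helper written for this example, and parallax itself isn't implemented here.

```python
import math

def softmax_attention(q, keys, values):
    # assumption: scores are plain scalar products q * k, with no 1/sqrt(d) scaling
    scores = [q * k for k in keys]
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # non-negative, sum to 1
    out = sum(w * v for w, v in zip(weights, values))
    return out, weights

keys = [1.0, 2.0, 3.0]
values = [1.0, 2.0, 3.0]
out, weights = softmax_attention(4.0, keys, values)

print("weights:", weights)
print("output:", out)
```

because the weights are non-negative and sum to 1, the output has to land between the smallest and largest value (1 and 3 here), so it can't reach 4 however the scores are set.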