13.
ReplaySSM: cache inputs, rebuild state on the fly
As hybrid SSM-transformer models become common, avoiding per-step SSM state writes could make decoding substantially faster without changing outputs.
1 appearance on the backlist front page in the last 30 days.
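The pitch is brief, so what follows is only a minimal Python sketch of one possible reading of "cache inputs, rebuild state on the fly". It assumes a toy diagonal linear SSM with time-invariant parameters (h_t = a*h_{t-1} + b*x_t, y_t = c·h_t). The names `DiagonalSSM`, `StatefulDecoder`, `ReplayDecoder` and the `checkpoint_every` interval are hypothetical and are not ReplaySSM's actual design. The baseline writes the full state after every token. The replay variant appends each token's input to a cache, materializes state only at periodic checkpoints, and recomputes the current state by replaying the cached inputs from the last checkpoint. Both paths run the same floating-point operations in the same order, so their outputs match exactly.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class DiagonalSSM:
    """Hypothetical toy SSM: h_t = a * h_{t-1} + b * x_t, y_t = sum(c * h_t)."""
    a: List[float]
    b: List[float]
    c: List[float]

    def step(self, h: List[float], x: float):
        # One recurrence update; returns a fresh state list and the output.
        new_h = [ai * hi + bi * x for ai, hi, bi in zip(self.a, h, self.b)]
        y = sum(ci * hi for ci, hi in zip(self.c, new_h))
        return new_h, y


class StatefulDecoder:
    """Baseline: writes the full SSM state after every decode step."""

    def __init__(self, ssm: DiagonalSSM):
        self.ssm = ssm
        self.h = [0.0] * len(ssm.a)
        self.state_writes = 0

    def decode(self, x: float) -> float:
        self.h, y = self.ssm.step(self.h, x)
        self.state_writes += 1  # one state write per token
        return y


class ReplayDecoder:
    """Assumed replay scheme: cache inputs, write state only at checkpoints."""

    def __init__(self, ssm: DiagonalSSM, checkpoint_every: int = 8):
        self.ssm = ssm
        self.checkpoint = [0.0] * len(ssm.a)  # last materialized state
        self.pending: List[float] = []        # inputs cached since checkpoint
        self.checkpoint_every = checkpoint_every
        self.state_writes = 0

    def decode(self, x: float) -> float:
        self.pending.append(x)  # cache the input instead of writing state
        h, y = self.checkpoint, 0.0
        for xi in self.pending:  # rebuild the current state by replay
            h, y = self.ssm.step(h, xi)
        if len(self.pending) == self.checkpoint_every:
            self.checkpoint = h  # periodic state write
            self.pending = []
            self.state_writes += 1
        return y


ssm = DiagonalSSM(a=[0.9, 0.5, -0.3], b=[1.0, 0.2, 0.7], c=[0.4, -1.0, 0.25])
inputs = [math.sin(0.3 * t) for t in range(20)]
stateful = StatefulDecoder(ssm)
replay = ReplayDecoder(ssm, checkpoint_every=8)
ys_stateful = [stateful.decode(x) for x in inputs]
ys_replay = [replay.decode(x) for x in inputs]
print(ys_stateful == ys_replay)                     # True
print(stateful.state_writes, replay.state_writes)  # 20 2
```

The trade is extra recurrence compute per step (up to `checkpoint_every` replayed updates) in exchange for far fewer state writes. That only pays off if decode is bound by memory traffic rather than arithmetic, which is an assumption here. In a selective SSM such as Mamba, the cached entries would presumably be the per-token projections needed to recompute each update, not a single scalar input.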