@danielrjiang on Backlist


In RL, the ability to reset to an arbitrary state is powerful (see, e.g., Go-Explore), but often unrealistic. (x.com)

In RL, the ability to *reset* to an arbitrary state is powerful (see, e.g., Go-Explore), but often unrealistic. For LLMs, though, states are tokens, so resets are natural! In work led by @Ankur_Samanta_, we propose a GRPO variant where…

by @danielrjiang (Daniel Jiang) · backlist 2026-06-23 · rubric 100.0
