@jsuarez on Backlist

68.

Reward eng should be the last resort in RL. Curriculum + simple reward

by @jsuarez (Joseph Suarez) · backlist 2026-06-12 · rubric 84.0

68.

You can train drones in 30 seconds. This was a year ago. PufferLib is 5x faster now!

by @jsuarez (Joseph Suarez) · backlist 2026-06-09 · rubric 72.0

42.

Play with the demos. Training up to 20M steps/second on a single GPU. Most envs training in seconds to minutes, including our client envs. Turns out mazes and 2048 without exploiting domain knowledge are just harder than many real world pro

by @jsuarez (Joseph Suarez) · backlist 2026-05-28 · rubric 86.0

30.

Reinforcement learning research with Joseph Suarez

by @jsuarez (Joseph Suarez) · backlist 2026-05-25 · rubric 96.0

55.

This week, I solved a problem in RL involving ludicrous sparsity that I have been thinking about since 2018. Initial sweeps are showing SOTA on one of our most consistently informative test envs. Blog post soon. For now, you can follow the

by @jsuarez (Joseph Suarez) · backlist 2026-05-22 · rubric 90.0