Reward eng should be the last resort in RL. Curriculum + simple reward
You can train drones in 30 seconds. This was a year ago. PufferLib is 5x faster now!
Play with the demos. Training up to 20M steps/second on a single GPU. Most envs training in seconds to minutes, including our client envs. Turns out mazes and 2048 without exploiting domain knowledge are just harder than many real world pro
Reinforcement learning research with Joseph Suarez
This week, I solved a problem in RL involving ludicrous sparsity that I have been thinking about since 2018. Initial sweeps are showing SOTA on one of our most consistently informative test envs. Blog post soon. For now, you can follow the