@QPHutu on Backlist

PPO has always been my favorite RL algorithm, from the game era to the LLM era (t.co)

PPO has always been my favorite RL algorithm, from the game era to the LLM era. DAPO identified a critical issue with PPO’s ratio clipping. However, I don’t think the clip_higher solution addresses the root cause. Our DPPO work ( http:// arxiv.org/pdf/2602.0

by @QPHutu (Penghui Qi) · backlist 2026-06-18 · rubric 72.0
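
For readers unfamiliar with the ratio clipping the post refers to, here is a minimal sketch of the per-token PPO clipped surrogate, with the lower and upper clip bounds decoupled in the spirit of DAPO's clip_higher. The function name `clipped_surrogate`, its parameters, and the numeric values are illustrative assumptions, not code from DAPO or from the DPPO work, whose method is not described in the excerpt above.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantage, eps_low=0.2, eps_high=0.2):
    """Per-token PPO clipped surrogate objective (a quantity to maximize).

    Hypothetical helper for illustration only. With eps_low == eps_high this
    is the standard symmetric PPO clip; a larger eps_high mimics the
    clip_higher idea of loosening only the upper bound.
    """
    # Importance ratio between the new and old policy for this token.
    ratio = math.exp(logp_new - logp_old)
    # Clamp the ratio to [1 - eps_low, 1 + eps_high].
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Pessimistic bound: take the smaller of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# A token whose probability rose by 25% under the new policy, with positive advantage.
symmetric = clipped_surrogate(math.log(1.25), 0.0, 1.0)                   # upper bound 1.2 is active
clip_higher = clipped_surrogate(math.log(1.25), 0.0, 1.0, eps_high=0.28)  # ratio is inside the bound
```

In the symmetric case the objective is capped once the ratio exceeds 1.2, so the token receives no further gradient; raising only `eps_high` lets the same token keep contributing. Whether that addresses the underlying issue is exactly what the post questions.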