@KunhaoZ on Backlist

2 appearances on the backlist front page in the last 30 days.

49.

More than a year ago (x.com)

More than a year ago @TacoCohen in SPO https://arxiv.org/abs/2503.05453 already derives a critic-free value/Q parameterization from the policy-reference log-ratio under KL-regularized RL by exactly “deriving value loss through policy ra

by @KunhaoZ (Kunhao Zheng) · backlist 2026-06-20 · rubric 78.0

45.

More than a year ago (x.com)

by @KunhaoZ (Kunhao Zheng) · backlist 2026-06-19 · rubric 78.0