@askalphaxiv on Backlist

38.

1/4: A couple notes on the implementation. The async RL training itself is powered by SkyRL, with the research ag…

1/4: A couple notes on the implementation. The async RL training itself is powered by SkyRL, and the research agent’s job is to resolve setup issues (in this case, a libnuma dependency) and analyze runs autonomously.

by @askalphaxiv (alphaXiv) · backlist 2026-06-22 · rubric 100.0

22.

Autoresearch agents that replicate arXiv papers

alphaXiv is deploying agents to set up arXiv codebases, resolve environment issues, reproduce core claims, and rank papers by implementation difficulty

by @askalphaxiv (alphaXiv) · backlist 2026-06-17 · rubric 72.0

55.

“Trust Region On-Policy Distillation”

On-policy distillation is powerful, but a single bad mismatch between student and teacher can negatively impact the gradients. So this paper's TrOPD only learns where the teacher is reliable, treats outlie…

by @askalphaxiv (alphaXiv) · backlist 2026-06-04 · rubric 78.0
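To make "only learn where the teacher is reliable" concrete, here is a minimal sketch of a masked on-policy distillation loss. It is not TrOPD's actual rule. The reliability test (a bound `delta` on the student/teacher log-ratio), the function name `masked_distill_loss`, and the choice to simply drop out-of-region tokens are all hypothetical assumptions made for illustration.

```python
from typing import Sequence


def masked_distill_loss(
    student_logprobs: Sequence[float],
    teacher_logprobs: Sequence[float],
    delta: float = 2.0,
) -> tuple[float, list[bool]]:
    """Hypothetical trust-region-style on-policy distillation loss value.

    Both inputs are log-probabilities of the tokens the *student* sampled,
    scored once by the student and once by the teacher. The per-token term
    log p_s - log p_t is the standard single-sample estimate of the reverse
    KL(student || teacher). Tokens whose log-ratio magnitude exceeds `delta`
    are treated as unreliable teacher signal and dropped from the average
    (an assumption for this sketch, not the paper's method).
    """
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("log-prob sequences must have the same length")

    # True where the teacher is considered reliable (inside the trust region).
    mask = [abs(s - t) <= delta for s, t in zip(student_logprobs, teacher_logprobs)]

    # Per-token reverse-KL estimates, kept only for in-region tokens.
    kept = [s - t for s, t, m in zip(student_logprobs, teacher_logprobs, mask) if m]

    # If every token is an outlier, contribute no loss rather than dividing by zero.
    if not kept:
        return 0.0, mask
    return sum(kept) / len(kept), mask


if __name__ == "__main__":
    student = [-0.5, -1.0, -0.2]
    teacher = [-0.7, -6.0, -0.1]  # the teacher strongly disagrees on token 2
    loss, mask = masked_distill_loss(student, teacher, delta=2.0)
    print(mask, round(loss, 4))
```

Here the second token's log-ratio of 5.0 falls outside the region, so it no longer dominates the average; a real implementation would backpropagate this quantity through the student's log-probs only.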

12.

Why latent prediction can need exponentially less data than token prediction

A sample-complexity theory argues that when data is generated by a hidden hierarchy, token prediction gets harder as the hierarchy deepens, while latent prediction avoids that blowup

by @askalphaxiv (alphaXiv) · backlist 2026-05-30 · rubric 62.0

44.

MiniMax-M2 paper just dropped

M2's key focus is on being more agent-native. It trains on runnable workspaces and artifact-grounded rewards, then uses Forge to scale RL over long coding, app, search, and office-task trajectories.

by @askalphaxiv (alphaXiv) · backlist 2026-05-27 · rubric 92.0