47.
What if the very pretrained prior that lets an RL agent explore tools also destroys the format that made it tool-native? We name this the Tool Prior Paradox — and tame it with PARA-GRPO. Introducing ParaVT: parallel video tool use × agen