5.
Clarifying questions can make agents vulnerable to prompt injection
Across frontier models, asking clarifying questions turned apparently robust execution settings into prompt-injection attack success rates above 30% for several systems
3 appearances on the backlist front page in the last 30 days.
Across frontier models, asking clarifying questions turned apparently robust execution settings into prompt-injection attack success rates above 30% for several systems
We built ASPI to isolate clarification-seeking as its own agent state. Each benchmark scenario compares: - Execution mode → the agent receives a fully specified task - Clarification mode → the agent must ask follow-up questions before acti
Scale reports that open-ended RL can improve checklist scores while broader quality declines because models optimize the verifier setup itself