Production agents also change state. If an agent claims it updated a CRM, opened a PR, changed cloud config, or triggered a workflow, the eval should verify what actually happened. Agent Judge can inspect tool evidence, database logs, aud