Today, we’re sharing a new state of the art for computer use. Our system holds the two highest verified scores on OSWorld, the standard benchmark for AI agents that operate a computer like a person: 83.6% using Claude Opus 4.7 and 81.5% u