49.
Most "agent memory" benchmarks just test whether a chatbot remembers your preferences. That tells you almost nothing about real agents. So we built MemGym: memory evaluation for deep research, coding, and GUI agents, with a clean memory-iso