@Meituan_LongCat on Backlist

30.

We gave frontier LLMs your daily interaction history — they still score below 0.5.

Adding memory makes it worse. Findings from our VitaBench 2.0, the first agent benchmark for long-term dynamic user modeling, evaluating Personalized & P

by @Meituan_LongCat (Meituan LongCat) · backlist 2026-06-08 · rubric 87.0