66.
Do you hear the people sing? Frontier models clearly do not, but hallucinate that they do.
We found that, surprisingly, leading omni-modality foundation models are terrible at understanding the audio track of videos and take the shortcut