Friday, January 16, 2026

NarrativeTrack: Evaluating Video Language Models Beyond the Frame


Multimodal large language models (MLLMs) have achieved impressive progress in vision-language reasoning, yet their ability to understand temporally unfolding narratives in videos remains underexplored. True narrative understanding requires grounding who is doing what, when, and where, while maintaining coherent entity representations across dynamic visual and temporal contexts. We introduce NarrativeTrack, the first benchmark to evaluate narrative understanding in MLLMs through fine-grained entity-centric reasoning. Unlike existing benchmarks limited to short clips or coarse scene-level semantics, we decompose videos into constituent entities and examine their continuity via a Compositional Reasoning Progression (CRP), a structured evaluation framework that progressively increases narrative complexity across three dimensions: entity existence, entity changes, and entity ambiguity. CRP challenges models to advance from temporal persistence to contextual evolution and fine-grained perceptual reasoning. A fully automated entity-centric pipeline enables scalable extraction of temporally grounded entity representations, providing the foundation for CRP. Evaluations of state-of-the-art MLLMs reveal that models fail to robustly track entities across visual transitions and temporal dynamics, often hallucinating identity under context shifts. Open-source general-purpose MLLMs exhibit strong perceptual grounding but weak temporal coherence, whereas video-specific MLLMs capture temporal context yet hallucinate entity contexts. These findings uncover a fundamental trade-off between perceptual grounding and temporal reasoning, indicating that narrative understanding emerges only from their integration. NarrativeTrack provides the first systematic framework to diagnose and advance temporally grounded narrative comprehension in MLLMs.
