This article reviews notable AI research papers published in Week 49 of 2024 (24W49), covering multimodal reasoning, GUI agents, video generation, and model evaluation.

Reasoning/Planning: GRAPE (Generalizing Robot Action Prediction via Enhanced …) improves generalization in robot manipulation through contrastive learning over action representations, enabling robust transfer to novel object configurations. CaM (Chain-of-Memory) strengthens long-context reasoning in LLMs by maintaining an explicit working memory across reasoning steps. This improves performance on multi-hop QA tasks that require integrating information spread across long documents. VGoT (Visual Graph-of-Thought) frames visual reasoning as graph traversal over scene elements, supporting systematic compositional reasoning about spatial relationships and object attributes.
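For readers unfamiliar with contrastive objectives, here is a generic InfoNCE loss of the kind the GRAPE summary alludes to. It is an illustrative assumption, not GRAPE's published training procedure. Each anchor embedding (for example, an observation) should score highest against its own paired action embedding, and the other actions in the batch serve as negatives. The function names and the temperature default are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(
    anchors: List[Sequence[float]],
    positives: List[Sequence[float]],
    temperature: float = 0.1,  # hypothetical default
) -> float:
    """Mean InfoNCE loss: anchor i should match positives[i];
    every other entry in `positives` acts as an in-batch negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        # Temperature-scaled similarity of this anchor to every candidate.
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching pair as the target class.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)
```

A correctly paired batch such as `info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]])` yields a loss near zero, while swapping the positives drives it to roughly `1 / temperature`. Lowering `temperature` sharpens the softmax and penalizes near-miss negatives more heavily.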

GUI/Embodied Agents: Aguvis performs autonomous GUI interaction through vision-language grounding, completing tasks on web interfaces and desktop applications without task-specific training. SNOOP (Semantic Novelty-Oriented Observation and Planning) improves open-world exploration agents by balancing curiosity-driven observation selection against goal-directed planning. X-Prompt advances prompt engineering for large multimodal models by systematically exploring prompting strategies across diverse visual reasoning tasks.
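The trade-off between curiosity and goal-directed planning in the SNOOP summary can be illustrated with a simple scoring rule. This is a minimal sketch under stated assumptions, not SNOOP's actual algorithm. Novelty is a count-based bonus that decays with revisits, goal progress is a given estimate in [0, 1], and a weight `beta` balances the two. All names here (`Candidate`, `novelty`, `select`, `beta`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class Candidate:
    name: str             # hypothetical label for a candidate observation/state
    goal_progress: float  # hypothetical estimate in [0, 1] of progress toward the goal


def novelty(visits: int) -> float:
    """Count-based curiosity bonus: 1.0 for unseen states, decaying with revisits."""
    return 1.0 / math.sqrt(1 + visits)


def select(
    candidates: List[Candidate],
    visit_counts: Mapping[str, int],
    beta: float = 0.5,
) -> Candidate:
    """Pick the candidate maximizing a convex blend of novelty and goal progress."""

    def score(c: Candidate) -> float:
        curiosity = novelty(visit_counts.get(c.name, 0))
        return beta * curiosity + (1 - beta) * c.goal_progress

    return max(candidates, key=score)
```

With `beta` near 1 the agent favors rarely visited states; near 0 it greedily follows estimated goal progress. For example, given an unvisited `kitchen` (progress 0.2) and a `hallway` visited eight times (progress 0.6), `beta=0.9` selects the kitchen and `beta=0.1` selects the hallway.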

Generation/Evaluation: Several video generation papers improve temporal consistency, motion quality, and controllability through refined diffusion architectures. On the evaluation side, new benchmarks measure reasoning-chain quality, tool-use capability, and the factual consistency of generated content across diverse domains and task types.