This article reviews notable AI research papers published in Week 2 of 2025 (25W02), covering world modeling, video generation, multimodal reasoning, and efficient architectures.

World Modeling/Video Generation: EnerVerse introduces energy-based world models for physically plausible video prediction, enabling simulation of complex dynamics without explicit physics engines. Cosmos presents a large-scale world foundation model trained on diverse video data for general-purpose spatiotemporal prediction, demonstrating emergent understanding of physical causality and object permanence.

Multimodal Reasoning: STAR (Spatial-Temporal Reasoning) proposes compositional reasoning over video: it decomposes each question into spatial and temporal sub-problems, which improves performance on complex video QA tasks that require multi-step inference. LLaVA-Mini gains efficiency through aggressive visual token compression, reducing the number of visual tokens by 90% while remaining competitive by attending selectively to task-relevant visual regions.
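To make the idea of visual token compression concrete, here is a minimal sketch of top-k token pruning by relevance score. It is not LLaVA-Mini's actual method. The function name `compress_visual_tokens`, the `relevance` scores, and the `keep_ratio` parameter are all assumptions introduced for illustration. In a real model the scores would come from something like text-to-image attention weights.

```python
def compress_visual_tokens(tokens, relevance, keep_ratio=0.1):
    """Keep the highest-scoring fraction of visual tokens, preserving their order.

    tokens     -- list of visual tokens (any objects, e.g. embedding vectors)
    relevance  -- one hypothetical task-relevance score per token
    keep_ratio -- fraction of tokens to keep (0.1 means a 90% reduction)
    """
    if len(tokens) != len(relevance):
        raise ValueError("tokens and relevance must have the same length")
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return []

    # Always keep at least one token so the image is never dropped entirely.
    keep = max(1, round(len(tokens) * keep_ratio))

    # Indices of the `keep` most relevant tokens.
    top = sorted(range(len(tokens)), key=lambda i: relevance[i], reverse=True)[:keep]

    # Restore the original (spatial) order of the surviving tokens.
    return [tokens[i] for i in sorted(top)]


if __name__ == "__main__":
    patches = [f"patch_{i}" for i in range(20)]
    scores = [(i * 7) % 20 for i in range(20)]  # arbitrary toy scores
    print(compress_visual_tokens(patches, scores, keep_ratio=0.1))
```

Sorting the surviving indices keeps the tokens in their original positions, which matters when the downstream model relies on positional information.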

Efficient Architectures: Several papers explore hybrid architectures that combine Mamba state-space models with attention mechanisms for linear-complexity sequence modeling, achieving competitive performance on language tasks at a fraction of the computational cost.

Additional contributions include improved data curation pipelines for instruction tuning, robustness evaluation frameworks that measure model consistency under paraphrasing and adversarial perturbations, and multilingual alignment techniques that improve cross-lingual transfer while preserving monolingual capabilities.
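As a rough illustration of what "consistency under paraphrasing" could mean in practice, the sketch below scores how often a model gives the same answer across paraphrases of one question. It is a hypothetical metric, not one taken from the reviewed papers. The names `normalize`, `consistency_score`, and `mean_consistency` are assumptions, and the model answers are assumed to be collected beforehand as plain strings.

```python
from collections import Counter


def normalize(answer):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(answer.lower().split())


def consistency_score(answers):
    """Fraction of answers that agree with the most common normalized answer.

    `answers` holds the model's responses to paraphrases of a single question.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(answers)


def mean_consistency(answer_groups):
    """Average consistency over several questions (one answer group each)."""
    if not answer_groups:
        raise ValueError("need at least one answer group")
    scores = [consistency_score(group) for group in answer_groups]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: answers to four paraphrases of the same question.
    print(consistency_score(["Paris", "paris ", "Lyon", "Paris"]))
```

Exact string matching after normalization is a deliberately crude choice. A real evaluation framework would likely compare answers semantically, and it would treat adversarial perturbations separately from benign paraphrases.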