This article reviews notable AI research papers published in Week 2 of 2025 (25W02), covering world modeling, video generation, multimodal reasoning, and efficient architectures.
World Modeling/Video Generation: EnerVerse, designed for robotic manipulation, generates predictions of the future embodied scene as video, aiming for physically plausible prediction of complex dynamics without an explicit physics engine. Cosmos presents a world foundation model platform. Its models are pretrained on large-scale, diverse video data to serve as general-purpose spatiotemporal predictors, and they can then be post-trained for specific physical-AI applications.
Multimodal Reasoning: STAR (Spatial-Temporal Reasoning) proposes compositional reasoning over video. It decomposes each question into spatial and temporal sub-problems, which improves performance on complex video QA that requires multi-step inference. LLaVA-Mini targets efficiency through extreme visual token compression. It fuses visual information into the text tokens before the LLM backbone (modality pre-fusion), which lets it feed as few as one vision token per image to the LLM while maintaining competitive performance.
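As a rough illustration of how a model can shrink hundreds of visual tokens to a handful, the sketch below uses query-based cross-attention compression. Here, M query vectors each attend over the N visual tokens and return a weighted average of them. This is a generic technique assumed for illustration, not a reproduction of LLaVA-Mini's module. The 576-token input (a 24×24 patch grid), the 64-dimensional embeddings, the hypothetical function name `compress_visual_tokens`, and the random queries standing in for learned parameters are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress_visual_tokens(visual_tokens, queries):
    """Hypothetical helper: compress N visual tokens (N, d) into M tokens (M, d).

    Each of the M queries computes scaled dot-product scores against every
    visual token, then returns the attention-weighted average of those tokens.
    """
    d = visual_tokens.shape[-1]
    scores = queries @ visual_tokens.T / np.sqrt(d)  # (M, N)
    weights = softmax(scores, axis=-1)               # each row sums to 1
    return weights @ visual_tokens                   # (M, d)

rng = np.random.default_rng(0)
visual = rng.standard_normal((576, 64))  # assumed: 24x24 patch grid, 64-dim embeddings
queries = rng.standard_normal((1, 64))   # assumed: random stand-in for one learned query
compressed = compress_visual_tokens(visual, queries)
print(compressed.shape)  # (1, 64)
```

Because the attention weights for each query sum to one, every compressed token is a convex combination of the original visual tokens. The LLM then sees M tokens instead of N, and which visual content survives depends on what the (here random, in practice learned) queries attend to.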
Efficient Architectures: Several papers explore hybrid architectures that combine Mamba state-space layers, whose cost grows linearly with sequence length, with attention layers. These hybrids achieve competitive performance on language tasks at a fraction of the computational cost of pure attention models. Additional contributions include improved data curation pipelines for instruction tuning; robustness evaluation frameworks that measure model consistency under paraphrasing and adversarial perturbations; and multilingual alignment techniques that improve cross-lingual transfer while preserving monolingual capabilities.
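To make the linear-cost point concrete, here is a minimal sketch of a diagonal linear state-space recurrence, h_t = a ⊙ h_{t-1} + B x_t and y_t = C h_t. Each step does the same amount of work regardless of position, so a length-L sequence costs O(L), compared with the O(L²) pairwise scores of full attention. This is a simplified, time-invariant assumption. Mamba's selective SSM makes its parameters input-dependent and uses a hardware-aware parallel scan, and neither feature is shown here. The function name `diagonal_ssm_scan` and the shapes are hypothetical.

```python
import numpy as np

def diagonal_ssm_scan(x, a, b, c):
    """Hypothetical sequential scan of a diagonal linear SSM.

    x: (L, d_in) input sequence
    a: (n,) per-state decay factors (fixed here; input-dependent in Mamba)
    b: (n, d_in) input projection
    c: (d_out, n) output projection
    returns y: (L, d_out)
    """
    h = np.zeros(a.shape[0])
    ys = []
    for t in range(x.shape[0]):
        # Constant work per step: no dependence on how many tokens came before.
        h = a * h + b @ x[t]
        ys.append(c @ h)
    return np.stack(ys)

# Impulse response: a single 1 at t=0, then zeros.
x = np.array([[1.0], [0.0], [0.0], [0.0]])
a = np.array([0.5])
b = np.array([[1.0]])
c = np.array([[1.0]])
y = diagonal_ssm_scan(x, a, b, c)
print(y[:, 0].tolist())  # [1.0, 0.5, 0.25, 0.125]
```

Feeding a single impulse shows the state decaying geometrically by `a` at each step, which is how the fixed-size state summarizes an unbounded history. A hybrid model would interleave layers like this with a smaller number of attention layers.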
![[25W02] Latest AI Paper Tech Trends (EnerVerse, Cosmos, STAR, LLaVA-Mini, The GAN Is Dead)](https://metax-images-bucket.s3.ap-southeast-2.amazonaws.com/articles/25w02-ai-enerverse-cosmos-star-llava-mini-the-gan-is-dead-reinforce-search-o1-me-1065603266028663/img-1.webp)