This article reviews notable AI research papers published in Week 47 of 2024 (24W47), covering vision-language reasoning, world modeling, efficient mobile AI, and open datasets.
Reasoning/Planning: LLaVA-o1 enables vision-language models to reason autonomously in multiple steps, working through four sequential stages: summarization, visual interpretation, logical reasoning, and conclusion. With stage-level beam search at inference time, it outperforms Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct on multimodal reasoning benchmarks. Generative World Explorer (Genex) lets embodied agents mentally explore large 3D worlds (e.g., urban environments) by generating imagined future observations. The agent updates its beliefs from these imagined observations, which improves long-horizon planning in partially observable settings.
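LLaVA-o1's stage-level beam search can be sketched as follows: for each stage, sample several candidate continuations, keep one winner through pairwise comparison, append it to the context, and move on to the next stage. This is a minimal sketch under stated assumptions: `generate` and `judge` are hypothetical callables standing in for the model's sampling and self-comparison calls, the uppercase stage tags are illustrative, and the sequential knockout used to pick each winner may differ from the paper's exact selection procedure.

```python
from typing import Callable, List

# Illustrative (hypothetical) tags for the four stages named in the text:
# summarization, visual interpretation (caption), logical reasoning, conclusion.
STAGES = ["SUMMARY", "CAPTION", "REASONING", "CONCLUSION"]


def stage_level_beam_search(
    generate: Callable[[str, str], str],
    judge: Callable[[str, str, str], bool],
    prompt: str,
    num_candidates: int = 4,
) -> str:
    """Build an answer one stage at a time, keeping a single winner per stage.

    generate(context, stage) -> one sampled candidate for `stage` (hypothetical).
    judge(context, a, b)     -> True if candidate `a` beats candidate `b` (hypothetical).
    """
    context = prompt
    for stage in STAGES:
        # Sample several candidates for this stage, all conditioned on the same context.
        candidates: List[str] = [generate(context, stage) for _ in range(num_candidates)]
        # Sequential knockout: each remaining candidate challenges the current best.
        best = candidates[0]
        for challenger in candidates[1:]:
            if judge(context, challenger, best):
                best = challenger
        # Commit the winning stage so the next stage is conditioned on it.
        context += f"\n<{stage}>{best}</{stage}>"
    return context
```

Because selection happens per stage rather than per token or per full answer, a weak summary can be discarded before any reasoning is built on it, at a cost of `num_candidates` generations and `num_candidates - 1` comparisons per stage.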
Efficient Mobile AI: BlueLM-V-3B makes on-device MLLM deployment practical through algorithm-system co-design, redesigning how dynamic resolution is handled and applying hardware-aware deployment optimizations. With a 2.7B-parameter language model and a 400M-parameter vision encoder, it generates 24.4 tokens/s on a MediaTek Dimensity 9300 and achieves an average OpenCompass score of 66.1, the highest among models with at most 4B parameters.
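Dynamic resolution in MLLMs often means choosing a tiled canvas whose shape matches the input image before cutting it into vision-encoder crops. The sketch below shows a generic aspect-ratio-matching routine in the style popularized by LLaVA-NeXT: prefer the canvas that preserves the most image detail, then break ties by the least padded area. It is not BlueLM-V-3B's redesigned scheme, and the 384-pixel tile size, the 9-tile limit, and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple


def select_best_resolution(
    original_size: Tuple[int, int],
    candidate_resolutions: List[Tuple[int, int]],
) -> Optional[Tuple[int, int]]:
    """Pick the (width, height) canvas that keeps the most image detail,
    breaking ties by the smallest wasted (padded) area."""
    orig_w, orig_h = original_size
    best, best_effective, best_wasted = None, -1, float("inf")
    for w, h in candidate_resolutions:
        # Resize the image to fit inside the canvas while keeping its aspect ratio.
        scale = min(w / orig_w, h / orig_h)
        down_w, down_h = int(orig_w * scale), int(orig_h * scale)
        # Upscaling adds no real detail, so cap at the original pixel count.
        effective = min(down_w * down_h, orig_w * orig_h)
        wasted = w * h - effective
        if effective > best_effective or (
            effective == best_effective and wasted < best_wasted
        ):
            best, best_effective, best_wasted = (w, h), effective, wasted
    return best


def grid_candidates(tile: int = 384, max_tiles: int = 9) -> List[Tuple[int, int]]:
    """All canvases built from cols x rows tiles, with at most max_tiles tiles
    (tile size and tile limit are hypothetical)."""
    return [
        (cols * tile, rows * tile)
        for rows in range(1, max_tiles + 1)
        for cols in range(1, max_tiles + 1)
        if rows * cols <= max_tiles
    ]
```

Each extra tile adds vision tokens that the phone must encode and the language model must attend to, so the tie-breaking rule matters on device: a small image is placed on the smallest canvas that holds it rather than padded onto a large one. With these parameters, an 800x600 photo maps to a 3x2 grid (1152x768).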
Open Resources: RedPajama documents its open-source reproduction of the LLaMA training dataset, making the data curation pipeline transparent and enabling reproducible research. The data has been used to train production models, including Snowflake Arctic and Salesforce's XGen. SageAttention2 speeds up attention computation by roughly 3x over FlashAttention2 by quantizing Q and K to INT4 (and the softmax output P and V to FP8), while maintaining end-to-end quality on downstream tasks.
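To show why 4-bit attention can keep quality, here is a NumPy sketch of block-wise INT4 quantization of Q and K, an integer matmul, and dequantization of the scores, preceded by "smoothing" K (subtracting its per-channel mean over tokens). It only illustrates the numerics on CPU, not the speedup. The block size of 32, the symmetric [-7, 7] code range, and all function names are hypothetical, and the paper's finer-grained GPU quantization and FP8 handling of P and V are not reproduced.

```python
import numpy as np


def quantize_int4(x: np.ndarray, block: int):
    """Symmetric INT4 quantization with one scale per block of `block` rows (tokens).

    Codes stay in [-7, 7] (the symmetric part of the INT4 range) and are stored
    in int8 because NumPy has no 4-bit dtype.
    """
    n = x.shape[0]
    codes = np.empty(x.shape, dtype=np.int8)
    scales = np.empty((n + block - 1) // block, dtype=np.float64)
    for b, start in enumerate(range(0, n, block)):
        chunk = x[start:start + block]
        scale = max(float(np.abs(chunk).max()) / 7.0, 1e-8)
        scales[b] = scale
        codes[start:start + block] = np.clip(np.round(chunk / scale), -7, 7).astype(np.int8)
    return codes, scales


def softmax(s: np.ndarray) -> np.ndarray:
    s = s - s.max(axis=-1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)


def int4_scores(q: np.ndarray, k: np.ndarray, block: int = 32) -> np.ndarray:
    """Approximate Q K^T / sqrt(d) from INT4 codes of Q and (smoothed) K."""
    # Smooth K: the per-channel mean adds the same constant to every score in a
    # query's row, so removing it leaves softmax unchanged but narrows K's range.
    k = k - k.mean(axis=0, keepdims=True)
    q_codes, q_scales = quantize_int4(q, block)
    k_codes, k_scales = quantize_int4(k, block)
    # Integer matmul with int32 accumulation.
    acc = q_codes.astype(np.int32) @ k_codes.astype(np.int32).T
    # Dequantize: scale each score by its query block's and key block's scales.
    row = np.repeat(q_scales, block)[: q.shape[0]]
    col = np.repeat(k_scales, block)[: k.shape[0]]
    return acc * row[:, None] * col[None, :] / np.sqrt(q.shape[1])


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, block: int = 32) -> np.ndarray:
    # P and V stay in float here; this sketch does not emulate FP8.
    return softmax(int4_scores(q, k, block)) @ v
```

Shifting K by a large shared offset (for example, adding 3.0 to every entry) would force a coarse INT4 scale without smoothing; after smoothing, the quantized scores track the exact ones closely while the attention weights are mathematically identical to the unsmoothed float version.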
![[24W47] Latest AI Paper Tech Trends (LLaVA-o1, Generative World Explorer, BlueLM-V)](https://metax-images-bucket.s3.ap-southeast-2.amazonaws.com/articles/24w47-ai-llava-o1-generative-world-explorer-bluelm-v-3b-redpajama-sageattention2-1065600108581207/img-1.webp)