This article reviews notable AI research papers published in the first week of 2025, covering medical/healthcare AI, computer vision/multimodal AI, model optimization, AI systems/frameworks, and 3D/4D generation.

Medical/Healthcare AI: HuatuoGPT-o1 proposes a two-stage approach that combines validated medical problems with reinforcement learning to improve complex medical reasoning in LLMs, and reports stronger performance than its baselines using only 40,000 validated problems. Medical Imaging MLLM applies compositional generalization (CG), analyzing the relationships among modality, anatomical region, and task, so that models can perform well when training data is limited.

Computer Vision/Multimodal: VideoRefer Suite enhances spatiotemporal object understanding in video LLMs. VideoAnydoor develops video object insertion with precise motion control. 2.5 Years in Class proposes a new VLM training approach using educational videos. Explanatory Instructions describes vision tasks through explanatory natural-language instructions to support zero-shot generalization in computer vision.

Model Optimization: 1.58-bit FLUX quantizes the weights of a text-to-image generation model to 1.58 bits, substantially reducing model size and improving inference speed. VA-VAE addresses an optimization dilemma in latent diffusion models, with a reported speedup of about 21x in training convergence.
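
To make the "1.58-bit" figure concrete: a weight restricted to the three values -1, 0, and +1 carries log2(3) ≈ 1.58 bits of information. The sketch below illustrates one common way to produce such ternary weights, absmean scaling followed by rounding. This is an assumption for illustration only; the summary above does not describe the quantization procedure 1.58-bit FLUX actually uses, and the function names `ternary_quantize` and `dequantize` are hypothetical.

```python
import math

def ternary_quantize(weights):
    """Illustrative absmean ternary quantization (assumed scheme, not the paper's)."""
    # Scale by the mean absolute weight; fall back to 1.0 if all weights are zero.
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    # Round each scaled weight to the nearest integer, then clamp to {-1, 0, +1}.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate real-valued weights from ternary codes."""
    return [q * scale for q in quantized]

# Three possible states per weight carry log2(3) bits of information.
BITS_PER_WEIGHT = math.log2(3)

weights = [0.9, -0.05, -1.3, 0.4, 0.0, -0.6]
q, s = ternary_quantize(weights)
print(q)                          # ternary codes
print(round(BITS_PER_WEIGHT, 3))  # bits per ternary weight
```

Storing one ternary code plus a shared scale per group of weights, instead of a 16-bit float per weight, is where the size reduction would come from in a scheme like this.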

AI Systems/Frameworks: OS-Genesis presents an automated data generation pipeline for GUI agents. CodeElo proposes a new benchmark for evaluating the code generation capability of LLMs. Next Token Prediction presents a unified framework for multimodal learning.

3D/4D Generation: Bringing Objects to Life develops a 4D generation method that adds realistic motion to 3D models using NeRF and a text-conditioned image-to-video diffusion model.