From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models Paper • 2512.10867 • Published 23 days ago • 15
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding Paper • 2409.18111 • Published Sep 26, 2024 • 7
mR²AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA Paper • 2411.15041 • Published Nov 22, 2024 • 1
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning Paper • 2509.18094 • Published Sep 22, 2025 • 4
STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs Paper • 2505.15804 • Published May 21, 2025 • 10
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning Paper • 2503.13444 • Published Mar 17, 2025 • 17
Image Conductor: Precision Control for Interactive Video Synthesis Paper • 2406.15339 • Published Jun 21, 2024 • 9