Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6 • 210
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations Paper • 2506.04633 • Published Jun 5 • 19
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning Paper • 2507.12508 • Published Jul 16 • 26
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17 • 77
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17 • 259
Taming generative video models for zero-shot optical flow extraction Paper • 2507.09082 • Published Jul 11 • 12
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published May 29 • 68
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs Paper • 2506.21656 • Published Jun 26 • 15
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models Paper • 2501.00316 • Published Dec 31, 2024 • 23
STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs Paper • 2505.15804 • Published May 21 • 10
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Paper • 2503.19990 • Published Mar 25 • 35
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics Paper • 2506.04308 • Published Jun 4 • 43
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published Apr 1 • 67
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation Paper • 2502.13143 • Published Feb 18 • 31
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22, 2024 • 29
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark Paper • 2412.07825 • Published Dec 10, 2024 • 12
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models Paper • 2506.03135 • Published Jun 3 • 40