MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models
Paper
• 2511.18373
• Published
• 7
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
Paper
• 2511.13288
• Published
• 19
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
Paper
• 2511.19418
• Published
• 29
SAM 3: Segment Anything with Concepts
Paper
• 2511.16719
• Published
• 131
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation
Paper
• 2510.07319
• Published
• 3
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper
• 2511.16334
• Published
• 93
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents
Paper
• 2511.13593
• Published
• 28
RynnVLA-002: A Unified Vision-Language-Action and World Model
Paper
• 2511.17502
• Published
• 28
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
Paper
• 2511.11007
• Published
• 15
Depth Anything 3: Recovering the Visual Space from Any Views
Paper
• 2511.10647
• Published
• 99
LightRAG: Simple and Fast Retrieval-Augmented Generation
Paper
• 2410.05779
• Published
• 31
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper
• 2510.14528
• Published
• 120
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper
• 2412.20138
• Published
• 20
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Paper
• 2410.17799
• Published
• 11
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image
Paper
• 2511.13648
• Published
• 52
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper
• 2509.22186
• Published
• 149
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
Paper
• 2510.15869
• Published
• 50
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
Paper
• 2511.15705
• Published
• 97
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Paper
• 2510.22543
• Published
• 14
Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning
Paper
• 2511.19900
• Published
• 48
From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models
Paper
• 2512.10867
• Published
• 16
Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities
Paper
• 2503.04721
• Published
• 3