openai/whisper-large-v3-turbo Automatic Speech Recognition • Updated Oct 4, 2024 • 4.61M • 2.86k
RynnVLA-002: A Unified Vision-Language-Action and World Model Paper • 2511.17502 • Published Nov 21, 2025 • 28
Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight Paper • 2511.16175 • Published Nov 20, 2025 • 12
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence Paper • 2603.07660 • Published 9 days ago • 81