OpenClaw-RL: Train Any Agent Simply by Talking • Paper 2603.10165 • Published about 1 month ago • 150 upvotes
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training • Paper 2603.12255 • Published 28 days ago • 91 upvotes
Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models • Paper 2601.19834 • Published Jan 27 • 25 upvotes
Generative Neural Video Compression via Video Diffusion Prior • Paper 2512.05016 • Published Dec 4, 2025 • 10 upvotes
What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards • Paper 2512.00425 • Published Nov 29, 2025 • 53 upvotes
First Frame Is the Place to Go for Video Content Customization • Paper 2511.15700 • Published Nov 19, 2025 • 54 upvotes
WorldGen: From Text to Traversable and Interactive 3D Worlds • Paper 2511.16825 • Published Nov 20, 2025 • 24 upvotes
RynnVLA-002: A Unified Vision-Language-Action and World Model • Paper 2511.17502 • Published Nov 21, 2025 • 28 upvotes
GigaWorld-0: World Models as Data Engine to Empower Embodied AI • Paper 2511.19861 • Published Nov 25, 2025 • 30 upvotes
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation • Paper 2511.20714 • Published Nov 25, 2025 • 50 upvotes
P1: Mastering Physics Olympiads with Reinforcement Learning • Paper 2511.13612 • Published Nov 17, 2025 • 134 upvotes
Depth Anything 3: Recovering the Visual Space from Any Views • Paper 2511.10647 • Published Nov 13, 2025 • 101 upvotes
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models • Paper 2511.10629 • Published Nov 13, 2025 • 129 upvotes
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation • Paper 2511.09057 • Published Nov 12, 2025 • 82 upvotes
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation • Paper 2510.14902 • Published Oct 16, 2025 • 17 upvotes