GEMS: Agent-Native Multimodal Generation with Memory and Skills • Paper • 2603.28088 • Published 3 days ago • 58
CutClaw: Agentic Hours-Long Video Editing via Music Synchronization • Paper • 2603.29664 • Published 1 day ago • 28
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis • Paper • 2603.29620 • Published 1 day ago • 33
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward • Paper • 2603.26599 • Published 5 days ago • 43
LongCat-Next: Lexicalizing Modalities as Discrete Tokens • Paper • 2603.27538 • Published 4 days ago • 116
Gen-Searcher: Reinforcing Agentic Search for Image Generation • Paper • 2603.28767 • Published 2 days ago • 50
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization • Paper • 2603.28342 • Published 2 days ago • 19
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models • Paper • 2603.25716 • Published 6 days ago • 145
RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models • Paper • 2603.25502 • Published 6 days ago • 55
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model • Paper • 2603.21986 • Published 9 days ago • 119