Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs · 8 authors · submitted by akhaliq
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators · 9 authors · submitted by akhaliq
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing · 10 authors · submitted by akhaliq
ByteEdit: Boost, Comply and Accelerate Generative Image Editing · 14 authors · submitted by akhaliq
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding · 8 authors · submitted by akhaliq
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion · 5 authors · submitted by akhaliq
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations · 11 authors · submitted by akhaliq
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation · 6 authors · submitted by akhaliq
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models · 5 authors · submitted by akhaliq