Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning Paper • 2512.19687 • Published 3 days ago
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement Paper • 2512.13303 • Published 10 days ago
Emu3.5 Collection Native Multimodal Models are World Learners 🌍 • 4 items • Updated about 23 hours ago
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance Paper • 2510.24711 • Published Oct 28
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Paper • 2509.24695 • Published Sep 29
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15, 2024
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers Paper • 2503.11579 • Published Mar 14