Representing Speech Through Autoregressive Prediction of Cochlear Tokens Paper • 2508.11598 • Published Aug 15, 2025
Taming generative video models for zero-shot optical flow extraction Paper • 2507.09082 • Published Jul 11, 2025
Understanding Physical Dynamics with Counterfactual World Modeling Paper • 2312.06721 • Published Dec 11, 2023
3D Scene Understanding Through Local Random Access Sequence Modeling Paper • 2504.03875 • Published Apr 4, 2025
WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words Paper • 2312.02931 • Published Dec 5, 2023