What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion
Paper โข 2605.07915 โข Published โข 8
Computer Vision
RIVER: A Real-Time Interaction Benchmark for Video LLMs
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision