Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing Paper • 2512.17909 • Published 7 days ago • 35
Rethinking Training Dynamics in Scale-wise Autoregressive Generation Paper • 2512.06421 • Published 20 days ago • 5
Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models Paper • 2509.25050 • Published Sep 29 • 4
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation Paper • 2502.05179 • Published Feb 7 • 24
MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation Paper • 2304.09801 • Published Apr 19, 2023
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations Paper • 2202.07800 • Published Feb 16, 2022
Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training Paper • 2309.13942 • Published Sep 25, 2023 • 1
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation Paper • 2403.04692 • Published Mar 7, 2024 • 40