Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward Paper • 2511.20561 • Published Nov 25, 2025 • 32
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens Paper • 2511.19418 • Published Nov 24, 2025 • 28
CoVT: Chain-of-Visual-Thought Collection Enrich VLMs' vision-centric reasoning capabilities via Chain-of-Visual-Thought! • 7 items • Updated Nov 25, 2025 • 6
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models Paper • 2510.12784 • Published Oct 14, 2025 • 19
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning Paper • 2510.11026 • Published Oct 13, 2025 • 17
Reconstruction Alignment Improves Unified Multimodal Models Paper • 2509.07295 • Published Sep 8, 2025 • 40
Article Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models Jul 10, 2025 • 53
RecA Collection Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning! • 8 items • Updated Sep 22, 2025 • 14
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer Paper • 2504.20690 • Published Apr 29, 2025 • 19
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models Paper • 2503.12885 • Published Mar 17, 2025 • 43
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published Nov 11, 2024 • 50
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces Paper • 2501.09756 • Published Jan 16, 2025 • 20
3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation Paper • 2410.12669 • Published Oct 16, 2024 • 1
3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering Paper • 2501.05131 • Published Jan 9, 2025 • 37