Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens • Paper • 2511.19418 • Published Nov 24, 2025 • 29
CoLLM: A Large Language Model for Composed Image Retrieval • Paper • 2503.19910 • Published Mar 25, 2025 • 15
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding • Paper • 2502.01341 • Published Feb 3, 2025 • 39