Update (March 2026): We are excited to introduce LLaDA-o, the latest model in the LLaDA series. LLaDA-o is a length-adaptive omni diffusion model for unified multimodal understanding and generation, extending the LLaDA line to broader multimodal settings: visual understanding, text-to-image generation, and instruction-based image editing. For more details, please check out the paper and code.

LLaDA-V

We introduce LLaDA-V, a competitive, purely diffusion-based vision-language model that outperforms other diffusion-based multimodal large language models (MLLMs).

It was presented in the paper LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning.

Project Page: https://ml-gsai.github.io/LLaDA-V-demo/

Code: https://github.com/ML-GSAI/LLaDA-V
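As a quick orientation for loading the weights, here is a minimal sketch in the style of other LLaDA model cards. The repo id, the AutoModel/AutoTokenizer entry points, and trust_remote_code loading are assumptions on my part; the inference scripts in the repository above are authoritative.

```python
# Hypothetical loading sketch, not the official API: the actual
# inference entry points live in the LLaDA-V repository linked above.
import torch
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True follows the convention of other LLaDA
# checkpoints; whether LLaDA-V loads this way is an assumption.
model = AutoModel.from_pretrained(
    "GSAI-ML/LLaDA-V",          # assumed repo id, matching this page
    torch_dtype=torch.float16,  # the card lists F16 weights
    trust_remote_code=True,
).to("cuda").eval()

tokenizer = AutoTokenizer.from_pretrained(
    "GSAI-ML/LLaDA-V", trust_remote_code=True
)
```

Note that, as a diffusion model, LLaDA-V decodes by iterative denoising rather than left-to-right sampling, so generation is expected to go through the sampler shipped with the repository's code rather than a standard autoregressive model.generate loop.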

Model size: 8B parameters (Safetensors, F16)
