metadata
license: mit
This repository is the official model zoo for "Let ViT Speak: Generative Language-Image Pre-training".
Currently released models
- Models from fixed low-resolution pretraining:
  - GenLIP-L16-224
  - GenLIP-So16-224
  - GenLIP-g16-224
- NaViT models:
  - GenLIP-L16-NaViT
  - GenLIP-So16-NaViT
  - GenLIP-g16-NaViT
We use the SigLIP image preprocessor for our fixed low-resolution models (`*-224`) and a Qwen2-VL-style image preprocessor for our NaViT models (`*-NaViT`).
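The naming rule above can be sketched as a small helper. This is a minimal illustration, not part of the GenLIP codebase: the function names and the returned labels (`"siglip"`, `"qwen2-vl-style"`) are hypothetical, and the patch size of 16 is an assumption read off the `16` in the model names.

```python
def preprocessor_family(model_name: str) -> str:
    """Return a label for the preprocessor family implied by a model name.

    The labels are hypothetical placeholders for this sketch; they are not
    identifiers from any library.
    """
    if model_name.endswith("-224"):
        # Fixed low-resolution models use the SigLIP image preprocessor.
        return "siglip"
    if model_name.endswith("-NaViT"):
        # NaViT models use a Qwen2-VL-style image preprocessor.
        return "qwen2-vl-style"
    raise ValueError(f"Unrecognized model name: {model_name!r}")


def fixed_resolution_patch_count(image_size: int = 224, patch_size: int = 16) -> int:
    """Count patch tokens for a square image split into square patches.

    patch_size=16 is an assumption, not a value stated in this README.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


if __name__ == "__main__":
    for name in ("GenLIP-L16-224", "GenLIP-So16-NaViT"):
        print(name, "->", preprocessor_family(name))
    print("patches at 224 with patch size 16:", fixed_resolution_patch_count())
```

Under these assumptions, a `*-224` model always sees the same number of patch tokens per image, whereas a NaViT model's token count varies with the input resolution.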
Pretraining and implementation details can be found in our codebase [GenLIP].