License: MIT

This repository is the official model zoo for "Let ViT Speak: Generative Language-Image Pre-training" (GenLIP).

Currently released models

  1. Models from fixed low-resolution pretraining:
  • GenLIP-L16-224
  • GenLIP-So16-224
  • GenLIP-g16-224
  2. NaViT (native-resolution) models:
  • GenLIP-L16-NaViT
  • GenLIP-So16-NaViT
  • GenLIP-g16-NaViT

We use the SigLIP image preprocessor for our fixed low-resolution models (*-224) and a Qwen2-VL-style image preprocessor for our NaViT models (*-NaViT).
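The two preprocessing styles differ mainly in how many patch tokens reach the encoder. The following is a minimal sketch under stated assumptions, not the GenLIP code. It assumes a patch size of 16 (read off the "16" in names like GenLIP-L16) and that the *-224 models see a fixed 224×224 input. The NaViT-side resize is modeled on the `smart_resize` rule used by Qwen2-VL's image processor: preserve aspect ratio, snap each side to a multiple of a factor, and keep the pixel count within bounds. Here the factor is the patch size alone; Qwen2-VL also multiplies in a 2×2 token-merge size, and whether GenLIP does the same is not stated here. `MIN_PIXELS` and `MAX_PIXELS` are hypothetical values chosen for illustration.

```python
import math

# ASSUMPTION: patch size 16, read off the "16" in names like GenLIP-L16 (ViT-L/16 convention).
PATCH_SIZE = 16
# HYPOTHETICAL pixel bounds for the NaViT resize; the real GenLIP values are not given here.
MIN_PIXELS = 4 * PATCH_SIZE * PATCH_SIZE      # at least 4 patches
MAX_PIXELS = 1024 * PATCH_SIZE * PATCH_SIZE   # at most 1024 patches


def fixed_res_token_count(resolution: int = 224, patch_size: int = PATCH_SIZE) -> int:
    """Patch tokens when every image is resized to a fixed square (assumed for *-224 models)."""
    side = resolution // patch_size
    return side * side


def smart_resize(height: int, width: int, factor: int = PATCH_SIZE,
                 min_pixels: int = MIN_PIXELS, max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Qwen2-VL-style resize: keep aspect ratio, snap both sides to multiples of
    `factor`, and rescale so the pixel count stays within [min_pixels, max_pixels]."""
    if height < factor or width < factor:
        raise ValueError(f"image {height}x{width} is smaller than one {factor}px patch")
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        # Shrink uniformly, rounding down so the result stays under the budget.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Enlarge uniformly, rounding up so the result reaches the minimum.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def navit_token_count(height: int, width: int, patch_size: int = PATCH_SIZE) -> int:
    """Patch tokens for a native-resolution (NaViT-style) input after smart_resize."""
    h, w = smart_resize(height, width, factor=patch_size)
    return (h // patch_size) * (w // patch_size)


if __name__ == "__main__":
    print(fixed_res_token_count())        # 196: every *-224 input becomes a 14x14 grid
    print(smart_resize(480, 640))         # (432, 576) under the hypothetical MAX_PIXELS
    print(navit_token_count(480, 640))    # 972: a 27x36 grid, aspect ratio kept
```

Under these assumptions a 224×224 image yields 196 tokens either way, while a 480×640 image keeps its 3:4 shape as a 27×36 patch grid instead of being squashed to 14×14. The variable sequence length is what NaViT-style packing is designed to handle.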

Pretraining and implementation details can be found in our codebase [GenLIP].