Multimodal (text + image + video + audio) embedding models aligned with jina-embeddings-v5-text-*. Two sizes, four task variants each.
- jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition
  Paper • 2605.08384 • Published • 3
- jinaai/jina-embeddings-v5-omni-small
  Feature Extraction • 2B • Updated • 16.2k • 23
- jinaai/jina-embeddings-v5-omni-nano
  Feature Extraction • 1.0B • Updated • 14.8k • 10
- jinaai/jina-embeddings-v5-omni-nano-text-matching
  Feature Extraction • 0.9B • Updated • 11.6k • 3
