flash attention not working with model
#1
by XVII - opened
If you try to use sentence-transformers with flash_attention_2, you get the error: NameError: name '_flash_supports_window_size' is not defined
If you uncomment lines 49-53 in modeling_qwen.py, everything works fine.
Code to reproduce:
from sentence_transformers import SentenceTransformer
import torch


class InfRetrieverV1Embedder:
    def __init__(self):
        self.model = SentenceTransformer(
            "infly/inf-retriever-v1",
            trust_remote_code=True,
            device='cuda',
            model_kwargs={
                'attn_implementation': 'flash_attention_2',
                "torch_dtype": torch.bfloat16
            }
        )
        self.embedding_dims = 3584
        self.max_length = 4096
        self.batch_size = 8
        self.model_name = "inf-retriever-v1"
        self.model.max_seq_length = self.max_length

    def encode(self, texts, mode='document'):
        assert mode in ('query', 'document')
        if mode == 'document':
            res = self.model.encode(texts, batch_size=self.batch_size)
        else:
            res = self.model.encode(
                texts,
                prompt="You are given code snippet with incomplete line. Retrieve relevant code snippets that help to complete this line.",
                batch_size=self.batch_size
            )
        return res.tolist()


embedder = InfRetrieverV1Embedder()
load = ['def hello_world' * 10000] * 256
embedder.encode(load)
Environment: transformers 4.49.0, flash attention 2.7.1post1, sentence-transformers 3.4.1.
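For anyone comparing setups, here is a small sketch that prints the installed versions of these packages. It uses only the standard library; the distribution names passed in (transformers, flash-attn, sentence-transformers) are assumed to be the PyPI names, and installed_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is not installed."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Assumed PyPI distribution names for the three libraries mentioned above.
print(installed_versions(["transformers", "flash-attn", "sentence-transformers"]))
```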
Is it safe to modify this code, or have you run into hidden consequences of using flash attention?
We commented out lines 49-53 just for convenience, to remove the dependency on flash_attn. You can safely uncomment those lines.