---

license: apache-2.0  
base_model: microsoft/MiniLM-L6-v2  
tags:  
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
- text-embeddings-inference
- information-retrieval
- knowledge-distillation
- transformers.js
language:
- en
---

<div style="display: flex; justify-content: center;">      
    <div style="display: flex; align-items: center; gap: 10px;">      

        <img src="logo.webp" alt="MongoDB Logo" style="height: 36px; width: auto; border-radius: 4px;">      

        <span style="font-size: 32px; font-weight: bold">MongoDB/mdbr-leaf-mt</span>      

    </div>      

</div>  


# Content

1. [Introduction](#introduction)
2. [Technical Report](#technical-report)
3. [Highlights](#highlights)
4. [Benchmarks](#benchmark-comparison)
5. [Quickstart](#quickstart)
6. [Citation](#citation)

# Introduction

`mdbr-leaf-mt` is a compact, high-performance text embedding model designed for classification, clustering, semantic sentence similarity, and summarization tasks.

To enable even greater efficiency, `mdbr-leaf-mt` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl-truncation).

If you are looking to perform semantic search / information retrieval (e.g., for RAG), please check out our [`mdbr-leaf-ir`](https://huggingface.co/MongoDB/mdbr-leaf-ir) model, which is trained specifically for these tasks.

> [!Note]  
> **Note**: this model was developed by the ML team of MongoDB Research. At the time of writing, it is not used in any of MongoDB's commercial product or service offerings.

# Technical Report

A technical report detailing our proposed `LEAF` training procedure is [available here](https://arxiv.org/abs/2509.12539).

# Highlights  

* **State-of-the-Art Performance**: `mdbr-leaf-mt` achieves new state-of-the-art results for compact embedding models, **ranking #1** on the [public MTEB v2 (Eng) benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for models with ≤30M parameters.
* **Flexible Architecture Support**: `mdbr-leaf-mt` supports asymmetric retrieval architectures enabling even greater retrieval results. [See below](#asymmetric-retrieval-setup) for more information.
* **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-mt` compress well when truncated (MRL) and can be stored using more efficient types like `int8` and `binary`. [See below](#mrl-truncation) for more information.

## Benchmark Comparison
  
The table below shows the scores for `mdbr-leaf-mt` on the MTEB v2 (English) benchmark, compared to other compact embedding models.

`mdbr-leaf-mt` ranks #1 on this benchmark among models with ≤30M parameters.

| Model                              | Size    | MTEB v2 (Eng) |  
|------------------------------------|---------|---------------|  
| OpenAI text-embedding-3-large      | Unknown | 66.43         |  
| OpenAI text-embedding-3-small      | Unknown | 64.56         |  
| **mdbr-leaf-mt**                   | 23M     | **63.97**     |  
| gte-small                          | 33M     | 63.22         |  
| snowflake-arctic-embed-s           | 32M     | 61.59         |  
| e5-small-v2                        | 33M     | 61.32         |  
| granite-embedding-small-english-r2 | 47M     | 61.07         |  
| all-MiniLM-L6-v2                   | 22M     | 59.03         |  


# Quickstart  
  
## Sentence Transformers  
  
```python
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("MongoDB/mdbr-leaf-mt")

# Example queries and documents
queries = [
    "What is machine learning?",
    "How does neural network training work?"
]

documents = [
    "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
    "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors."
]

# Encode queries and documents
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute similarity scores
scores = model.similarity(query_embeddings, document_embeddings)

# Print results
for i, query in enumerate(queries):
    print(f"Query: {query}")
    for j, doc in enumerate(documents):
        print(f" Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")
```

<details>
<summary>See example output</summary>

```
Query: What is machine learning?
 Similarity: 0.9063 | Document 0: Machine learning is a subset of ...
 Similarity: 0.7287 | Document 1: Neural networks are trained ...

Query: How does neural network training work?
 Similarity: 0.6725 | Document 0: Machine learning is a subset of ...
 Similarity: 0.8287 | Document 1: Neural networks are trained ...
```
</details>

## Transformers.js

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
```bash
npm i @huggingface/transformers
```

You can then use the model to compute embeddings like this:

```js
import { AutoModel, AutoTokenizer, matmul } from "@huggingface/transformers";

// Download from the 🤗 Hub
const model_id = "MongoDB/mdbr-leaf-mt";
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModel.from_pretrained(model_id, {
    dtype: "fp32", // Options: "fp32" | "fp16" | "q8" | "q4" | "q4f16"
});

// Prepare queries and documents
const queries = [
    "What is machine learning?",
    "How does neural network training work?",
];
const documents = [
    "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
    "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.",
];
const inputs = await tokenizer([
    ...queries.map((x) => "Represent this sentence for searching relevant passages: " + x),
    ...documents,
], { padding: true });

// Generate embeddings
const { sentence_embedding } = await model(inputs);
const normalized_sentence_embedding = sentence_embedding.normalize();

// Compute similarities
const scores = await matmul(
    normalized_sentence_embedding.slice([0, queries.length]),
    normalized_sentence_embedding.slice([queries.length, null]).transpose(1, 0),
);
const scores_list = scores.tolist();

for (let i = 0; i < queries.length; ++i) {
    console.log(`Query: ${queries[i]}`);
    for (let j = 0; j < documents.length; ++j) {
        console.log(` Similarity: ${scores_list[i][j].toFixed(4)} | Document ${j}: ${documents[j]}`);
    }
    console.log();
}
```

<details>
<summary>See example output</summary>

```
Query: What is machine learning?
 Similarity: 0.9063 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.
 Similarity: 0.7287 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.

Query: How does neural network training work?
 Similarity: 0.6725 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.
 Similarity: 0.8287 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.
```
</details>


## Transformers Usage  

See [here](https://huggingface.co/MongoDB/mdbr-leaf-mt/blob/main/transformers_example_mt.ipynb).
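
For reference, a minimal sketch of plain 🤗 Transformers usage is shown below. It is an assumption-laden illustration (mean pooling over token embeddings followed by L2 normalization, with the query prompt taken from the Transformers.js example above); the linked notebook remains the authoritative reference.

```python
# Hedged sketch: plain Transformers usage.
# Assumption: mean pooling + L2 normalization; if the notebook linked above
# uses a different pooling strategy (e.g. CLS), follow the notebook instead.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "MongoDB/mdbr-leaf-mt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

def embed(texts, prompt=""):
    inputs = tokenizer([prompt + t for t in texts], padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeds = model(**inputs).last_hidden_state
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    pooled = (token_embeds * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling
    return F.normalize(pooled, p=2, dim=1)

query_prompt = "Represent this sentence for searching relevant passages: "
query_embeds = embed(["What is machine learning?"], prompt=query_prompt)
doc_embeds = embed(["Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data."])
print(query_embeds @ doc_embeds.T)
```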
  
## Asymmetric Retrieval Setup

> [!Note]
> **Note**: a version of this asymmetric setup, conveniently packaged into a single model, is [available here](https://huggingface.co/MongoDB/mdbr-leaf-mt-asym).

`mdbr-leaf-mt` is *aligned* to [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1), the model it has been distilled from, making the asymmetric system below possible: 
  
```python
from sentence_transformers import SentenceTransformer

# Use mdbr-leaf-mt for query encoding (real-time, low latency)
query_model = SentenceTransformer("MongoDB/mdbr-leaf-mt")
query_embeddings = query_model.encode(queries, prompt_name="query")

# Use a larger model for document encoding (one-time, at index time)
doc_model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
document_embeddings = doc_model.encode(documents)

# Compute similarities
scores = query_model.similarity(query_embeddings, document_embeddings)
```
Retrieval results from asymmetric mode are usually superior to the [standard mode above](#sentence-transformers).

## MRL Truncation

Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:
```python
query_embeds = model.encode(queries, prompt_name="query", truncate_dim=256)
doc_embeds = model.encode(documents, truncate_dim=256)

similarities = model.similarity(query_embeds, doc_embeds)

print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities: \n\t{similarities}")
```

<details>
<summary>See example output</summary>

```
After MRL:
* Embeddings dimension: 256
* Similarities:
    tensor([[0.9164, 0.7219],
            [0.6682, 0.8393]], device='cuda:0')
```
</details>

## Vector Quantization
Vector quantization, for example to `int8` or `binary`, can be performed as follows:

**Note**: For vector quantization to types other than binary, we suggest calibrating the quantization ranges ([see here](https://sbert.net/examples/sentence_transformer/applications/embedding-quantization/README.html#scalar-int8-quantization)). Good initial values are -1.0 and +1.0.
```python
from sentence_transformers.quantization import quantize_embeddings
import torch

query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)

# Quantize embeddings to int8 using -1.0 and +1.0
ranges = torch.tensor([[-1.0], [+1.0]]).expand(2, query_embeds.shape[1]).cpu().numpy()
query_embeds = quantize_embeddings(query_embeds, "int8", ranges=ranges)
doc_embeds = quantize_embeddings(doc_embeds, "int8", ranges=ranges)

# Calculate similarities; cast to int64 to avoid under/overflow
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T

print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities: \n{similarities}")
```

<details>
<summary>See example output</summary>

```
After quantization:
* Embeddings type: int8
* Similarities:
   [[2202032 1422868]
    [1421197 1845580]]
```
</details>
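
The example above covers `int8`; for `binary` quantization, a minimal sketch (not part of the original examples) is shown below. It reuses `model`, `queries`, and `documents` from the Quickstart, packs each embedding into bits with `quantize_embeddings(..., "ubinary")`, and ranks by Hamming distance, where lower means more similar.

```python
# Hedged sketch: binary quantization via bit-packing, ranked by Hamming distance.
import numpy as np
from sentence_transformers.quantization import quantize_embeddings

query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)

# "ubinary" thresholds each dimension at 0 and packs the bits into dim/8 uint8 values
query_bin = quantize_embeddings(query_embeds, "ubinary")
doc_bin = quantize_embeddings(doc_embeds, "ubinary")

# Hamming distance between packed bit vectors (lower = more similar)
hamming = np.array([[int(np.unpackbits(q ^ d).sum()) for d in doc_bin] for q in query_bin])

print(f"* Packed embeddings shape: {query_bin.shape}, dtype: {query_bin.dtype}")
print(f"* Hamming distances:\n{hamming}")
```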

## Evaluation

Please [see here](https://huggingface.co/MongoDB/mdbr-leaf-mt/blob/main/evaluate_models.ipynb).
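
For a quick local sanity check, a minimal sketch using the [`mteb`](https://github.com/embeddings-benchmark/mteb) library is shown below; the task choice is illustrative, and the linked notebook remains the authoritative evaluation setup.

```python
# Hedged sketch: evaluate the model on a single illustrative MTEB task.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MongoDB/mdbr-leaf-mt")
tasks = mteb.get_tasks(tasks=["STSBenchmark"])  # illustrative task choice
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/mdbr-leaf-mt")
print(results)
```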

# Citation  
  
If you use this model in your work, please cite:  
  
```bibtex
@misc{mdbr_leaf,
      title={LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations},
      author={Robin Vujanic and Thomas Rueckstiess},
      year={2025},
      eprint={2509.12539},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2509.12539},
}
```
  
# License  
  
This model is released under the Apache 2.0 License.
  
# Contact  
  
For questions or issues, please open an issue or pull request. You can also contact the MongoDB ML Research team at [email protected].