How to use sobamchan/bert-large-uncased-no-mrl with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("sobamchan/bert-large-uncased-no-mrl")
sentences = [
"A man is jumping unto his filthy bed.",
"A young male is looking at a newspaper while 2 females walks past him.",
"The bed is dirty.",
"The man is on the moon."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from google-bert/bert-large-uncased on the all-nli dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
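The Pooling module above has only pooling_mode_mean_tokens enabled, so each sentence vector is the average of its token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy inputs are illustrative assumptions, and the real module works on batched tensors of width 1024.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input so we never divide by zero.
    count = max(sum(attention_mask), 1)
    return [value / count for value in summed]


# Toy input (hypothetical): three token vectors of width 2, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padded third vector does not influence the result because its mask entry is 0.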
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sobamchan/bert-large-uncased-no-mrl")
# Run inference
sentences = [
'A construction worker peeking out of a manhole while his coworker sits on the sidewalk smiling.',
'A worker is looking out of a manhole.',
'The workers are both inside the manhole.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8028, 0.6435],
# [0.8028, 1.0000, 0.7869],
# [0.6435, 0.7869, 1.0000]])
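To make the similarity matrix above easier to read, here is a small pure-Python sketch of pairwise cosine similarity. It assumes the model's similarity function is cosine similarity, which this card does not state explicitly (the reported metrics are cosine-based); the helper name `cosine_similarity_matrix` and the toy vectors are illustrative, not part of the library.

```python
import math
from typing import List


def cosine_similarity_matrix(embeddings: List[List[float]]) -> List[List[float]]:
    """Return the matrix of cosine similarities between all pairs of vectors."""
    norms = [math.sqrt(sum(x * x for x in vector)) for vector in embeddings]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
            for v, norm_v in zip(embeddings, norms)
        ]
        for u, norm_u in zip(embeddings, norms)
    ]


# Toy 2-dimensional vectors (hypothetical); real embeddings here have 1024 dimensions.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = cosine_similarity_matrix(toy)
for row in matrix:
    print([round(value, 4) for value in row])
```

As in the output above, the diagonal is 1 because every vector is identical to itself, and the matrix is symmetric.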
Dataset: sts-dev. Evaluated with EmbeddingSimilarityEvaluator.

| Metric | Value |
|---|---|
| pearson_cosine | 0.7988 |
| spearman_cosine | 0.8165 |
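The two metrics above compare the model's cosine similarities with human similarity scores: pearson_cosine measures linear agreement, spearman_cosine measures agreement of the rankings. The sketch below shows both in pure Python under that reading; the function names and the sample scores are hypothetical and are not taken from sts-dev.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: model cosine similarities and human scores for four pairs.
predicted = [0.91, 0.35, 0.62, 0.10]
gold = [4.8, 2.0, 3.9, 0.5]
rho = spearman(predicted, gold)
print(rho, pearson(predicted, gold))
```

Spearman only cares about ordering, so a model whose scores are monotonic in the gold scores but not linear still reaches a Spearman correlation of 1.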
Training dataset columns: anchor, positive, and negative

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |

Samples:

| anchor | positive | negative |
|---|---|---|
| A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | A person is at a diner, ordering an omelette. |
| Children smiling and waving at camera | There are children present | The kids are frowning |
| A boy is jumping on skateboard in the middle of a red bridge. | The boy does a skateboarding trick. | The boy skates down the sidewalk. |
Loss: MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768
],
"matryoshka_weights": [
1
],
"n_dims_per_step": -1
}
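The configuration above wraps MultipleNegativesRankingLoss in MatryoshkaLoss with a single dimension (768) and weight 1. The pure-Python sketch below illustrates the idea on toy vectors: embeddings are truncated to the first `dim` coordinates, then each anchor must pick its own positive among all positives and negatives in the batch. It is a simplified illustration, not the library code; the function names are hypothetical, and the scale of 20 with cosine similarity is assumed from the library defaults because the card does not list them.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnr_loss(anchors, positives, negatives, dim: int, scale: float = 20.0) -> float:
    """In-batch ranking loss on embeddings truncated to their first `dim` values."""
    truncated_anchors = [v[:dim] for v in anchors]
    # Candidates: every positive in the batch, followed by every hard negative.
    candidates = [v[:dim] for v in positives] + [v[:dim] for v in negatives]
    total = 0.0
    for i, anchor in enumerate(truncated_anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Cross-entropy with target index i, using a stable log-sum-exp.
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(value - top) for value in logits))
        total += log_norm - logits[i]
    return total / len(truncated_anchors)


def matryoshka_loss(anchors, positives, negatives, dims=(768,), weights=(1.0,)) -> float:
    """Weighted sum of the ranking loss over each truncation size."""
    return sum(w * mnr_loss(anchors, positives, negatives, d) for d, w in zip(dims, weights))


# Toy batch (hypothetical): 4-dimensional vectors truncated to 2 dimensions.
anchors: List[List[float]] = [[1.0, 0.0, 9.0, 9.0], [0.0, 1.0, 9.0, 9.0]]
positives = [[1.0, 0.0, -5.0, 3.0], [0.0, 1.0, 2.0, -7.0]]
negatives = [[-1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0]]
aligned = matryoshka_loss(anchors, positives, negatives, dims=(2,))
swapped = matryoshka_loss(anchors, positives[::-1], negatives, dims=(2,))
print(aligned, swapped)
```

With one dimension and weight 1, the weighted sum has a single term, so under these assumptions the configured loss reduces to the ranking loss on the first 768 of the 1024 output dimensions.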
Evaluation dataset columns: anchor, positive, and negative

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |

Samples:

| anchor | positive | negative |
|---|---|---|
| Two women are embracing while holding to go packages. | Two woman are holding packages. | The men are fighting outside a deli. |
| Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | Two kids in numbered jerseys wash their hands. | Two kids in jackets walk to school. |
| A man selling donuts to a customer during a world exhibition event held in the city of Angeles | A man selling donuts to a customer. | A woman drinks her coffee in a small cafe. |
Loss: MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768
],
"matryoshka_weights": [
1
],
"n_dims_per_step": -1
}
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- num_train_epochs: 15
- warmup_ratio: 0.1

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 15
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss | Validation Loss | sts-dev_spearman_cosine |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.5941 |
| 0.0287 | 500 | 1.9263 | 0.7269 | 0.8006 |
| 0.0574 | 1000 | 0.8808 | 0.4899 | 0.8306 |
| 0.0860 | 1500 | 0.6811 | 0.3757 | 0.8432 |
| 0.1147 | 2000 | 0.5842 | 0.3250 | 0.8448 |
| 0.1434 | 2500 | 0.5269 | 0.3007 | 0.8472 |
| 0.1721 | 3000 | 0.4937 | 0.2855 | 0.8541 |
| 0.2008 | 3500 | 0.4717 | 0.2636 | 0.8510 |
| 0.2294 | 4000 | 0.4398 | 0.2596 | 0.8509 |
| 0.2581 | 4500 | 0.43 | 0.2507 | 0.8575 |
| 0.2868 | 5000 | 0.4094 | 0.2419 | 0.8566 |
| 0.3155 | 5500 | 0.3927 | 0.2349 | 0.8595 |
| 0.3442 | 6000 | 0.3904 | 0.2356 | 0.8568 |
| 0.3729 | 6500 | 0.3844 | 0.2275 | 0.8510 |
| 0.4015 | 7000 | 0.377 | 0.2220 | 0.8560 |
| 0.4302 | 7500 | 0.363 | 0.2235 | 0.8412 |
| 0.4589 | 8000 | 0.3616 | 0.2305 | 0.8531 |
| 0.4876 | 8500 | 0.3733 | 0.2306 | 0.8457 |
| 0.5163 | 9000 | 0.3675 | 0.2290 | 0.8460 |
| 0.5449 | 9500 | 0.358 | 0.2291 | 0.8459 |
| 0.5736 | 10000 | 0.3322 | 0.2218 | 0.8479 |
| 0.6023 | 10500 | 0.3376 | 0.2254 | 0.8339 |
| 0.6310 | 11000 | 0.3308 | 0.2140 | 0.8428 |
| 0.6597 | 11500 | 0.3475 | 0.2382 | 0.8339 |
| 0.6883 | 12000 | 0.3498 | 0.2172 | 0.8325 |
| 0.7170 | 12500 | 0.3266 | 0.2290 | 0.8479 |
| 0.7457 | 13000 | 0.3214 | 0.2297 | 0.8355 |
| 0.7744 | 13500 | 0.3237 | 0.2363 | 0.8325 |
| 0.8031 | 14000 | 0.3108 | 0.2334 | 0.8307 |
| 0.8318 | 14500 | 0.3143 | 0.3627 | 0.7954 |
| 0.8604 | 15000 | 0.3156 | 0.2238 | 0.8378 |
| 0.8891 | 15500 | 0.3204 | 0.2271 | 0.8390 |
| 0.9178 | 16000 | 0.314 | 0.2332 | 0.8349 |
| 0.9465 | 16500 | 0.3074 | 0.2277 | 0.8324 |
| 0.9752 | 17000 | 0.2937 | 0.2326 | 0.8274 |
| 1.0038 | 17500 | 0.2919 | 0.2350 | 0.8288 |
| 1.0325 | 18000 | 0.2483 | 0.2381 | 0.8367 |
| 1.0612 | 18500 | 0.2534 | 0.2397 | 0.8227 |
| 1.0899 | 19000 | 0.2699 | 0.2495 | 0.8221 |
| 1.1186 | 19500 | 0.2691 | 0.2468 | 0.8193 |
| 1.1472 | 20000 | 0.2843 | 0.2462 | 0.8346 |
| 1.1759 | 20500 | 0.2736 | 0.2387 | 0.8321 |
| 1.2046 | 21000 | 0.2728 | 0.2415 | 0.8364 |
| 1.2333 | 21500 | 0.2769 | 0.2483 | 0.8301 |
| 1.2620 | 22000 | 0.2633 | 0.2582 | 0.8340 |
| 1.2907 | 22500 | 0.2719 | 0.2484 | 0.8295 |
| 1.3193 | 23000 | 0.2787 | 0.2606 | 0.8297 |
| 1.3480 | 23500 | 0.2812 | 0.2595 | 0.8290 |
| 1.3767 | 24000 | 0.2868 | 0.2659 | 0.8208 |
| 1.4054 | 24500 | 0.2776 | 0.2520 | 0.8369 |
| 1.4341 | 25000 | 0.2772 | 0.2759 | 0.8307 |
| 1.4627 | 25500 | 0.2887 | 0.2735 | 0.8198 |
| 1.4914 | 26000 | 0.2892 | 0.2787 | 0.8367 |
| 1.5201 | 26500 | 0.2779 | 0.2612 | 0.8173 |
| 1.5488 | 27000 | 0.2791 | 0.2593 | 0.8230 |
| 1.5775 | 27500 | 0.2939 | 0.2678 | 0.8256 |
| 1.6061 | 28000 | 0.2808 | 0.2729 | 0.8241 |
| 1.6348 | 28500 | 0.2913 | 0.2700 | 0.8163 |
| 1.6635 | 29000 | 0.2919 | 0.2855 | 0.8315 |
| 1.6922 | 29500 | 0.284 | 0.2684 | 0.8338 |
| 1.7209 | 30000 | 0.2867 | 0.2703 | 0.8254 |
| 1.7496 | 30500 | 0.2781 | 0.2738 | 0.8186 |
| 1.7782 | 31000 | 0.2806 | 0.2621 | 0.8170 |
| 1.8069 | 31500 | 0.2859 | 0.2727 | 0.8197 |
| 1.8356 | 32000 | 0.2732 | 0.2716 | 0.8238 |
| 1.8643 | 32500 | 0.2797 | 0.2728 | 0.8178 |
| 1.8930 | 33000 | 0.2701 | 0.2715 | 0.8219 |
| 1.9216 | 33500 | 0.265 | 0.2638 | 0.8250 |
| 1.9503 | 34000 | 0.275 | 0.2660 | 0.8188 |
| 1.9790 | 34500 | 0.2684 | 0.2765 | 0.8112 |
| 2.0077 | 35000 | 0.2607 | 0.2648 | 0.8151 |
| 2.0364 | 35500 | 0.197 | 0.2673 | 0.8123 |
| 2.0650 | 36000 | 0.2075 | 0.2706 | 0.8129 |
| 2.0937 | 36500 | 0.2111 | 0.2647 | 0.8263 |
| 2.1224 | 37000 | 0.2202 | 0.2736 | 0.8133 |
| 2.1511 | 37500 | 0.2135 | 0.2640 | 0.8118 |
| 2.1798 | 38000 | 0.2229 | 0.2667 | 0.8166 |
| 2.2085 | 38500 | 0.209 | 0.2622 | 0.8090 |
| 2.2371 | 39000 | 0.2039 | 0.2639 | 0.8104 |
| 2.2658 | 39500 | 0.2113 | 0.2827 | 0.8235 |
| 2.2945 | 40000 | 0.2065 | 0.2698 | 0.8151 |
| 2.3232 | 40500 | 0.21 | 0.2593 | 0.8155 |
| 2.3519 | 41000 | 0.2083 | 0.2733 | 0.7975 |
| 2.3805 | 41500 | 0.231 | 0.2822 | 0.8088 |
| 2.4092 | 42000 | 0.2109 | 0.2667 | 0.8180 |
| 2.4379 | 42500 | 0.2006 | 0.2791 | 0.8071 |
| 2.4666 | 43000 | 0.2131 | 0.2747 | 0.8230 |
| 2.4953 | 43500 | 0.2101 | 0.2674 | 0.8165 |
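The hyperparameters listed before this table set learning_rate 5e-05, lr_scheduler_type linear, and warmup_ratio 0.1. The sketch below shows what such a schedule looks like: a linear ramp from 0 to the base rate, then a linear decay to 0. It is an illustration under assumptions, not the trainer's code: the function name is hypothetical, the steps per epoch are estimated from the log row at epoch 1.0038 and step 17500, and the total assumes the configured 15 epochs even though the log above ends near epoch 2.5.

```python
import math


def linear_schedule_lr(step: int, total_steps: int,
                       base_lr: float = 5e-05, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: steps per epoch estimated from one row of the training log.
steps_per_epoch = round(17500 / 1.0038)
# Assumption: the schedule was planned for the configured 15 epochs.
total_steps = steps_per_epoch * 15

for step in (0, 500, 17500, 43500):
    print(step, linear_schedule_lr(step, total_steps))
```

Under these assumptions the warmup phase covers roughly the first 1.5 epochs, so the earliest rows of the table were logged while the learning rate was still ramping up.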
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
google-bert/bert-large-uncased