This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
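
The module stack above can be read as three steps: the Transformer produces one vector per token, the Pooling module keeps only the [CLS] vector (pooling_mode_cls_token is True, all other modes are False), and Normalize() scales the result to unit length. A minimal sketch of the last two steps in plain Python follows. The toy 3-token, 4-dimensional input stands in for real 768-dimensional BERT outputs, and the function names are illustrative rather than part of the library.

```python
import math


def cls_pool(token_embeddings):
    """Return the first token's vector, the [CLS] position in BERT-style inputs."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Hypothetical Transformer output: 3 tokens x 4 dimensions.
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 2.0, 3.0, 4.0],  # a word piece
    [0.5, 0.5, 0.5, 0.5],  # [SEP]
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Only the first row influences the result here; the other token vectors are ignored by CLS pooling.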
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("RishuD7/bge-base-en-v1.5-82-keys-phase-7-exp_v1")
# Run inference
sentences = [
    "(d) Names. Service Provider will not use the name of Company, any Affiliate of Company, any Company employee or any employee of any Affiliate of Company, or any product or service of Company or any of its Affiliates in any press release, advertising or materials distributed to prospective or existing customers, annual reports or any other public disclosure, except with Company's prior written authorization. Under no circumstances will Service Provider use the logos or other trademarks of Company or any of its Affiliates in any such materials or disclosures.\nService Provider Personnel shall, comply with any written instructions issued by Company with respect to.. the use, storage and handling of the Company Materials. Service Provider will use best efforts to protect the Company Materials from any loss of or damage while such Company Materials are under Service Provider's control, which control will be deemed to begin upon receipt of the Company Materials by. Service Provider; provided that Service Provider shall not be liable for any loss or damage to Company. Materials to the extent such loss or damage is caused by Service Provider's compliance with such written. instructions.",
    'Publicity',
    'CBRE_Termination Trigger - Client',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
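
To use the similarity matrix for label matching, row 0 (the clause) can be compared against the remaining entries (the candidate key names). The sketch below illustrates that ranking step with hand-written unit vectors instead of real model outputs. It assumes cosine similarity, which reduces to a dot product because the model ends with a Normalize() module. The vectors and the `rank_candidates` helper are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(query_vec, candidate_vecs, candidate_names):
    """Sort candidates by dot product with the query, highest score first."""
    scored = [
        (name, dot(query_vec, vec))
        for name, vec in zip(candidate_names, candidate_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical unit-length embeddings (not produced by the model).
clause_vec = [0.6, 0.8, 0.0]
label_names = ["Publicity", "CBRE_Termination Trigger - Client"]
label_vecs = [
    [0.8, 0.6, 0.0],
    [0.0, 0.6, 0.8],
]

ranking = rank_candidates(clause_vec, label_vecs, label_names)
for name, score in ranking:
    print(f"{score:.2f}  {name}")
```

With real embeddings, the same ranking can be read from `similarities[0, 1:]` in the snippet above.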

Dataset: dim_768, evaluated with InformationRetrievalEvaluator

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.0063 |
| cosine_accuracy@3 | 0.0235 |
| cosine_accuracy@5 | 0.0353 |
| cosine_accuracy@10 | 0.0761 |
| cosine_precision@1 | 0.0063 |
| cosine_precision@3 | 0.0078 |
| cosine_precision@5 | 0.0071 |
| cosine_precision@10 | 0.0076 |
| cosine_recall@1 | 0.0063 |
| cosine_recall@3 | 0.0235 |
| cosine_recall@5 | 0.0353 |
| cosine_recall@10 | 0.0761 |
| cosine_ndcg@10 | 0.0343 |
| cosine_mrr@10 | 0.0219 |
| cosine_map@100 | 0.0359 |
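
In the table above, accuracy@k and recall@k coincide at every k, and precision@k matches accuracy@k / k after rounding. That is the pattern expected when each query has exactly one relevant document. This reading is an inference from the numbers, not something the card states. The sketch below computes the same family of metrics from per-query ranks under that assumption. The `retrieval_metrics` helper and the toy ranks are illustrative.

```python
def retrieval_metrics(ranks, k):
    """Compute accuracy@k, precision@k, recall@k and MRR@k.

    ranks: 1-based rank of the single relevant document for each query,
           or None when it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r for r in ranks if r is not None and r <= k]
    return {
        "accuracy": len(hits) / n,
        "precision": len(hits) / (k * n),  # at most one hit among k results
        "recall": len(hits) / n,           # one relevant document per query
        "mrr": sum(1.0 / r for r in hits) / n,
    }


# Toy example: four queries, relevant document found at ranks 1, 4, never, 2.
toy = retrieval_metrics([1, 4, None, 2], k=3)
print(toy)

# Consistency check against the reported table: precision@k ~ accuracy@k / k.
reported_accuracy = {3: 0.0235, 5: 0.0353, 10: 0.0761}
implied_precision = {k: round(a / k, 4) for k, a in reported_accuracy.items()}
print(implied_precision)
```

The implied precision values agree with the reported cosine_precision@3, @5 and @10 rows.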

Columns: positive and anchor

| | positive | anchor |
|---|---|---|
| type | string | string |

Samples:

| positive | anchor |
|---|---|
| In the event that the Contractor provides the Service during the incomplete period of the the service was provided 3. The Customer shall reimburse the Contractor for the expenses incurred for the purchase of spare parts, equipment and materials for the purpose of providing the Services increased by the Contractor's mark-up, the amount of which is specified in Appendix No 4 "Terms and Conditions". The purchase of spare parts, equipment and materials referred to above will take place after the Contractor's application has been accepted by the Customer. 1 Settlement for the undertaking by the Contractor Emergency interventions will take place in accordance with and the conditions indicated in Clause 4 "Terms and Conditions" 5. For the performance of additional works, the Contractor will receive the remuneration specified in the application or contract for the performance of additional works accepted by the Client. | CBRE_Pricing Criteria |
| 4.1 The Contractor, despite a written warning issued by the Contractor by registered mail, violates the provisions of the Agreement and #$#cease the infringement within 14 (fourteen) days from the date of receipt of the summons from the Contractor, unless, . due to the nature of the breach. its removal. requires a longer period and the actions to remedy the breach are taken immediately. and duly by the Contractor:. 4.4 The Contractor shall cease to perform the duties resulting from the contract in part or in part for more than 3 days. 5. The Contractor may terminate the Contract with effect from the date of written service - under pain of non-wai:noscj - a statement of termination. if:. 5.1The Customer shall not comply with the obligation to submit the seals after the deadline for the payment of the two consecutive settlement periods specified on the invoice and after the deadline of fourteen days specified by the Contractor in the. written reminder; out business activity | CBRE_Termination Trigger - Client |
| Works commissioned to the Contractor, which do not fall within the scope of the contract, are additionally valued as Additional Works after prior acceptance of the Contractor's offer within 30 days from the date of delivery of the duly issued invoice issued after the. completion of these works. 7. The amount of remuneration due as set out in Schedule No 4 "Terms and Conditions shall be the net amount and shall beand VAT and VAT.. 8. Any discounts, commissions and other bonuses that the Contractor receivesin connection with its global purchasing program will be retained by the Contractor and I wil not be subiect to settlement with the Principal. 9. In the event of changes in the Iaw resulting in an increase in costs related to the provision of Services on the part of the Contractor, the Customer undertakes to cover the above- mentioned costs, documented by the Contractor. | CBRE_WCP Status Criteria |

Loss: MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
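
As a rough illustration of what these two parameters do, the sketch below implements an in-batch-negatives ranking loss in plain Python: cosine similarities between every anchor and every positive are multiplied by the scale (20.0), and a cross-entropy is taken with the matching positive as the target. It follows the general formulation in the Henderson et al. reference cited below and is not the library's own code. The function names and toy vectors are assumptions made for the example.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over a batch.

    positives[i] is the correct match for anchors[i]; every other
    positive in the batch serves as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two pairs (hypothetical 2-dimensional embeddings).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the other positive

loss_aligned = mnr_loss(anchors, aligned)
loss_swapped = mnr_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

A larger scale sharpens the softmax, so well-separated pairs contribute almost no loss while confused pairs are penalized heavily.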

Non-default hyperparameters:

- eval_strategy: epoch
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 16
- learning_rate: 2e-05
- num_train_epochs: 30
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- tf32: False
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 16
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 30
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: False
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 |
|---|---|---|---|
| 0.6375 | 10 | 2.4919 | - |
| 1.2749 | 20 | 1.576 | - |
| 1.7211 | 27 | - | 0.0285 |
| 1.1713 | 30 | 0.6111 | - |
| 1.8088 | 40 | 1.622 | - |
| 2.4462 | 50 | 0.4089 | - |
| 2.7012 | 54 | - | 0.0300 |
| 2.3426 | 60 | 0.7251 | - |
| 2.9801 | 70 | 0.864 | - |
| 3.6175 | 80 | 0.152 | - |
| 3.6813 | 81 | - | 0.0299 |
| 3.5139 | 90 | 0.7404 | - |
| 4.1514 | 100 | 0.5908 | - |
| 4.7251 | 109 | - | 0.0304 |
| 4.0478 | 110 | 0.1358 | - |
| 4.6853 | 120 | 0.7636 | - |
| 5.3227 | 130 | 0.3625 | - |
| 5.7052 | 136 | - | 0.0332 |
| 5.2191 | 140 | 0.2812 | - |
| 5.8566 | 150 | 0.6369 | - |
| 6.4940 | 160 | 0.1818 | - |
| 6.6853 | 163 | - | 0.0327 |
| 6.3904 | 170 | 0.3748 | - |
| 7.0279 | 180 | 0.5476 | - |
| 7.6653 | 190 | 0.0952 | - |
| 7.7291 | 191 | - | 0.0334 |
| 7.5618 | 200 | 0.5157 | - |
| 8.1992 | 210 | 0.4383 | - |
| 8.7092 | 218 | - | 0.0362 |
| 8.0956 | 220 | 0.1392 | - |
| 8.7331 | 230 | 0.5627 | - |
| 9.3705 | 240 | 0.2617 | - |
| 9.6892 | 245 | - | 0.0336 |
| 9.2669 | 250 | 0.2135 | - |
| 9.9044 | 260 | 0.5106 | - |
| 10.5418 | 270 | 0.1462 | - |
| 10.7331 | 273 | - | 0.0343 |
| 10.4382 | 280 | 0.2909 | - |
| 11.0757 | 290 | 0.4675 | - |
| 11.7131 | 300 | 0.075 | 0.0348 |
| 11.6096 | 310 | 0.4271 | - |
| 12.2470 | 320 | 0.3571 | - |
| 12.6932 | 327 | - | 0.0358 |
| 12.1434 | 330 | 0.1183 | - |
| 12.7809 | 340 | 0.4438 | - |
| 13.4183 | 350 | 0.1956 | - |
| 13.7371 | 355 | - | 0.0352 |
| 13.3147 | 360 | 0.1887 | - |
| 13.9522 | 370 | 0.4342 | - |
| 14.5896 | 380 | 0.1177 | - |
| 14.7171 | 382 | - | 0.0346 |
| 14.4861 | 390 | 0.2633 | - |
| 15.1235 | 400 | 0.4205 | - |
| 15.6972 | 409 | - | 0.0340 |
| 15.0199 | 410 | 0.0649 | - |
| 15.6574 | 420 | 0.4102 | - |
| 16.2948 | 430 | 0.3021 | - |
| 16.7410 | 437 | - | 0.0343 |
| 16.1912 | 440 | 0.1288 | - |
| 16.8287 | 450 | 0.4247 | 0.0343 |
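
Two of the listed hyperparameter combinations are easier to read as arithmetic. per_device_train_batch_size (32) times gradient_accumulation_steps (16) gives 512 samples per optimizer step per device, and lr_scheduler_type: cosine with warmup_ratio: 0.1 means a linear ramp to the peak learning rate followed by a half-cosine decay. The sketch below follows the usual linear-warmup plus cosine-decay formula. The total step count is a hypothetical value chosen for illustration, since the card does not state it.

```python
import math

# Values taken from the hyperparameter list.
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
learning_rate = 2e-05
warmup_ratio = 0.1

# Samples contributing to one optimizer step on a single device.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps


def lr_at_step(step, total_steps, peak_lr=learning_rate, warmup=warmup_ratio):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; not stated in the card

print(effective_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps))
```

Gradient accumulation enlarges the batch used for each weight update, but the in-batch negatives seen by MultipleNegativesRankingLoss still come from each 32-sample forward pass, so the number of negatives per anchor is not increased by accumulation.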
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Base model: BAAI/bge-base-en-v1.5