BERT-base-cased fine-tuned on OntoNotes 5.0

This model is a fine-tuned version of google-bert/bert-base-cased on the English subset of the OntoNotes 5.0 (CoNLL-2012) dataset. It is designed for Named Entity Recognition (NER) and can identify 18 types of entities.

πŸ“Š Performance

The model achieves the following results on the OntoNotes 5.0 test set:

| Entity | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| CARDINAL | 0.7776 | 0.8070 | 0.7920 | 1005 |
| DATE | 0.7943 | 0.8628 | 0.8272 | 1786 |
| EVENT | 0.5000 | 0.6235 | 0.5550 | 85 |
| FAC | 0.6081 | 0.6040 | 0.6061 | 149 |
| GPE | 0.9243 | 0.9156 | 0.9199 | 2546 |
| LANGUAGE | 0.7500 | 0.6818 | 0.7143 | 22 |
| LAW | 0.5200 | 0.5909 | 0.5532 | 44 |
| LOC | 0.6478 | 0.7442 | 0.6926 | 215 |
| MONEY | 0.8760 | 0.9155 | 0.8953 | 355 |
| NORP | 0.8956 | 0.9182 | 0.9067 | 990 |
| ORDINAL | 0.7252 | 0.7778 | 0.7506 | 207 |
| ORG | 0.8621 | 0.8991 | 0.8802 | 2002 |
| PERCENT | 0.8575 | 0.9017 | 0.8790 | 407 |
| PERSON | 0.9080 | 0.9161 | 0.9121 | 2134 |
| PRODUCT | 0.5918 | 0.6444 | 0.6170 | 90 |
| QUANTITY | 0.7042 | 0.6536 | 0.6780 | 153 |
| TIME | 0.5906 | 0.6667 | 0.6263 | 225 |
| WORK_OF_ART | 0.6022 | 0.6450 | 0.6229 | 169 |
| micro avg | 0.8413 | 0.8710 | 0.8559 | 12584 |
| macro avg | 0.7297 | 0.7649 | 0.7460 | 12584 |
| weighted avg | 0.8440 | 0.8710 | 0.8570 | 12584 |
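
These per-entity scores follow the layout of seqeval's classification_report. The exact evaluation script is not included here, so the snippet below is only a minimal sketch of how such a report is typically produced; using seqeval is an assumption, and the tag sequences are toy data.

```python
from seqeval.metrics import classification_report

# Toy gold and predicted tag sequences in BIO format (illustrative only).
y_true = [["B-PERSON", "I-PERSON", "O", "B-GPE", "O"]]
y_pred = [["B-PERSON", "I-PERSON", "O", "B-GPE", "B-DATE"]]

# Entity-level precision/recall/F1 with micro, macro, and weighted averages,
# matching the columns of the table above.
print(classification_report(y_true, y_pred, digits=4))
```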

πŸ›  Training Details

  • Architecture: BertForTokenClassification
  • Tokenizer: BertTokenizerFast (using is_split_into_words=True for label alignment; see the sketch after this list)
  • Epochs: 5
  • Learning Rate: 2e-5
  • Batch Size: 16 per device (effective batch size 32 across 2x V100 GPUs)
  • Max Sequence Length: 128
  • Weight Decay: 0.01
  • Mixed Precision (FP16): Enabled
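
Because WordPiece splits words into subwords, word-level NER labels have to be re-aligned to token positions. Below is a minimal sketch of the usual word_ids()-based alignment, assuming the common first-subword labeling strategy (the strategy actually used for this model is not stated); the example sentence and label ids are hypothetical.

```python
from transformers import AutoTokenizer

# Hypothetical pre-split sentence with one label id per word.
words = ["Apple", "was", "founded", "in", "Cupertino"]
word_labels = [3, 0, 0, 0, 5]  # illustrative ids, not the real mapping

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
encoding = tokenizer(words, is_split_into_words=True, truncation=True, max_length=128)

# Label the first subword of each word; mask the rest (and special tokens)
# with -100 so the cross-entropy loss ignores them.
aligned_labels = []
previous_word = None
for word_id in encoding.word_ids():
    if word_id is None:
        aligned_labels.append(-100)          # [CLS], [SEP], padding
    elif word_id != previous_word:
        aligned_labels.append(word_labels[word_id])
    else:
        aligned_labels.append(-100)          # later subwords of the same word
    previous_word = word_id
```

For reference, the hyperparameters listed above map onto transformers TrainingArguments roughly as follows; the output directory and anything not listed above are assumptions.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bert-base-ontonotes5-ner",  # hypothetical path
    num_train_epochs=5,
    learning_rate=2e-5,
    per_device_train_batch_size=16,  # effective batch size 32 on 2x V100
    weight_decay=0.01,
    fp16=True,
)
```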

πŸ“‚ Label Mapping

The model uses the BIO tagging scheme over the 18 OntoNotes entity types, giving 37 labels in total (O plus a B- and I- tag for each type): CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW, LOC, MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, WORK_OF_ART.
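
A sketch of how such a mapping can be constructed is shown below. The index order here is illustrative only; the authoritative id2label mapping ships in the model's config.json.

```python
entity_types = [
    "CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE", "LAW", "LOC",
    "MONEY", "NORP", "ORDINAL", "ORG", "PERCENT", "PERSON", "PRODUCT",
    "QUANTITY", "TIME", "WORK_OF_ART",
]

# "O" plus B-/I- tags for each type: 1 + 2 * 18 = 37 labels in total.
labels = ["O"] + [f"{prefix}-{ent}" for ent in entity_types for prefix in ("B", "I")]
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in id2label.items()}
```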

πŸ“‚ Project Assets

  • GitHub Repository: https://github.com/Learnrr/ontonotes5_ner_evaluation.git
| Asset | File | Description |
|---|---|---|
| Model Weights | model.safetensors | Main checkpoint in Safetensors format (safe, fast loading, ~431 MB). |
| Configuration | config.json | Model architecture settings and the id2label entity mapping. |
| Vocabulary | vocab.txt | BERT-cased WordPiece vocabulary for tokenization. |
| Tokenizer | tokenizer.json / tokenizer_config.json | Fast-tokenizer serialization and configuration. |
| Special Tokens | special_tokens_map.json | Definitions of special tokens such as [CLS] and [SEP]. |
| Training Args | training_args.bin | Hyperparameter settings used during the training run. |

πŸš€ Usage

You can use this model directly with a pipeline for token classification:

```python
from transformers import pipeline

model_checkpoint = "learnrr/bert-base-ontonotes5-ner"
token_classifier = pipeline(
    "token-classification",
    model=model_checkpoint,
    aggregation_strategy="simple",
)

text = "Apple was founded by Steve Jobs in Cupertino."
results = token_classifier(text)

for entity in results:
    print(f"Entity: {entity['word']} | Label: {entity['entity_group']} | Score: {entity['score']:.4f}")
```