SetFit with JohanHeinsen/Old_News_Segmentation_SBERT_V0.1

This is a SetFit model that classifies gender in labour advertisements from the eighteenth and nineteenth centuries. It was trained by Sofus Landor Dam and Johan Heinsen.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
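
The first step can be illustrated with a minimal sketch of how labelled texts become contrastive training pairs. This is a simplification: it enumerates every pair, whereas SetFit samples pairs according to its sampling strategy. The function name `make_contrastive_pairs` is hypothetical, and the toy texts are shortened from the label examples in this card.

```python
import random
from itertools import combinations


def make_contrastive_pairs(texts, labels, seed=42):
    """Build (text_a, text_b, target) triples for contrastive fine-tuning.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Illustrative only: SetFit samples pairs instead of enumerating them all.
    """
    rng = random.Random(seed)
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    rng.shuffle(pairs)
    return pairs


# Toy inputs (shortened excerpts, used here only for illustration)
texts = [
    "En Pige fra Landet søger strax Condition",
    "En Kone, der godt kan vaske",
    "En skikkelig Karl fra Jylland",
    "En svensk Karl",
]
labels = [1, 1, 0, 0]

pairs = make_contrastive_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(target, "|", text_a, "|", text_b)
```

The Sentence Transformer body is then tuned so that same-label pairs have high cosine similarity and different-label pairs have low similarity; the classification head in step 2 is fitted on the resulting embeddings.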

Model Details

Model Description

Model Sources

Model Labels

Label Examples
Label 1:
  • 'En skikkelig Pige søger Condition strar eller til St. Hansdag som Opvartningspige, i Mangel deraf som Stue= eller Enevighvor Konen gaaer i Huusholdningen, anvises paa Hiørnet af Larsbjørnstræde og Volden 236 i Stuen.'
  • 'En Pige fra Landet søger strax Condition for Amme eller Goldamme, er at finde paa Vesterbro Nr. 9.'
  • 'En Kone, der godt kan vaske, stryge og tillige godt lave Mad, ønsker sig Condition hos en honet Familie som Kokke eller Enepige, eller og at gaae i ugeviis, hun kan tillige i malke, om forlanges, anvises i Bredgaden Nr. 202 paa 5 første Sal.'
Label 0:
  • 'En skikkelig Karl fra Jylland søger Condition til St. Hansdag og er at finde paa Christianshavn paa Hiørnet af Dronningensgade og Torvegagen i Kielderen i Nr. 359.'
  • 'En svensk Karl, nyelig kommen her til Staden, ønsker sig Condition som Kudsk eller Tienercher i Byen, eller paa Landet, har sine behørige Skudsmaal, er at finde i store Kongensgade No. 51.'
  • 'En Student, der er øvet i at informere, tilbyder sig at give Underviisning i det tydske Sprog, Regning, Skrivning, Religion, samt andre til Akademiet hørende Videnskaber Anviisningen gives i Adelgaden Nr. 206, første Sal, det første Huus paa høire Haand fra Gottersgaden.'

Evaluation

Metrics

Label   Accuracy   F1       Precision   Recall
all     0.9924     0.9944   0.9944      0.9944
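
As a reminder of how these four metrics relate, the sketch below computes them from binary confusion counts. The counts are hypothetical: they were chosen only because they reproduce the reported figures after rounding, and they are not taken from the actual evaluation set. Which label is treated as the positive class is likewise an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical counts, NOT from the real evaluation data.
metrics = binary_metrics(tp=178, fp=1, fn=1, tn=83)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Precision, recall and F1 coincide whenever the number of false positives equals the number of false negatives, which is one way the identical values in the table could arise.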

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("JohanHeinsen/Labour_ads_gender")
# Run inference
preds = model("En Stuepige, som forstaaer hvad hun bør, søger til Paaske; er at finde i Dronningens Tvergade Nr. 363 i Stuen.")
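
For several advertisements at once, a small wrapper along the following lines may be convenient. It is a sketch under assumptions: the mapping of 1 to "female" and 0 to "male" is inferred from the label examples in this card rather than stated by the authors, and the helper names `to_label_names` and `classify_ads` are hypothetical. `SetFitModel.predict` accepts a list of strings.

```python
from typing import Iterable, List

# Assumption: label meanings inferred from the label examples in this card.
LABEL_NAMES = {0: "male", 1: "female"}


def to_label_names(preds: Iterable) -> List[str]:
    """Map integer-like predictions (ints or 0-d tensors) to readable names."""
    return [LABEL_NAMES[int(p)] for p in preds]


def classify_ads(
    texts: List[str], model_id: str = "JohanHeinsen/Labour_ads_gender"
) -> List[str]:
    """Download the model from the Hub and classify a batch of advertisements."""
    # Imported lazily so the helper above works without setfit installed.
    from setfit import SetFitModel

    model = SetFitModel.from_pretrained(model_id)
    preds = model.predict(texts)
    return to_label_names(preds)
```

Calling `classify_ads([...])` with a list of advertisement texts returns one label name per text.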

Training Details

Training Set Metrics

Training set   Min   Median    Max
Word count     8     32.4388   176

Label   Training Sample Count
0       194
1       419

Training Hyperparameters

  • batch_size: (16, 16)
  • num_epochs: (3, 3)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 12
  • body_learning_rate: (2e-05, 2e-05)
  • head_learning_rate: 2e-05
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
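
To reuse these settings, they could be passed to SetFit's `TrainingArguments` roughly as sketched here. The `loss` and `distance_metric` entries are omitted on the assumption that CosineSimilarityLoss and cosine_distance are the library defaults; the names `HYPERPARAMS` and `build_training_arguments` are hypothetical.

```python
# Scalar hyperparameters copied from the list above.
# Tuples give (embedding phase, classifier phase) values.
HYPERPARAMS = {
    "batch_size": (16, 16),
    "num_epochs": (3, 3),
    "max_steps": -1,
    "sampling_strategy": "oversampling",
    "num_iterations": 12,
    "body_learning_rate": (2e-05, 2e-05),
    "head_learning_rate": 2e-05,
    "margin": 0.25,
    "end_to_end": False,
    "use_amp": False,
    "warmup_proportion": 0.1,
    "l2_weight": 0.01,
    "seed": 42,
    "eval_max_steps": -1,
    "load_best_model_at_end": False,
}


def build_training_arguments():
    """Create a setfit.TrainingArguments object from the values above."""
    # Imported lazily so the dictionary can be inspected without setfit.
    from setfit import TrainingArguments

    # loss and distance_metric are left unset (assumed to match the defaults).
    return TrainingArguments(**HYPERPARAMS)
```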

Training Results

Epoch Step Training Loss Validation Loss
0.0011 1 0.2907 -
0.0543 50 0.2618 -
0.1087 100 0.0493 -
0.1630 150 0.0181 -
0.2174 200 0.0038 -
0.2717 250 0.001 -
0.3261 300 0.0005 -
0.3804 350 0.0003 -
0.4348 400 0.0002 -
0.4891 450 0.0001 -
0.5435 500 0.0001 -
0.5978 550 0.0001 -
0.6522 600 0.0001 -
0.7065 650 0.0001 -
0.7609 700 0.0001 -
0.8152 750 0.0001 -
0.8696 800 0.0001 -
0.9239 850 0.0 -
0.9783 900 0.0 -
1.0326 950 0.0 -
1.0870 1000 0.0 -
1.1413 1050 0.0 -
1.1957 1100 0.0 -
1.25 1150 0.0 -
1.3043 1200 0.0 -
1.3587 1250 0.0 -
1.4130 1300 0.0 -
1.4674 1350 0.0 -
1.5217 1400 0.0 -
1.5761 1450 0.0 -
1.6304 1500 0.0 -
1.6848 1550 0.0 -
1.7391 1600 0.0 -
1.7935 1650 0.0 -
1.8478 1700 0.0 -
1.9022 1750 0.0 -
1.9565 1800 0.0 -
2.0109 1850 0.0 -
2.0652 1900 0.0 -
2.1196 1950 0.0 -
2.1739 2000 0.0 -
2.2283 2050 0.0 -
2.2826 2100 0.0 -
2.3370 2150 0.0 -
2.3913 2200 0.0 -
2.4457 2250 0.0 -
2.5 2300 0.0 -
2.5543 2350 0.0 -
2.6087 2400 0.0 -
2.6630 2450 0.0 -
2.7174 2500 0.0 -
2.7717 2550 0.0 -
2.8261 2600 0.0 -
2.8804 2650 0.0 -
2.9348 2700 0.0 -
2.9891 2750 0.0 -
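
The relation between the Epoch and Step columns can be checked with a short calculation. It assumes that SetFit, when `num_iterations` is set, draws that many positive and that many negative pairs per training sample; under that assumption the numbers below match the table.

```python
import math

n_samples = 194 + 419   # training samples for labels 0 and 1
num_iterations = 12
batch_size = 16
num_epochs = 3

# Assumption: one positive and one negative pair per sample per iteration.
pairs_per_epoch = 2 * num_iterations * n_samples
steps_per_epoch = math.ceil(pairs_per_epoch / batch_size)
total_steps = steps_per_epoch * num_epochs


def epoch_at(step):
    """Fractional epoch reached after a given number of optimisation steps."""
    return round(step / steps_per_epoch, 4)


print(pairs_per_epoch, steps_per_epoch, total_steps)
print(epoch_at(50), epoch_at(1150), epoch_at(2750))
```

The training loss rounds to 0.0 before the end of the first epoch, so most of the remaining steps bring little further change in the logged loss.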

Framework Versions

  • Python: 3.11.12
  • SetFit: 1.1.3
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.7.0
  • Datasets: 2.19.2
  • Tokenizers: 0.21.1

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}