# 🩺 vioBERT-v3 — The First Arabic Medical BERT

MARBERTv2 + ~1.12M Arabic medical documents → −82.7% medical PPL, +0.93 pp NER F1. Larger gain than BioBERT achieved on English (+0.62 pp).


vioBERT-v3 is the first Arabic domain-adapted BERT model purpose-built for the medical domain. It serves the 422 million Arabic speakers who have, until now, had no strong open biomedical language model in their language.

It was produced by continued masked-language-model pre-training (domain-adaptive pre-training, DAPT) on a mix of publicly available Arabic medical corpora (~1.12M documents in total), listed below with full credit to the original authors.

## 🚀 Try it before you read the rest

👉 Live demo: vioBERT-v3 vs MARBERTv2 side-by-side

Type any Arabic medical sentence containing `[MASK]` and compare the top-5 predictions from both models. The gap is most visible on clinical content.

## Key Results

| Task | vioBERT-v3 vs MARBERTv2 |
|---|---|
| Medical perplexity | −82.7% reduction |
| Fill-mask top-5 accuracy | +15.6 pp |
| Medical NER F1 | +0.93 pp (exceeds BioBERT's +0.62 pp on English) |
| 39-class classification F1 | +1.62 pp |
| 5-class classification F1 | +0.97 pp |

## Why the BioBERT comparison matters

Lee et al. (2020) added biomedical pre-training to BERT (producing BioBERT) and gained +0.62 pp F1 on English biomedical NER. We added Arabic medical pre-training to MARBERTv2 and gained +0.93 pp on Arabic medical NER.

Hypothesis: domain-adaptive pretraining (DAPT) pays more in lower-resource languages — base pretraining is thinner, leaving more headroom for domain knowledge to land. If true, the implication is that every Arabic-domain practitioner sitting on a corpus should be doing DAPT instead of waiting for someone to train a domain model from scratch.

## Usage

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="Vionex-digital/vioBERT-v3")

# "The patient suffers from [MASK] in the chest"
results = fill_mask("المريض يعاني من [MASK] في الصدر")
for r in results:
    print(f"{r['token_str']}: {r['score']:.4f}")
```

For fine-tuning on your downstream task:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("Vionex-digital/vioBERT-v3")
model = AutoModelForSequenceClassification.from_pretrained(
    "Vionex-digital/vioBERT-v3",
    num_labels=N,  # your number of classes
)
# ... standard HF fine-tuning loop (e.g. transformers.Trainer)
```

## Pre-training Details

- **Base model:** MARBERTv2 (pre-trained on ~1B Arabic tweets)
- **Strategy:** whole-word masking (respects Arabic agglutinative morphology)
- **Steps:** 22,000 (early-stopped via a composite improvement score)
- **Masking probability:** 15%
- **Optimizer:** AdamW, lr = 5e-5, weight decay 0.01
- **Hardware:** NVIDIA L4 (24 GB)
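
Conceptually, whole-word masking groups WordPiece subwords back into words before sampling masks, so a morphologically complex Arabic word is masked as one unit rather than piece by piece. The sketch below illustrates the idea in plain Python on WordPiece-style tokens (`##` marks a continuation piece); the function and its details are illustrative, not the actual training code, which would typically use `DataCollatorForWholeWordMask` from `transformers`.

```python
import random

def whole_word_mask(tokens, mask_prob=0.15, rng=None):
    """Illustrative whole-word masking over WordPiece tokens.

    A "word" is a head token plus its '##' continuation pieces; whole
    words are selected until roughly mask_prob of tokens are covered.
    """
    rng = rng or random.Random()

    # Group subword indices into words ("##" marks a continuation piece).
    words, current = [], []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and current:
            current.append(i)
        else:
            if current:
                words.append(current)
            current = [i]
    if current:
        words.append(current)

    # Mask randomly chosen whole words until the token budget is reached.
    budget = max(1, round(len(tokens) * mask_prob))
    masked, covered = list(tokens), 0
    for word in rng.sample(words, len(words)):
        if covered >= budget:
            break
        for i in word:
            masked[i] = "[MASK]"
        covered += len(word)
    return masked
```

The key property, and the reason this suits Arabic clitic-heavy words, is that all pieces of a selected word are always masked together.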

## Training Data — Publicly Available Sources (with credit)

vioBERT-v3 was trained on a mix of openly available Arabic medical text. We did not author these corpora; we combined and pre-processed them for DAPT, with full credit to the original authors.

| # | Corpus | Size | Source | Original author/curator |
|---|---|---|---|---|
| 1 | AHD Corpus (Arabic Healthcare Q&A from Altibbi) | ~808K Q&A pairs | Mendeley · Kaggle | Abdo Ashraf |
| 2 | Shifaa Arabic Medical Consultations | ~84K Q&A pairs | HuggingFace | Ahmed-Selem (data scraped from Islam Web) |
| 3 | Arabic Wikipedia (medical articles) | ~5K articles | Wikipedia API (medical categories) | Wikipedia community (CC-BY-SA) |
| 4 | WHO Arabic health content + public-domain medical texts | variable | gather_medical_corpus.py | WHO / public domain |

After deduplication and quality filtering, the combined DAPT corpus contains approximately 1.12M documents (~189 tokens per document on average).

**Why we did not republish a combined dataset:** the source corpora belong to their original authors. We use them under their respective licenses for research; users wishing to replicate should download from the original sources cited above. Vionex's contribution is the model weights and the gathering/preprocessing methodology — not the data itself.
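
As a rough illustration of what the deduplication step in such a pipeline can look like (a sketch under assumptions, not the exact vioBERT-v3 preprocessing code; the function names and normalization choices are hypothetical): normalize each document, fingerprint it, and keep only the first occurrence of each fingerprint.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Collapse whitespace and strip Arabic diacritics (tashkeel) so that
    near-identical copies hash to the same key. Illustrative choices only."""
    text = re.sub(r"[\u064B-\u0652]", "", text)  # remove harakat
    return re.sub(r"\s+", " ", text).strip()

def deduplicate(docs):
    """Exact dedup on normalized text via SHA-1 fingerprints,
    keeping the first occurrence of each document."""
    seen, unique = set(), []
    for doc in docs:
        key = hashlib.sha1(normalize(doc).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```

A real pipeline would likely add near-duplicate detection (e.g. MinHash) on top of exact matching, since Q&A sites often repost lightly edited answers.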

## Evaluation Methodology

Evaluated across 5 orthogonal axes with 42 experimental configurations, 3 random seeds each (so reported deltas have real error bars):

1. **Intrinsic:** perplexity and fill-mask accuracy on a held-out medical corpus
2. **Linear probing:** classification with a frozen encoder
3. **Full fine-tuning:** 5-class and 39-class medical text classification
4. **NER:** Arabic medical named-entity recognition
5. **MCQ:** medical question answering (5 difficulty levels)
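
The fill-mask metric from the intrinsic axis is straightforward to state precisely: a masked position counts as a hit when the gold token appears among the model's top-5 candidates. A minimal, hypothetical implementation (not the authors' evaluation harness):

```python
def top_k_accuracy(predictions, gold_tokens, k=5):
    """Fraction of masked positions where the gold token appears in the
    model's top-k candidates. `predictions` is a list of ranked candidate
    lists (best first), one per masked position."""
    assert len(predictions) == len(gold_tokens)
    hits = sum(gold in ranked[:k]
               for ranked, gold in zip(predictions, gold_tokens))
    return hits / len(gold_tokens)
```

The reported +15.6 pp would then be the difference between this score computed over vioBERT-v3's candidate lists and over MARBERTv2's, on the same held-out medical sentences.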

The classification benchmark uses the deduplicated Shifaa Arabic Medical Consultations dataset (Ahmed-Selem on HF) — see Section 3 of our paper for benchmark methodology.

## Limitations

- vioBERT-v3 is a research tool, not a diagnostic or clinical-deployment system.
- Clinical deployment requires bias auditing across dialects and demographics.
- Performance on clinical reasoning tasks (e.g., complex MCQ) currently sits at statistical parity with the base model — DAPT transfers terminology and patterns, not reasoning.
- Coverage skews toward MSA and Mashriq dialects; Maghrebi medical Arabic is underrepresented.

## Ethics

All training data is publicly available — no patient records, EHR data, or PHI was used. The Islam Web consultations (in Ahmed-Selem's Shifaa corpus) are voluntarily public; the AHD Altibbi corpus was published with redaction. Vionex applied additional PII regex sweeps during corpus assembly.

## Citation

```bibtex
@techreport{zaghloul2026viobert,
  title       = {From Tweets to Treatment: Domain-Adaptive Pre-Training for Arabic Medical {NLP}},
  author      = {Zaghloul, Yousef and Khaled, Abdallah},
  year        = {2026},
  institution = {Vionex Digital Solutions},
  url         = {https://huggingface.co/Vionex-digital/vioBERT-v3}
}
```

If you use this model, please also cite the underlying corpora:

```bibtex
@dataset{ashraf2023ahd,
  title  = {Arabic Healthcare Q\&A Dataset (AHD)},
  author = {Ashraf, Abdo},
  year   = {2023},
  url    = {https://data.mendeley.com/datasets/mgj29ndgrk/5}
}

@dataset{selem2024shifaa,
  title  = {Shifaa Arabic Medical Consultations},
  author = {{Ahmed-Selem}},
  year   = {2024},
  url    = {https://huggingface.co/datasets/Ahmed-Selem/Shifaa_Arabic_Medical_Consultations}
}
```

## Looking for collaborators

We're especially interested in:

- **Arabic NLP researchers** — benchmark replications, downstream task evaluations
- **Clinicians and hospital IT teams** — feedback on real clinical deployment paths
- **Dialect-specialized teams** — work on Maghrebi, Khaleeji, and Egyptian medical adaptation

Reach out via Vionex Digital Solutions or open a discussion on this model page.


Developed by Vionex Digital Solutions — building open Arabic AI for the 422M Arabic-speaking world.
