RL adapter (LoRA, 1 generator) — CPT + RL

GRPO LoRA adapter (r=32, α=64) trained on top of the CPT (1 generator: GPT-4.1-mini) model, using the HealthBench-BR train split as reward.

Test-split accuracy

Benchmark Accuracy
HealthBench-BR 75.6%
PCDT-QA 65.4%

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("hugo/protocolos-clinicos-br-cpt-1gen-14b", torch_dtype="auto", device_map="auto")
tok  = AutoTokenizer.from_pretrained("hugo/protocolos-clinicos-br-rl-1gen-14b")
model = PeftModel.from_pretrained(base, "hugo/protocolos-clinicos-br-rl-1gen-14b")

Intended use & limitations

Research model for studying domain adaptation of LLMs to Brazilian clinical guidelines. Not a certified medical device. Even at the best accuracy reported in the paper, residual errors may involve consequential details (dosages, contraindications). Use only under qualified professional supervision.

Citation

See the paper and code at the project repository:

Code & paper: https://github.com/hugoabonizio/clinical-protocols-br

Downloads last month
27
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for hugo/protocolos-clinicos-br-rl-1gen-14b

Adapter
(1)
this model

Collection including hugo/protocolos-clinicos-br-rl-1gen-14b